prometheus-eval

university

Recent Activity

seungone updated a dataset 15 days ago

prometheus-eval/outcome_meta_evaluation

jinheon authored a paper 29 days ago

VideoRAG: Retrieval-Augmented Generation over Video Corpus

seungone authored a paper about 1 month ago

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Organization Card

We train language models specialized in evaluating other language models and optimize evaluation pipelines!

spaces 1

BiGGen Bench Leaderboard

Explore model performance with interactive leaderboards

models 8

prometheus-eval/prometheus-8x7b-v2.0

Text2Text Generation • Updated Nov 29, 2024 • 3.1k • 49

prometheus-eval/prometheus-7b-v2.0

Text2Text Generation • Updated Nov 29, 2024 • 51.1k • 87

prometheus-eval/prometheus-7b-v2.0-GGUF

Text2Text Generation • Updated Jul 12, 2024 • 9 • 4

prometheus-eval/prometheus-bgb-8x7b-v2.0

Text Generation • Updated Apr 11, 2024 • 664 • 5

prometheus-eval/prometheus-vision-13b-v1.0

Image-to-Text • Updated Jan 15, 2024 • 80 • 12

prometheus-eval/prometheus-vision-7b-v1.0

Image-to-Text • Updated Jan 15, 2024 • 30 • 9

prometheus-eval/prometheus-13b-v1.0

Text2Text Generation • Updated Oct 14, 2023 • 2.49k • 134

prometheus-eval/prometheus-7b-v1.0

Text2Text Generation • Updated Oct 14, 2023 • 119 • 30

datasets 12

prometheus-eval/outcome_meta_evaluation

Updated 15 days ago • 18.1k • 163 • 2

prometheus-eval/outcome_meta_evaluation_heuristic

Updated 24 days ago • 18.1k • 50

prometheus-eval/MMQA

Updated Nov 18, 2024 • 330 • 51 • 3

prometheus-eval/MM-Eval

Updated Oct 26, 2024 • 11.1k • 95 • 5

prometheus-eval/BiGGen-Bench

Updated Oct 16, 2024 • 765 • 263 • 12

prometheus-eval/BiGGen-Bench-Results

Updated Aug 12, 2024 • 76.6k • 284 • 7

prometheus-eval/Preference-Collection

Updated May 3, 2024 • 200k • 221 • 33

prometheus-eval/Preference-Bench

Updated Apr 6, 2024 • 2k • 80 • 2

prometheus-eval/Feedback-Bench

Updated Apr 6, 2024 • 1k • 150 • 4

prometheus-eval/Perception-Bench

Updated Jan 15, 2024 • 500 • 156 • 4