Article Darija Chatbot Arena: Making LLMs Compete in the Moroccan Dialect By atlasia and 2 others • 1 day ago • 8
Leaderboards for Arabic Collection A collection for all leaderboards related to the Arabic Language. • 4 items • Updated 1 day ago • 1
Article Arabic RAG Leaderboard: A Comprehensive Framework for Evaluating Arabic Language Retrieval Systems By Navid-AI and 1 other • 2 days ago • 9
Article TerjamaBench: A Cultural Benchmark for English-Darija Machine Translation By imomayiz and 4 others • Jan 10 • 27
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference 27 days ago • 65
Article Train 400x faster Static Embedding Models with Sentence Transformers 28 days ago • 142
Article CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard Jan 9 • 18
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards Paper • 2402.01781 • Published Feb 1, 2024 • 2
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published Jan 8 • 89
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 48
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79
Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation Paper • 2412.15255 • Published Dec 15, 2024 • 3
Falcon3 Collection Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated Jan 8 • 80