BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18, 2024 • 43
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 153
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published 8 days ago • 9
Post • 🚀 Introducing @huggingface Open Deep-Research 💥 In just 24 hours, we built an open-source agent that can: ✅ Autonomously browse the web ✅ Search, scroll & extract info ✅ Download & manipulate files ✅ Run calculations on data. 55% on GAIA validation set! Help us improve it! 💡 https://huggingface.co/blog/open-deep-research
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas Paper • 2501.15427 • Published 16 days ago • 6
Post • Discover all the improvements in the new version of Lighteval: https://huggingface.co/docs/lighteval/
What's the Meaning of Superhuman Performance in Today's NLU? Paper • 2305.08414 • Published May 15, 2023 • 1
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS Paper • 2411.19655 • Published Nov 29, 2024 • 20