BigCode

Enterprise

non-profit

https://www.bigcode-project.org/

bigcode-project

Recent Activity

terryyz updated a dataset 4 days ago

bigcode/bigcodebench-hard-results

terryyz updated a dataset 4 days ago

bigcode/bigcodebench-hard-solve-rate

terryyz updated a dataset 4 days ago

bigcode/bigcodebench-hard-domain

Articles

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

bigcode's activity

terryyz

updated 8 datasets 4 days ago

bigcode/bigcodebench-hard-results

Viewer • Updated 4 days ago • 180 • 126 • 1

bigcode/bigcodebench-hard-solve-rate

Viewer • Updated 4 days ago • 296 • 156 • 1

bigcode/bigcodebench-hard-domain

Viewer • Updated 4 days ago • 331 • 117 • 1

bigcode/bigcodebench-hard-perf

Viewer • Updated 4 days ago • 331 • 102

bigcode/bigcodebench-results

Viewer • Updated 4 days ago • 180 • 147 • 2

bigcode/bigcodebench-solve-rate

Viewer • Updated 4 days ago • 2.28k • 135

bigcode/bigcodebench-domain

Viewer • Updated 4 days ago • 269 • 96

bigcode/bigcodebench-perf

Viewer • Updated 4 days ago • 269 • 83

loubnabnl

authored a paper 5 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 7 days ago • 153

clefourrier

authored a paper 5 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 7 days ago • 153

anton-l

authored a paper 5 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 7 days ago • 153

arjunguha

authored a paper 7 days ago

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Paper • 2502.01584 • Published 8 days ago • 9

canders1

authored a paper 7 days ago

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Paper • 2502.01584 • Published 8 days ago • 9

albertvillanova

posted an update 7 days ago

Post

2937

🚀 Introducing @huggingface Open Deep-Research💥

In just 24 hours, we built an open-source agent that can:
✅ Autonomously browse the web
✅ Search, scroll & extract info
✅ Download & manipulate files
✅ Run calculations on data

55% on GAIA validation set! Help us improve it!💡
https://huggingface.co/blog/open-deep-research

terryyz

updated a Space 7 days ago

BigCodeBench Leaderboard

Explore and analyze code evaluation data

Muennighoff

in bigcode/commitpack 7 days ago

fix: c# and f# config

#3 opened 7 days ago by

wyu1

authored a paper 14 days ago

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas

Paper • 2501.15427 • Published 16 days ago • 6

albertvillanova

posted an update about 1 month ago

Post

2052

Discover all the improvements in the new version of Lighteval: https://huggingface.co/docs/lighteval/

sted97

authored 2 papers about 2 months ago

What's the Meaning of Superhuman Performance in Today's NLU?

Paper • 2305.08414 • Published May 15, 2023 • 1

Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS

Paper • 2411.19655 • Published Nov 29, 2024 • 20