Hugging Face

Enterprise

company

Verified

https://huggingface.co

huggingface

Activity Feed

AI & ML interests

The AI community building the future.

Articles

Yay! Organizations can now publish blog Articles

22 days ago

• 33

huggingface's activity

pagezyhf

updated a dataset about 1 hour ago

huggingface/documentation-images

Viewer • Updated about 1 hour ago • 50 • 3.87M • 47

burtenshaw

posted an update about 3 hours ago

Post

488

The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt
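The Thought-Action-Observation cycle named in the unit outline can be sketched roughly as follows. This is a minimal illustration and not the course's own code: `scripted_policy`, `calculator`, `TOOLS`, and `run_agent` are hypothetical names, and the scripted policy stands in for an LLM so the loop runs without any model.

```python
from typing import Callable, Dict, List, Optional, Tuple


def calculator(expression: str) -> str:
    """Toy tool (hypothetical): adds two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}

PREFIX = "Observation: "


def scripted_policy(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, action_input)."""
    observations = [h for h in history if h.startswith(PREFIX)]
    if not observations:
        return ("I need to compute the sum.", "calculator", "2+3")
    # Once an observation exists, answer with its content.
    return ("I have the result.", "final_answer", observations[-1][len(PREFIX):])


def run_agent(
    policy: Callable[[List[str]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Repeat Thought -> Action -> Observation until a final answer or max_steps."""
    history: List[str] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(history)
        history.append(f"Thought: {thought}")
        if action == "final_answer":
            return action_input, history
        history.append(f"Action: {action}({action_input})")
        observation = tools[action](action_input)  # act on the environment
        history.append(f"{PREFIX}{observation}")   # feed the result back in
    return None, history


if __name__ == "__main__":
    answer, trace = run_agent(scripted_policy, TOOLS)
    print("\n".join(trace))
    print("Final answer:", answer)
```

In a real agent the policy would be a model call that reads the history as messages. The loop structure (think, act, observe, repeat) stays the same.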

lewtun

posted an update about 17 hours ago

Post

1502

Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g., for cases with malformed answers that can’t be verified with a rule-based parser).

📊 We match the performance of DeepSeek-R1-Distill-Qwen-7B by fine-tuning Qwen2.5-Math-7B-Instruct on our dataset.

🔎 Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2
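The "at least one correct answer" filtering rule described in the post could look roughly like the sketch below. It is an illustration under stated assumptions, not the Open R1 pipeline: `Problem`, `exact_match`, and `keep_verified` are hypothetical names, and `exact_match` is a naive stand-in for a rule-based verifier such as Math Verify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    question: str
    reference: str       # ground-truth final answer
    answers: List[str]   # final answers extracted from the generated traces


def exact_match(candidate: str, reference: str) -> bool:
    """Hypothetical stand-in for a rule-based verifier: compare after removing spaces."""
    return candidate.replace(" ", "") == reference.replace(" ", "")


def keep_verified(
    problems: List[Problem],
    is_correct: Callable[[str, str], bool] = exact_match,
) -> List[Problem]:
    """Retain only problems for which at least one generated answer is correct."""
    return [
        p for p in problems
        if any(is_correct(answer, p.reference) for answer in p.answers)
    ]


if __name__ == "__main__":
    sample = [
        Problem("1 + 1", "2", ["2", "3"]),          # one of two answers is correct
        Problem("2 * 3", "6", ["5", "7"]),          # no correct answer
        Problem("1 / 2", "1/2", [" 1 / 2 ", "2"]),  # correct after normalization
    ]
    for problem in keep_verified(sample):
        print(problem.question)
```

Passing the verifier as a parameter leaves room for a second pass with a different checker, such as the LLM judge the post mentions for malformed answers.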

julien-c

in huggingface/HuggingDiscussions about 18 hours ago

[FEEDBACK] Local apps

#31 opened 8 months ago by kramp

davidberenstein1957

posted an update about 18 hours ago

Post

585

Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset

Blog: https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data

giadap

authored a paper about 19 hours ago

Fully Autonomous AI Agents Should Not be Developed

Paper • 2502.02649 • Published 7 days ago • 20

SaylorTwift

updated a dataset about 19 hours ago

huggingface/documentation-images

medmekk

updated a dataset about 22 hours ago

huggingface/documentation-images

medmekk

in huggingface/documentation-images about 23 hours ago

add_image_doc_fp8

#431 opened about 23 hours ago by medmekk

lysandre

updated a dataset about 23 hours ago

huggingface/transformers-metadata

Viewer • Updated about 23 hours ago • 1.56k • 636 • 16

Xenova

posted an update 4 days ago

Post

4924

We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?

7 replies

burtenshaw

posted an update 4 days ago

Post

2979

SmolLM2 paper is out! 😊

😍 Why do I love it? Because it facilitates teaching and learning!

Over the past few months I've engaged with (no joke) thousands of students based on SmolLM.

- People have inferred, fine-tuned, aligned, and evaluated this smol model.
- People used their own machines and free tools like Colab, Kaggle, and Spaces.
- People tackled use cases in their job, for fun, in their own language, and with their friends.

Upvote the paper: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)