math-extraction-comp/HuggingFaceH4__R2-Q7B-GR1-ALL-s1k-5e-5-weight-decay-1e-4_private Viewer • Updated 1 day ago • 30 • 2
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 153
math-extraction-comp/deepseek-ai__DeepSeek-R1-Distill-Qwen-32B_private Viewer • Updated 5 days ago • 500 • 22
debug-r1/details_deepseek-ai__DeepSeek-R1-Distill-Qwen-32B_private Viewer • Updated 6 days ago • 2k • 29