Article From Llasa to Llasagna: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • about 9 hours ago • 15
On Teacher Hacking in Language Model Distillation Paper • 2502.02671 • Published 7 days ago • 14
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 153
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published 11 days ago • 6
Article Mini-R1: Reproduce Deepseek R1 "aha moment" a RL tutorial By open-r1 • 11 days ago • 33
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • 19 days ago • 62
Article How biased is Whisper ? Evaluating Whisper Models for Robustness to Diverse English Accents By Steveeeeeeen • 13 days ago • 16
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Paper • 2501.14334 • Published 18 days ago • 17
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 43
Article Yay! Organizations can now publish blog Articles By huggingface and 3 others • 22 days ago • 33
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era By MiniMax-AI • 27 days ago • 40
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 5 days ago • 227
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 255
DolphinLabeled Datasets Collection Eric Hartford has added labels to help you filter datasets, for your pleasure. • 5 items • Updated Jan 6 • 12