Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning • Paper • 2502.06781 • Published 1 day ago • 35
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT • Paper • 2502.06782 • Published 1 day ago • 7
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning • Paper • 2502.06060 • Published 2 days ago • 20
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents • Paper • 2502.05957 • Published 2 days ago • 6
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding • Paper • 2502.05431 • Published 4 days ago • 5
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates • Paper • 2502.06772 • Published 1 day ago • 10
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper • 2502.06703 • Published 1 day ago • 59
VideoRoPE: What Makes for Good Video Rotary Position Embedding? • Paper • 2502.05173 • Published 4 days ago • 57
Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression • Paper • 2502.04296 • Published 5 days ago • 6
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach • Paper • 2502.03639 • Published 6 days ago • 8
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation • Paper • 2502.04299 • Published 5 days ago • 14
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis • Paper • 2502.04128 • Published 5 days ago • 19
UltraIF: Advancing Instruction Following from the Wild • Paper • 2502.04153 • Published 5 days ago • 20
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations • Paper • 2502.05003 • Published 4 days ago • 35
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach • Paper • 2502.05171 • Published 4 days ago • 41