SFTvsRL Models & Data Collection This collection contains 4 initial checkpoints for https://github.com/LeslieTrue/SFTvsRL and necessary data for V-IRL training. • 5 items • Updated 7 days ago • 7
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published 4 days ago • 35
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment Paper • 2502.03512 • Published 6 days ago • 5
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 4 days ago • 40
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published 8 days ago • 9
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published Nov 9, 2024 • 20
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
Llama 3.3 (All Versions) Collection Meta's new Llama 3.3 (70B) model in all formats. Includes GGUF, 4-bit bnb and original versions. • 3 items • Updated 7 days ago • 35
OpenAssistant Conversations -- Democratizing Large Language Model Alignment Paper • 2304.07327 • Published Apr 14, 2023 • 6
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 511
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 153
ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published 8 days ago • 23
RandLoRA: Full-rank parameter-efficient fine-tuning of large models Paper • 2502.00987 • Published 9 days ago • 9
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published 8 days ago • 34