Submitted by jt-zhang 52 SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration · 6 authors 9
Submitted by Ziqi 30 VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models · 17 authors 3
Submitted by teowu 18 VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation · 6 authors 5
Submitted by wchai 18 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory · 5 authors 3
Submitted by haonan3 15 When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training · 7 authors 2
Submitted by akhaliq 13 Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents · 10 authors 2
Submitted by CiaraRowles 12 Stylecodes: Encoding Stylistic Information For Image Generation · 1 author 2
Submitted by amanchadha 8 ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models · 12 authors 4
Submitted by davidbrandfonbrener 5 Loss-to-Loss Prediction: Scaling Laws for All Datasets · 5 authors 2
Submitted by a-fontanella 3 Generating Compositional Scenes via Text-to-image RGBA Instance Generation · 5 authors 2
Submitted by Kaichengalex 2 ORID: Organ-Regional Information Driven Framework for Radiology Report Generation · 6 authors 2