Submitted by s-emanuilov · 30 · MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents · 6 authors · 2
Submitted by hzxie · 20 · CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities · 4 authors · 2
Submitted by akhaliq · 15 · RepVideo: Rethinking Cross-Layer Representation for Video Generation · 6 authors · 3
Submitted by s-emanuilov · 12 · Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion · 7 authors · 2
Submitted by akhaliq · 10 · XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework · 5 authors · 2
Submitted by wzk1015 · 7 · Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding · 11 authors · 2
Submitted by iliashum · 6 · Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography · 7 authors · 2
Submitted by nielsr · 2 · Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding · 6 authors · 2