Submitted by ahmedheakl · LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs · 15 authors
Submitted by pmj110119 · OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints · 6 authors
Submitted by myownskyW7 · OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · 15 authors
Submitted by carboncoo · Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models · 11 authors
Submitted by akhaliq · Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains · 6 authors
Submitted by BestWishYsh · ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning · 9 authors
Submitted by Fiaa · ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding · 9 authors
Submitted by BestWishYsh · Multi-subject Open-set Personalization in Video Generation · 10 authors