Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 5 days ago • 20
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 6 items • Updated Dec 11, 2024 • 16
Oryx-1.5 Collection Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • 4 items • Updated 28 days ago • 5
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 23
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published Nov 4, 2024 • 24
🔥🔥🔥 Introducing Oryx-1.5! A series of unified MLLMs with much stronger performance on all the image, video, and 3D benchmarks 😍
🛠️ GitHub: https://github.com/Oryx-mllm/Oryx
🚀 Model: THUdyh/oryx-15-6718c60763845525c2bba71d
🎨 Demo: THUdyh/Oryx
👋 Try the top-tier MLLM yourself!
👀 Stay tuned for more explorations on MLLMs!
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023