Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 5 days ago • 20
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 21 days ago • 55
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 20 days ago • 79
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 21 days ago • 81
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 22 days ago • 63
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31
Open LLM Leaderboard Space • Track, rank and evaluate open LLMs and chatbots • 12.4k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56
Post If you're trying to run MoE Mixtral-8x7b under DeepSpeed w/ HF Transformers, it's likely to hang on the first forward. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix.
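A minimal sketch of the version check the post implies, before loading Mixtral under DeepSpeed with HF Transformers. The Hub id mistralai/Mixtral-8x7B-v0.1 and the ds_config_zero3.json file name are assumptions for illustration; the actual fix is the linked DeepSpeed PR plus the version bump.

```python
# Sketch: guard against the first-forward hang by requiring deepspeed>=0.13.0
# (assumed model id and DeepSpeed config name; see the post above for the PR).
from packaging import version
import deepspeed
import transformers

assert version.parse(deepspeed.__version__) >= version.parse("0.13.0"), (
    "deepspeed>=0.13.0 is required; older versions can hang on the first "
    "forward pass with MoE models like Mixtral-8x7b (DeepSpeed PR #4966)."
)

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint for illustration
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)

# Training would then go through transformers.Trainer with a DeepSpeed config,
# e.g. TrainingArguments(deepspeed="ds_config_zero3.json", ...) — hypothetical file name.
```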
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15, 2024 • 174