Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 5 days ago • 20
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 21 days ago • 55
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 20 days ago • 79
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 21 days ago • 81
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 22 days ago • 63
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31
Open LLM Leaderboard Space • Track, rank and evaluate open LLMs and chatbots • 12.4k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56
Post If you're trying to run MoE Mixtral-8x7b under DeepSpeed w/ HF Transformers, it's likely to hang on the first forward. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix.
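A minimal sketch of the version check the post implies, before loading Mixtral under DeepSpeed with HF Transformers. The Hub id mistralai/Mixtral-8x7B-v0.1 and the ds_config_zero3.json file name are assumptions for illustration; the actual fix is the linked DeepSpeed PR plus the version bump.

```python
# Sketch: guard against the first-forward hang by requiring deepspeed>=0.13.0
# (assumed model id and DeepSpeed config name; see the post above for the PR).
from packaging import version
import deepspeed
import transformers

assert version.parse(deepspeed.__version__) >= version.parse("0.13.0"), (
    "deepspeed>=0.13.0 is required; older versions can hang on the first "
    "forward pass with MoE models like Mixtral-8x7b (DeepSpeed PR #4966)."
)

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint for illustration
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)

# Training would then go through transformers.Trainer with a DeepSpeed config,
# e.g. TrainingArguments(deepspeed="ds_config_zero3.json", ...) — hypothetical file name.
```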
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15, 2024 • 174