Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 5 days ago • 20
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 6 items • Updated Dec 11, 2024 • 16
Oryx-1.5 Collection Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • 4 items • Updated 28 days ago • 5
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 23
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published Nov 4, 2024 • 24
🔥🔥🔥 Introducing Oryx-1.5! A series of unified MLLMs with much stronger performance on all the image, video, and 3D benchmarks 😍
🛠️ GitHub: https://github.com/Oryx-mllm/Oryx
🚀 Model: THUdyh/oryx-15-6718c60763845525c2bba71d
🎨 Demo: THUdyh/Oryx
👋 Try the top-tier MLLM yourself!
👀 Stay tuned for more explorations on MLLMs!
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023