stabilityai/stable-video-diffusion-img2vid-xt Image-to-Video • Updated Jul 10, 2024 • 389k • 2.86k
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published Jan 9 • 15
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 4
Vision Arena (Testing VLMs side-by-side) Space • Running • 541 • 🖼 Analyze images to detect and label objects
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? Paper • 2406.07546 • Published Jun 11, 2024 • 8