Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published 29 days ago • 16
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published Jan 10 • 31
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 26
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published Dec 10, 2024 • 11
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark Paper • 2412.07825 • Published Dec 10, 2024 • 11
ViTamin: Designing Scalable Vision Models in the Vision-Language Era Paper • 2404.02132 • Published Apr 2, 2024 • 2