Collections

Collections including paper arxiv:2411.02397

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 16
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 9
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 71
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

Adaptive Caching for Faster Video Generation with Diffusion Transformers

Paper • 2411.02397 • Published Nov 4, 2024 • 23

Direct Judgement Preference Optimization

Paper • 2409.14664 • Published Sep 23, 2024
Adaptive Caching for Faster Video Generation with Diffusion Transformers

Paper • 2411.02397 • Published Nov 4, 2024 • 23
RoRA-VLM: Robust Retrieval-Augmented Vision Language Models

Paper • 2410.08876 • Published Oct 11, 2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 22
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 28
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 36

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 19
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 16
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 27
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Paper • 2401.09047 • Published Jan 17, 2024 • 14

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 41
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

FreeU: Free Lunch in Diffusion U-Net

Paper • 2309.11497 • Published Sep 20, 2023 • 65
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Paper • 2311.12092 • Published Nov 20, 2023 • 22
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Paper • 2311.13600 • Published Nov 22, 2023 • 43
PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11, 2024 • 49
