NYU VisionX

university

https://www.sainingxie.com/

nyu-visionx's activity

sayakpaul

posted an update 12 days ago

Post

1845

We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py

sainx

authored a paper 13 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 14 days ago • 101

jihanyang

authored a paper 13 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 14 days ago • 101

sayakpaul

posted an update 15 days ago

Post

1912

We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo, the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

craigwu

authored a paper 18 days ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 19 days ago • 23

sainx

authored a paper 25 days ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published 26 days ago • 67

jihanyang

updated a dataset 28 days ago

nyu-visionx/VSI-Bench

Viewer • Updated 28 days ago • 5.13k • 2.31k • 30

sayakpaul

posted an update about 2 months ago

Post

4334

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

anjaliwgupta

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

jihanyang

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

rilynhan

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

sainx

authored a paper about 2 months ago

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

ShushengYang

authored 5 papers about 2 months ago

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

Paper • 2204.02964 • Published Apr 6, 2022

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 24

ShushengYang

updated a dataset about 2 months ago

nyu-visionx/VSI-Bench

Viewer • Updated 28 days ago • 5.13k • 2.31k • 30

sayakpaul

posted an update about 2 months ago

Post

2157

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

rilynhan

updated a dataset about 2 months ago

nyu-visionx/VSI-Bench

Viewer • Updated 28 days ago • 5.13k • 2.31k • 30

Team members 15
