One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Zechen Bai ¹ Tong He ² Haiyang Mei ¹ Pichao Wang ² Ziteng Gao ¹ Joya Chen ¹ Lei Liu ² Zheng Zhang ² Mike Zheng Shou ¹

NeurIPS 2024

¹ Show Lab, National University of Singapore ² Amazon

Please find the code at: https://github.com/showlab/VideoLISA

Model size: 4.48B params (Safetensors)

Tensor types: F32, BF16

Model tree for ZechenBai/VideoLISA-3.8B

Base model: MBZUAI/LLaVA-Phi-3-mini-4k-instruct

Fine-tuned: this model