Introduction

This repository contains the Stage 2 6B model from the InternVideo2 paper.

Code: https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality

🚀 Installation

Please refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md

Usage

import cv2
from transformers import AutoModel
from modeling_internvideo2 import (retrieve_text, vid2tensor, _frame_from_video)


if __name__ == '__main__':
    # Load the Stage 2 6B checkpoint; trust_remote_code pulls in modeling_internvideo2.
    model = AutoModel.from_pretrained(
        "OpenGVLab/InternVideo2-Stage2_6B", trust_remote_code=True
    ).eval()

    # Decode the example clip into a list of frames.
    video = cv2.VideoCapture('example1.mp4')
    frames = [x for x in _frame_from_video(video)]

    text_candidates = [
        "A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.",
        "A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.",
        "A person dressed in a blue jacket shovels the snow-covered pavement outside their house.",
        "A cat excitedly runs through the yard, chasing a rabbit.",
        "A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery.",
    ]

    # Rank the candidate captions against the video and print the top-5 matches.
    texts, probs = retrieve_text(frames, text_candidates, model=model, topk=5)
    for t, p in zip(texts, probs):
        print(f'text: {t} ~ prob: {p:.4f}')

    # Extract a video embedding from 4 sampled frames.
    vidtensor = vid2tensor('example1.mp4', fnum=4)
    feat = model.get_vid_feat(vidtensor)
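
The embedding returned by get_vid_feat can also be reused outside of caption retrieval, for example to measure how similar two clips are. The snippet below is a minimal sketch, assuming get_vid_feat returns one feature vector per clip (roughly shape [1, D]) and that a second file example2.mp4 exists next to example1.mp4; both the second file name and the output shape are assumptions, not part of the official demo.

    import torch
    import torch.nn.functional as F

    # Assumption: get_vid_feat returns one embedding per clip, shape ~[1, D].
    with torch.no_grad():
        feat1 = model.get_vid_feat(vid2tensor('example1.mp4', fnum=4))
        feat2 = model.get_vid_feat(vid2tensor('example2.mp4', fnum=4))  # hypothetical second clip

    # Cosine similarity between the two clip embeddings (1.0 = same direction).
    sim = F.cosine_similarity(feat1.float(), feat2.float(), dim=-1)
    print(f'clip-to-clip cosine similarity: {sim.mean().item():.4f}')

Pre-computing such embeddings for a collection of videos makes simple nearest-neighbour video search or deduplication possible without re-running the model per query.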