CLIP

A Contrastive Language-Image Pre-training (CLIP) model (ViT-L/14) pre-trained on LAION-2B at resolution 224x224. CLIP was introduced in the paper Learning Transferable Visual Models From Natural Language Supervision, and this training setup was reproduced in the follow-up paper Reproducible scaling laws for contrastive language-image learning. The weights were converted from the laion/CLIP-ViT-L-14-laion2B-s32B-b82K checkpoint in the OpenCLIP LAION-2B collection.

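A minimal usage sketch for zero-shot image classification, assuming the converted weights load with the standard `transformers` CLIP classes (`CLIPModel` / `CLIPProcessor`) and that the repository id matches the collection name (`cs-giung/clip-vit-large-patch14-laion2b`); the example image URL is only illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Repository id assumed from the collection name of this model card.
model_id = "cs-giung/clip-vit-large-patch14-laion2b"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Zero-shot classification: score the image against candidate captions.
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarities scaled by the learned temperature.
probs = outputs.logits_per_image.softmax(dim=-1)
print({t: p.item() for t, p in zip(texts, probs[0])})
```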