CLIP

A Contrastive Language-Image Pre-training (CLIP) model (ViT-L/14) pre-trained on LAION-2B at resolution 224x224. CLIP was introduced in the paper Learning Transferable Visual Models From Natural Language Supervision, and this training setup was reproduced in the follow-up paper Reproducible scaling laws for contrastive language-image learning. The weights were converted from the laion/CLIP-ViT-L-14-laion2B-s32B-b82K checkpoint in the OpenCLIP LAION-2B collection.

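A minimal usage sketch for zero-shot image classification, assuming the converted weights load with the standard `transformers` CLIP classes (`CLIPModel` / `CLIPProcessor`) and that the repository id matches the collection name (`cs-giung/clip-vit-large-patch14-laion2b`); the example image URL is only illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Repository id assumed from the collection name of this model card.
model_id = "cs-giung/clip-vit-large-patch14-laion2b"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Zero-shot classification: score the image against candidate captions.
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarities scaled by the learned temperature.
probs = outputs.logits_per_image.softmax(dim=-1)
print({t: p.item() for t, p in zip(texts, probs[0])})
```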