CLIP

Contrastive Language-Image Pre-training (CLIP) model pre-trained on 2.5 billion image-text pairs curated from CommonCrawl at a resolution of 224x224. CLIP was introduced in the paper Learning Transferable Visual Models From Natural Language Supervision, and this checkpoint follows the data curation and reproduction described in the follow-up paper Demystifying CLIP Data. The weights were converted from the l14_fullcc2.5b.pt file provided in the original repository.
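
A minimal usage sketch, assuming the converted weights are compatible with the standard `CLIPModel` / `CLIPProcessor` API in transformers; the example image URL and text prompts are arbitrary placeholders:

```python
import torch
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the converted checkpoint (assumed to follow the standard CLIP layout).
model_id = "cs-giung/clip-vit-large-patch14-fullcc2.5b"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Zero-shot classification over a set of candidate captions.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, converted to probabilities over the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```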
