MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (Converted to CoreML)

MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

This repository contains MobileCLIP-S0 split into separate text and image encoders, which were converted to CoreML using this Notebook

Usage example

For an example of how to use these converted models in Swift, please refer to Queryable-MC
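Outside of Swift, the two converted encoders can also be exercised from Python with coremltools (the same library used in the conversion notebook). The sketch below is illustrative only: the model file names and the input/output feature names are assumptions, not taken from this repository, and actually loading a CoreML model requires macOS. The cosine-similarity helper shows how the two encoders' embeddings are typically compared in CLIP-style retrieval.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (CLIP matching score)."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Loading the encoders requires macOS plus this repository's model files.
# File names and feature names below are hypothetical placeholders —
# check the actual artifacts and their model specs before use.
#
# import coremltools as ct
# text_encoder = ct.models.MLModel("TextEncoder.mlpackage")
# image_encoder = ct.models.MLModel("ImageEncoder.mlpackage")
# text_emb = text_encoder.predict({"text": token_ids})["embedding"]
# image_emb = image_encoder.predict({"image": pixels})["embedding"]
# score = cosine_similarity(np.asarray(text_emb), np.asarray(image_emb))
```

Higher scores indicate a closer image-text match; ranking a gallery of image embeddings against one text embedding gives the retrieval order used by apps like Queryable.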
