MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (Converted to CoreML)

MobileCLIP was introduced in MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024), by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.

This repository contains MobileCLIP-S0 split into separate text and image encoders, which were converted to CoreML using this Notebook

Usage example

For an example of how to use these converted models in Swift, please refer to Queryable-MC
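Outside of Swift, the two converted encoders can also be exercised from Python with coremltools (the same library used in the conversion notebook). The sketch below is illustrative only: the model file names and the input/output feature names are assumptions, not taken from this repository, and actually loading a CoreML model requires macOS. The cosine-similarity helper shows how the two encoders' embeddings are typically compared in CLIP-style retrieval.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (CLIP matching score)."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Loading the encoders requires macOS plus this repository's model files.
# File names and feature names below are hypothetical placeholders —
# check the actual artifacts and their model specs before use.
#
# import coremltools as ct
# text_encoder = ct.models.MLModel("TextEncoder.mlpackage")
# image_encoder = ct.models.MLModel("ImageEncoder.mlpackage")
# text_emb = text_encoder.predict({"text": token_ids})["embedding"]
# image_emb = image_encoder.predict({"image": pixels})["embedding"]
# score = cosine_similarity(np.asarray(text_emb), np.asarray(image_emb))
```

Higher scores indicate a closer image-text match; ranking a gallery of image embeddings against one text embedding gives the retrieval order used by apps like Queryable.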
