Inhabitr Design Engine 1.0

PaliGemma Vision Model (unified attributes model)

This custom-trained PaliGemma vision model unifies attribute extraction: given a furniture image, it identifies the primary color, secondary color, primary material, secondary material, and design style, and returns them in JSON format.
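To illustrate how a client might consume the JSON output, here is a minimal sketch. The key names (`primary_color`, `design_style`, and so on) and the sample values are assumptions for illustration only; this README lists the attributes but does not show the exact schema, so adjust them to match the real response.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical key names: the README names the attributes but not the JSON keys.
@dataclass
class FurnitureAttributes:
    primary_color: Optional[str] = None
    secondary_color: Optional[str] = None
    primary_material: Optional[str] = None
    secondary_material: Optional[str] = None
    design_style: Optional[str] = None


def parse_attributes(raw: str) -> FurnitureAttributes:
    """Parse a JSON object string, keeping only the known attribute keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    known = {f.name for f in fields(FurnitureAttributes)}
    return FurnitureAttributes(**{k: v for k, v in data.items() if k in known})


# Illustrative input, not real model output.
sample = '{"primary_color": "grey", "primary_material": "fabric", "design_style": "modern"}'
print(parse_attributes(sample))
```

Attributes missing from the response are left as `None`, and unknown keys are ignored, so the parser tolerates small schema differences.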

Features

PaliGemma Model: A 3-billion-parameter multimodal model capable of image captioning, object detection, object segmentation, OCR, and visual question answering (VQA).

Custom Training: The model is custom trained on approximately 500 pairs of cropped images (sofas and accent chairs) and their captions.

Requirements

Python 3.9+

Minimum 12 GB of GPU memory for local inference

Pip (Python package installer)

Installation

First, clone the repository:

 git clone https://Apoorva_inhabitr@bitbucket.org/Inhabitr/vision_models.git

Create and Activate Python Virtual Environment

For Unix/macOS

 python3 -m venv env
 source env/bin/activate

For Windows

 python -m venv env
 .\env\Scripts\activate

Install the required Python libraries

 pip install -r requirements.txt

Download Model Weights and Tokenizer

Download the model weights and tokenizer from Google Drive:

https://drive.google.com/drive/folders/1S-z374V-yd3izeBatAMQlitmZ0mbMv6s?usp=drive_link

Place the downloaded files in the models/ folder located in the root directory of the project.
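Before starting the API, it can help to confirm that the download landed in the right place. The sketch below is one way to do that; the helper name `list_model_files` is hypothetical, and because this README does not list the expected file names, it only checks that the folder exists and is not empty.

```python
from pathlib import Path
from typing import List


def list_model_files(model_dir: str = "models") -> List[str]:
    """Return the sorted file names in model_dir, or raise if it is missing or empty."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model folder not found: {root}")
    # Only regular files are counted; subfolders are ignored in this sketch.
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"model folder is empty: {root}")
    return files
```

Run from the project root, `print(list_model_files())` lists what was downloaded, or raises `FileNotFoundError` if the folder is missing or empty.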

Start the API

 python main.py

If running locally, the API will be available at http://localhost:5000.

API Endpoints for Captions

Predict Attributes

POST /predict

Request Parameters

Send an image file as multipart form data.

file: The image file to extract attributes from. The form field name matches the one used in the example request.

Example Request with Image File

  curl -X POST http://127.0.0.1:5000/predict \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/image.jpg"
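The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_multipart` and `predict` are hypothetical, the form field name `file` is taken from the curl example above, and the response is returned as raw text because its exact format is not documented here.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, url: str = "http://127.0.0.1:5000/predict") -> str:
    """POST the image to the /predict endpoint and return the response body as text."""
    path = Path(image_path)
    # "file" mirrors the field name in the curl example; change it if the API differs.
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the API running locally, `print(predict("/path/to/your/image.jpg"))` should print the server's response.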

Performance Considerations

GPU: Once the model parameters are loaded into memory, inference completes within 1 second.

CPU: Initial model loading may take 4-6 minutes, depending on your system configuration.
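The load-once behavior described above can be sketched as follows. This is not the code in main.py: `get_model` is a hypothetical stand-in that sleeps briefly in place of loading real weights, and it only illustrates why the first call is slow while later calls are fast.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Load the model on the first call; later calls return the cached object."""
    time.sleep(0.2)  # stand-in for the slow weight-loading step
    return {"name": "placeholder-model"}  # stand-in for the real model object


def timed_call() -> float:
    """Return how long one get_model() call takes, in seconds."""
    start = time.perf_counter()
    get_model()
    return time.perf_counter() - start


first = timed_call()   # pays the loading cost
second = timed_call()  # served from the cache
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

Keeping a single cached model object per process means the loading cost is paid once at startup rather than on every request.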