## Inhabitr Design Engine 1.0
#### PaliGemma Vision Model (unifying attributes model)
This custom-trained model uses the PaliGemma vision model to unify attribute extraction: it identifies furniture attributes (primary color, secondary color, primary material, secondary material, and design style) and outputs them in JSON format.
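As a rough illustration of the JSON output described above, the sketch below parses a response and reads the five attributes. The key names (`primary_color`, `secondary_color`, and so on) and the sample values are assumptions made for illustration; the keys the model actually emits may differ.

```python
import json

# Hypothetical key names; the real model output may use different keys.
EXPECTED_KEYS = (
    "primary_color",
    "secondary_color",
    "primary_material",
    "secondary_material",
    "design_style",
)


def parse_attributes(raw: str) -> dict:
    """Parse the model's JSON text and return the expected attributes.

    Missing keys are returned as None so callers can spot incomplete output.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of attributes")
    return {key: data.get(key) for key in EXPECTED_KEYS}


# Made-up sample response, for illustration only.
sample = '{"primary_color": "grey", "primary_material": "fabric", "design_style": "modern"}'
attributes = parse_attributes(sample)
print(attributes)
```

Returning `None` for absent keys, rather than raising, is a design choice here: a furniture item may have no secondary color or material.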
# Features
PaliGemma Model: A 3-billion-parameter multimodal model capable of image captioning, object detection, object segmentation, OCR, and visual question answering (VQA).
Custom Training: The model is custom-trained on approximately 500 pairs of cropped images (sofa, accent chair) and their captions.
# Requirements
Python 3.9+
GPU with at least 12 GB of memory for local inference
Pip (Python package installer)
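The requirements above can be checked with a short script before installing anything. This is a minimal sketch: the GPU check assumes the project runs on PyTorch with CUDA, which this README does not state, and it simply reports `None` when `torch` is not installed or no CUDA device is visible.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)


def check_requirements() -> dict:
    """Report whether the interpreter and pip meet the stated requirements."""
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


def gpu_memory_gb():
    """Return total memory of GPU 0 in GB, or None if torch/CUDA is unavailable.

    Assumption: PyTorch is the backend; adjust for other frameworks.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


report = check_requirements()
print(report, "GPU memory (GB):", gpu_memory_gb())
```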
# Installation
First, clone the repository:
git clone https://Apoorva_inhabitr@bitbucket.org/Inhabitr/vision_models.git
# Create and Activate Python Virtual Environment
For Unix/macOS
python3 -m venv env
source env/bin/activate
For Windows
python -m venv env
.\env\Scripts\activate
# Install the required Python libraries
pip install -r requirements.txt
# Download Model Weights and Tokenizer
Download the model weights and tokenizer from Google Drive:
https://drive.google.com/drive/folders/1S-z374V-yd3izeBatAMQlitmZ0mbMv6s?usp=drive_link
Place the downloaded files in the models/ folder located in the root directory of the project.
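To confirm the download landed in the right place before starting the API, a quick check such as the following may help. It is only a sketch: it lists whatever files are in `models/` relative to the current directory and does not know which file names the project expects, since those are not listed here.

```python
from pathlib import Path


def list_model_files(models_dir="models"):
    """Return sorted file names found directly under models_dir, or [] if it is missing."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


files = list_model_files()
if files:
    print("Found:", ", ".join(files))
else:
    print("No files found in models/; download the weights and tokenizer first.")
```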
# Start the API
python main.py
If running locally, the API will be available at http://localhost:5000.
# API Endpoints for Captions
Predict attributes
POST /predict
Request Parameters
Send an image file
file: The image file to extract attributes from, sent as multipart form data (the field name matches the example request below).
Example Request with Image File
curl -X POST http://127.0.0.1:5000/predict \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/your/image.jpg"
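The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call by posting the image under the `file` field. It assumes the endpoint replies with a JSON body; `build_multipart` and `predict` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url="http://127.0.0.1:5000/predict"):
    """POST the image to /predict and return the decoded JSON response.

    Assumption: the API answers with a JSON document.
    """
    with open(image_path, "rb") as fh:
        body, header = build_multipart("file", "image.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `predict("/path/to/your/image.jpg")` would return the parsed attributes.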
# Performance Considerations
GPU: After the initial load, model parameters stay in memory and inference completes within about 1 second.
CPU: Initial model loading may take 4-6 minutes, depending on your system configuration.
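Because the initial load can take several minutes, a client script may want to wait for the API before sending requests. The sketch below polls the port until it accepts TCP connections. It assumes the server only starts listening once the model is loaded, which this README does not confirm; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5000, timeout=600.0, interval=5.0):
    """Poll until a TCP connection succeeds (True) or the timeout elapses (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The default timeout of 600 seconds leaves some headroom over the 4-6 minute CPU load time quoted above.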