## Inhabitr Design Engine 1.0
#### PaliGemma Vision Model (unifying attributes model)
This custom-trained PaliGemma vision model unifies attribute extraction: given a furniture image, it identifies the primary color, secondary color, primary material, secondary material, and design style, and returns them in JSON format.
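A minimal sketch of how a client might consume that JSON output. The key names (`primary_color`, `secondary_color`, and so on) are assumptions derived from the attribute list above, not confirmed field names, so adjust them to whatever the model actually emits.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FurnitureAttributes:
    # Hypothetical field names; the real JSON keys may differ.
    primary_color: Optional[str] = None
    secondary_color: Optional[str] = None
    primary_material: Optional[str] = None
    secondary_material: Optional[str] = None
    design_style: Optional[str] = None


def parse_attributes(raw: str) -> FurnitureAttributes:
    """Parse a JSON object string, keeping only the known attribute keys."""
    data = json.loads(raw)
    known = {f.name for f in fields(FurnitureAttributes)}
    return FurnitureAttributes(**{k: v for k, v in data.items() if k in known})


# Illustrative payload, not real model output.
sample = '{"primary_color": "grey", "primary_material": "fabric", "design_style": "modern"}'
attrs = parse_attributes(sample)
print(attrs.primary_color, attrs.design_style)
```

Missing attributes default to `None`, so a partial response does not raise.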
# Features
PaliGemma Model: A 3-billion-parameter multimodal model capable of image captioning, object detection, object segmentation, OCR, and visual question answering (VQA).
Custom Training: The model is custom trained on approximately 500 pairs of cropped images (sofas and accent chairs) and their captions.
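The base PaliGemma checkpoints select a task through a short text prefix in the prompt. The sketch below builds those prefixes for the tasks listed above. The helper name `build_prompt` is hypothetical, and the prompt expected by this project's custom-trained model is not documented here, so treat this as an illustration of the base model's convention only.

```python
def build_prompt(task: str, argument: str = "", lang: str = "en") -> str:
    """Return a base-PaliGemma-style task prompt (hypothetical helper)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "ocr":
        return "ocr"
    if task == "vqa":
        # argument is the question to answer about the image
        return f"answer {lang} {argument}"
    if task == "detect":
        # argument is the object class to locate
        return f"detect {argument}"
    if task == "segment":
        # argument is the object class to segment
        return f"segment {argument}"
    raise ValueError(f"unsupported task: {task}")


print(build_prompt("detect", "sofa"))
print(build_prompt("vqa", "What color is the chair?"))
```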
# Requirements
Python 3.9+
A GPU with at least 12 GB of memory for local inference
pip (Python package installer)
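A rough pre-flight check for the requirements above, sketched with the standard library only. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH; the function names and the 12 × 1024 MiB threshold are illustrative choices, and a card may report slightly less than its nominal size, so treat the result as a guide.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)
MIN_GPU_MIB = 12 * 1024  # 12 GB expressed in MiB (approximate threshold)


def python_ok(version=sys.version_info) -> bool:
    """True if the interpreter is at least Python 3.9."""
    return tuple(version[:2]) >= MIN_PYTHON


def parse_gpu_memory(output: str) -> list:
    """Parse nvidia-smi CSV output: one integer (MiB) per line, one line per GPU."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank or non-numeric lines such as "[N/A]"
            values.append(int(line))
    return values


def query_gpu_memory() -> list:
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


gpus = query_gpu_memory()
print("Python OK:", python_ok())
print("GPU OK:", any(mib >= MIN_GPU_MIB for mib in gpus))
```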
# Installation
First, clone the repository:
git clone https://Apoorva_inhabitr@bitbucket.org/Inhabitr/vision_models.git
# Create and Activate Python Virtual Environment
For Unix/macOS
python3 -m venv env
source env/bin/activate
For Windows
python -m venv env
.\env\Scripts\activate
# Install the required Python libraries
pip install -r requirements.txt
# Download Model Weights and Tokenizer
Download the model weights and tokenizer from Google Drive:
https://drive.google.com/drive/folders/1S-z374V-yd3izeBatAMQlitmZ0mbMv6s?usp=drive_link
Place the downloaded files in the models/ folder in the project's root directory.
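Before starting the API, it can help to confirm that the download landed in the right place. The sketch below only checks that `models/` exists and contains at least one file. The helper name `list_model_files` is hypothetical, and no specific file names are assumed because the README does not list them.

```python
from pathlib import Path


def list_model_files(root: Path) -> list:
    """Return sorted file names under <root>/models, or raise if none exist."""
    models_dir = root / "models"
    if not models_dir.is_dir():
        raise FileNotFoundError(f"missing folder: {models_dir}")
    files = sorted(p.name for p in models_dir.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found in {models_dir}")
    return files


try:
    # Run from the project's root directory.
    print(list_model_files(Path(".")))
except FileNotFoundError as err:
    print(f"Model files not ready: {err}")
```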
# Start the API
python main.py
If running locally, the API will be available at http://localhost:5000.
# API Endpoints for Captions
Predict attributes
POST /predict
Request Parameters
Send the image as multipart/form-data.
file: The image file whose attributes should be predicted.
Example Request with Image File
curl -X POST http://127.0.0.1:5000/predict \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/your/image.jpg"
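The same request can be sent from Python. This is a minimal standard-library sketch that mirrors the curl example: it posts the image under the form field `file` to `/predict`. The helper names `build_multipart` and `predict` are hypothetical, and the response is returned as raw text because its exact schema is not specified here.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://127.0.0.1:5000/predict"  # endpoint from the curl example


def build_multipart(field: str, filename: str, payload: bytes):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, url: str = API_URL, timeout: float = 60.0) -> str:
    """POST the image to /predict and return the raw response body as text."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API running, `print(predict("/path/to/your/image.jpg"))` shows the server's response.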
# Performance Considerations
GPU: Once the model parameters are loaded into memory, inference completes within about 1 second.
CPU: Initial model loading may take 4-6 minutes, depending on your system configuration.
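The gap between the slow first load and fast later requests comes from keeping the model in memory. The sketch below illustrates that load-once pattern with a stand-in loader. It is not taken from `main.py`: `get_model` and the placeholder object are hypothetical, and the `time.sleep` call only simulates the weight-loading delay.

```python
import time
from functools import lru_cache

LOAD_CALLS = 0  # counts how many times the slow load actually runs


@lru_cache(maxsize=1)
def get_model():
    """Load the model on the first call and reuse the cached object afterwards."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    time.sleep(0.1)  # stand-in for the slow weight-loading step
    return {"name": "placeholder-model"}  # hypothetical stand-in for the real model


def timed(fn):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


_, cold = timed(get_model)  # pays the loading cost
_, warm = timed(get_model)  # served from the cache
print(f"cold load: {cold:.3f}s, warm call: {warm:.6f}s")
```

Calling `get_model()` once at startup moves the loading cost out of the first request.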