## Inhabitr Design Engine 1.0

#### PaliGemma Vision Model (unified attributes model)
This custom-trained PaliGemma vision model unifies attribute extraction: it identifies furniture attributes, such as primary and secondary color, primary and secondary material, and design style, and outputs them in JSON format.
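
The exact JSON schema is not documented here. As a minimal sketch, assuming the model returns a flat JSON object, the output could be parsed into a typed structure as shown below. The key names (`primary_color`, `design_style`, and so on) and the helper `parse_attributes` are hypothetical and only mirror the attributes listed above; the real keys may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical key names: the real keys in the model's JSON output may differ.
@dataclass
class FurnitureAttributes:
    primary_color: Optional[str] = None
    secondary_color: Optional[str] = None
    primary_material: Optional[str] = None
    secondary_material: Optional[str] = None
    design_style: Optional[str] = None


def parse_attributes(raw: str) -> FurnitureAttributes:
    """Parse a JSON string into FurnitureAttributes, ignoring unknown keys."""
    data = json.loads(raw)
    known = FurnitureAttributes.__dataclass_fields__
    return FurnitureAttributes(**{k: v for k, v in data.items() if k in known})
```

Attributes missing from the response stay `None`, so callers can tell "not predicted" apart from an empty string.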

# Features
PaliGemma model: A 3-billion-parameter multimodal model capable of image captioning, object detection, object segmentation, OCR, and visual question answering (VQA).

Custom training: The model is fine-tuned on approximately 500 pairs of cropped furniture images (sofas, accent chairs) and their captions.

# Requirements
Python 3.9+

Minimum 12 GB of GPU memory for local inference

Pip (Python package installer)

# Installation
First, clone the repository:

     git clone https://Apoorva_inhabitr@bitbucket.org/Inhabitr/vision_models.git


# Create and Activate Python Virtual Environment

For Unix/macOS

     python3 -m venv env

     source env/bin/activate


For Windows

     python -m venv env

     .\env\Scripts\activate


# Install the required Python libraries

    pip install -r requirements.txt


# Download Model Weights and Tokenizer

Download the model weights and tokenizer from Google Drive:

https://drive.google.com/drive/folders/1S-z374V-yd3izeBatAMQlitmZ0mbMv6s?usp=drive_link



Place the downloaded files in the models/ folder located in the root directory of the project.
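
Before starting the API, it can help to confirm that the download landed in the right place. The following is a minimal sketch, assuming only that the folder is named `models` and sits in the directory the script runs from; the helper `list_model_files` is hypothetical and does not check for specific file names, since those are not listed here.

```python
from pathlib import Path


def list_model_files(models_dir: str = "models") -> list[str]:
    """Return the sorted relative paths of all files found under models_dir.

    Raises FileNotFoundError if the folder is missing or contains no files.
    """
    root = Path(models_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Model folder not found: {root}")
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    if not files:
        raise FileNotFoundError(f"No files found in {root}")
    return files
```

Calling `list_model_files()` from the project root prints nothing by itself; it either returns the file list or raises, which makes it easy to call at the top of a startup script.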



# Start the API

     python main.py

If running locally, the API will be available at http://localhost:5000.



# API Endpoints for Captions

Predict attributes

POST /predict

Request Parameters

Send an image file as multipart form data.

image: The image file from which attributes are extracted.



Example Request with Image File 



      curl -X POST http://127.0.0.1:5000/predict \
      -H "Content-Type: multipart/form-data" \
      -F "file=@/path/to/your/image.jpg"
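
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the form field is named `file` as in the curl example and that the endpoint returns its result as text (presumably JSON). The helpers `build_multipart` and `predict` are illustrative names, not part of this project.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data request body."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def predict(image_path: str, url: str = "http://127.0.0.1:5000/predict") -> str:
    """POST the image to /predict and return the raw response text."""
    path = Path(image_path)
    boundary = uuid.uuid4().hex
    # Assumption: the field name "file" is taken from the curl example.
    body = build_multipart("file", path.name, path.read_bytes(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the API running locally, `print(predict("/path/to/your/image.jpg"))` would print whatever the server returns for that image.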



# Performance Considerations

GPU: Once the model parameters are loaded into GPU memory, each inference completes within 1 second.

CPU: Initial model loading may take 4-6 minutes, depending on your system configuration.