	## Inhabitr Design Engine 1.0

	#### PaliGemma Vision Model (unified attribute model)
	This custom-trained PaliGemma vision model unifies attribute extraction: given a furniture image, it identifies the primary color, secondary color, primary material, secondary material, and design style, and returns them in JSON format.

	# Features
	PaliGemma Model: A 3-billion-parameter multimodal model capable of image captioning, object detection, object segmentation, OCR, and visual question answering (VQA).

	Custom Training: The model is custom-trained on approximately 500 pairs of cropped images (sofas and accent chairs) and their captions.
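Since the model returns its attributes as JSON, a caller will usually want to parse the output and check which attributes are present. The sketch below is illustrative only: the README names the attributes but not the exact JSON keys, so the snake_case key names and the `parse_attributes` helper are assumptions.

```python
import json

# Hypothetical key names: the README lists the attributes but not the JSON keys.
EXPECTED_KEYS = (
    "primary_color",
    "secondary_color",
    "primary_material",
    "secondary_material",
    "design_style",
)


def parse_attributes(raw_text: str) -> dict:
    """Parse a JSON object from model output and report missing expected keys."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of attributes")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    return {"attributes": data, "missing": missing}


# Made-up sample output, used only to exercise the helper.
example = '{"primary_color": "grey", "primary_material": "fabric", "design_style": "modern"}'
result = parse_attributes(example)
print(result["missing"])  # → ['secondary_color', 'secondary_material']
```

Reporting missing keys instead of raising keeps the helper usable for single-color or single-material pieces, where the secondary attributes may legitimately be absent.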

	# Requirements
	Python 3.9+

	A GPU with at least 12 GB of memory for local inference

	Pip (Python package installer)
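The requirements above can be checked before installation. This is a minimal sketch, assuming an NVIDIA GPU with the `nvidia-smi` tool on the PATH; the helper names are hypothetical, and the memory threshold is set slightly below 12 GiB because some cards sold as 12 GB report a little less.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)
# Assumption: allow some slack below 12 * 1024 MiB for cards that report less.
MIN_GPU_MIB = 11_000


def parse_gpu_memory_mib(smi_output: str) -> list:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib() -> list:
    """Return total memory (MiB) per NVIDIA GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if completed.returncode != 0:
        return []
    return parse_gpu_memory_mib(completed.stdout)


if __name__ == "__main__":
    print("Python OK:", sys.version_info >= MIN_PYTHON)
    print("GPU OK:", any(mib >= MIN_GPU_MIB for mib in query_gpu_memory_mib()))
```

On machines without an NVIDIA driver the GPU check simply reports `False`; it does not detect other accelerators.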

	# Installation
	First, clone the repository:

	git clone https://Apoorva_inhabitr@bitbucket.org/Inhabitr/vision_models.git

	# Create and Activate Python Virtual Environment

	For Unix/macOS

	python3 -m venv env
	source env/bin/activate

	For Windows

	python -m venv env
	.\env\Scripts\activate

	# Install the required Python libraries

	pip install -r requirements.txt

	# Download Model Weights and Tokenizer

	Download the model weights and tokenizer from Google Drive:

	https://drive.google.com/drive/folders/1S-z374V-yd3izeBatAMQlitmZ0mbMv6s?usp=drive_link

	Place the downloaded files in the models/ folder located in the root directory of the project.
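Before starting the API, it can help to confirm that the download landed in the right place. The sketch below only checks that a `models/` directory exists under the project root and contains files; the `list_model_files` helper is hypothetical, and it does not know which file names the project expects.

```python
from pathlib import Path


def list_model_files(project_root: str = ".") -> list:
    """Return the sorted names of files found directly under <project_root>/models."""
    models_dir = Path(project_root) / "models"
    if not models_dir.is_dir():
        raise FileNotFoundError(f"missing directory: {models_dir}")
    files = sorted(p.name for p in models_dir.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found in {models_dir}")
    return files


if __name__ == "__main__":
    try:
        print("Found model files:", list_model_files("."))
    except FileNotFoundError as error:
        print("Model files not ready:", error)
```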

	# Start the API
	python main.py

	If running locally, the API will be available at http://localhost:5000.

	# API Endpoints for Captions
	Predict attributes

	POST /predict

	Request Parameters

	Send an image file as multipart form data.

	file: The image file to extract attributes from (the form field name used in the example request).

	Example Request with Image File

	curl -X POST http://127.0.0.1:5000/predict \
	-H "Content-Type: multipart/form-data" \
	-F "file=@/path/to/your/image.jpg"
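The same request can be sent from Python. The sketch below mirrors the curl example using only the standard library; the URL and the `file` field name come from that example, while the helper names are hypothetical and the response is returned as raw text because the README does not document the response schema.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://127.0.0.1:5000/predict"  # taken from the curl example


def build_predict_request(image_path: str, field_name: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one image file."""
    path = Path(image_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def predict(image_path: str, timeout: float = 60.0) -> str:
    """Send the image to the API and return the raw response body as text."""
    request = build_predict_request(image_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API running, `print(predict("/path/to/your/image.jpg"))` would show the server's reply. Building the request separately from sending it makes the multipart body easy to inspect without a live server.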

	# Performance Considerations
	GPU: Once the model parameters are loaded into GPU memory, each inference completes within about 1 second.

	CPU: Initial model loading may take 4-6 minutes, depending on your system configuration.
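Because initial loading can take several minutes, a client script may want to wait for the API before sending requests. The following is a minimal polling sketch with hypothetical helper names; it assumes the server only starts accepting connections on port 5000 once the model is loaded, which the README does not confirm.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str = "127.0.0.1", port: int = 5000) -> bool:
    """Return True if a TCP connection succeeds; raises OSError otherwise."""
    with socket.create_connection((host, port), timeout=2.0):
        return True


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("API ready:", wait_until_ready(port_is_open))
```

The default 10-minute timeout leaves headroom over the 4-6 minute CPU load time quoted above; adjust it for your hardware.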