Update README.md

README.md (changed)
@@ -8,7 +8,8 @@ datasets:
 
 # Model Card for SpaceLLaVA
 
-**SpaceLlama3.1** uses [llama3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) as the llm backbone along with the fused DINOv2+SigLIP features of [prismatic-vlms](https://github.com/TRI-ML/prismatic-vlms)
+**SpaceLlama3.1** uses [llama3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) as the llm backbone along with the fused DINOv2+SigLIP features of [prismatic-vlms](https://github.com/TRI-ML/prismatic-vlms).
+Uses a full fine-tune on the [spacellava dataset](https://huggingface.co/datasets/remyxai/vqasynth_spacellava) designed with [VQASynth](https://github.com/remyxai/VQASynth/tree/main) to enhance spatial reasoning as in [SpatialVLM](https://spatial-vlm.github.io/).
 
 
 ## Model Details
@@ -21,7 +22,6 @@ With a pipeline of expert models, we can infer spatial relationships between obj
 
 - **Developed by:** remyx.ai
 - **Model type:** MultiModal Model, Vision Language Model, Prismatic-vlms, Llama 3.1
-- **License:** Apache-2.0
 - **Finetuned from model:** Llama 3.1
 
 ### Model Sources
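A note on the architecture named in the updated description: SpaceLlama3.1 pairs a Llama 3.1 backbone with fused DINOv2+SigLIP patch features via prismatic-vlms. The sketch below is a rough illustration of that fusion idea only, not the prismatic-vlms code; the class name is invented, and the dimensions (DINOv2-L 1024, SigLIP-SO400M 1152, Llama-3.1-8B hidden size 4096) are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class FusedVisionProjector(nn.Module):
    """Toy sketch (not prismatic-vlms): concatenate patch features from two
    vision encoders (e.g. DINOv2 and SigLIP) and project to the LLM width."""

    def __init__(self, dino_dim: int = 1024, siglip_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(dino_dim + siglip_dim, llm_dim)

    def forward(self, dino_feats: torch.Tensor, siglip_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs are (batch, num_patches, dim); fuse along the channel axis.
        fused = torch.cat([dino_feats, siglip_feats], dim=-1)
        return self.proj(fused)

# Dummy patch features for a 16x16 grid of image patches.
projector = FusedVisionProjector()
tokens = projector(torch.randn(1, 256, 1024), torch.randn(1, 256, 1152))
print(tokens.shape)  # torch.Size([1, 256, 4096]) -> visual tokens fed to the LLM
```

The exact fusion and projector in prismatic-vlms may differ; the point is only that both encoders see the image and their features are joined before being mapped into the language model's token space.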
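The added sentence also links the VQASynth-built spacellava dataset used for the full fine-tune. A minimal sketch for pulling it with the Hugging Face `datasets` library, assuming only the dataset ID from the link above (the split name and schema are assumptions to check against the dataset card):

```python
from datasets import load_dataset

# Dataset ID taken from the link in the updated README; the "train" split is an assumption.
ds = load_dataset("remyxai/vqasynth_spacellava", split="train")

# Inspect the schema and one record; the exact columns are whatever VQASynth emitted.
print(ds.column_names)
print(ds[0])
```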