Update README.md
README.md (CHANGED)
````diff
@@ -63,13 +63,21 @@ Features of this architecture:
 
 ### Step 1: Environment Setup
 
-Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on Pytorch2.5 and other related dependencies,
+Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:
+
+- **[Local install]** Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):
 
 ```
 wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/nvidia/Hymba-1.5B-Base/resolve/main/setup.sh
 bash setup.sh
 ```
 
+- **[Docker]** A Docker image is provided with all of Hymba's dependencies installed. You can download our Docker image and start a container using the following commands:
+```
+docker pull ghcr.io/tilmto/hymba:v1
+docker run --gpus all -v /home/$USER:/home/$USER -it ghcr.io/tilmto/hymba:v1 bash
+```
+
 
 ### Step 2: Chat with Hymba-1.5B-Base
 After setting up the environment, you can use the following script to chat with our model:
````
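Since the local install hinges on FlexAttention from PyTorch 2.5, it is worth verifying the environment before moving on to the Step 2 hunk. The check below is an editor's sketch, not part of the commit; it only assumes the `torch` package that `setup.sh` installs (`flex_attention` ships with PyTorch 2.5+):

```python
# Sanity check for the Hymba environment (editor's sketch, not from the commit).
import torch

# setup.sh targets CUDA 12.1/12.4, and the chat script moves the model to CUDA.
assert torch.cuda.is_available(), "CUDA is required for the chat script below"

# FlexAttention requires PyTorch 2.5 or newer.
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 5), f"PyTorch >= 2.5 required, found {torch.__version__}"

# The FlexAttention entry point lives here in PyTorch 2.5+;
# an ImportError means the environment setup did not complete.
from torch.nn.attention.flex_attention import flex_attention

print("FlexAttention available; environment looks good.")
```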
````diff
@@ -88,7 +96,7 @@ model = model.cuda().to(torch.bfloat16)
 # Chat with Hymba
 prompt = input()
 inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
-outputs = model.generate(**inputs, max_length=64, do_sample=
+outputs = model.generate(**inputs, max_length=64, do_sample=False, temperature=0.7, use_cache=True)
 response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
 
 print(f"Model response: {response}")
````
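For reference, here is the Step 2 script as it reads after this change, assembled into one runnable file. The diff only shows the script from `model = model.cuda().to(torch.bfloat16)` onward, so the loading calls below (`AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True`, pointed at `nvidia/Hymba-1.5B-Base`) are assumptions, not part of the commit:

```python
# Step 2 chat script, reconstructed (model-loading lines are assumed;
# only the lines from `model = model.cuda()...` onward appear in the diff).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"  # assumed: the repo setup.sh is fetched from
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model = model.cuda().to(torch.bfloat16)

# Chat with Hymba
prompt = input()
inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
# Note: with do_sample=False generation is greedy, so the temperature=0.7
# setting added by this commit has no effect (transformers will warn).
outputs = model.generate(**inputs, max_length=64, do_sample=False, temperature=0.7, use_cache=True)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

print(f"Model response: {response}")
```

If the warning is unwanted, dropping `temperature` (or setting `do_sample=True`) makes the `generate` call self-consistent.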