Update README.md
README.md (CHANGED)
````diff
@@ -63,13 +63,21 @@ Features of this architecture:
 
 ### Step 1: Environment Setup
 
-Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on Pytorch2.5 and other related dependencies,
+Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:
+
+- **[Local install]** Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):
 
 ```
 wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/nvidia/Hymba-1.5B-Base/resolve/main/setup.sh
 bash setup.sh
 ```
 
+- **[Docker]** A Docker image is provided with all of Hymba's dependencies installed. You can download our Docker image and start a container using the following commands:
+```
+docker pull ghcr.io/tilmto/hymba:v1
+docker run --gpus all -v /home/$USER:/home/$USER -it ghcr.io/tilmto/hymba:v1 bash
+```
+
 
 ### Step 2: Chat with Hymba-1.5B-Base
 After setting up the environment, you can use the following script to chat with our model:
````
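Since the local install hinges on FlexAttention from PyTorch 2.5, it is worth verifying the environment before moving on to the Step 2 hunk. The check below is an editor's sketch, not part of the commit; it only assumes the `torch` package that `setup.sh` installs (`flex_attention` ships with PyTorch 2.5+):

```python
# Sanity check for the Hymba environment (editor's sketch, not from the commit).
import torch

# setup.sh targets CUDA 12.1/12.4, and the chat script moves the model to CUDA.
assert torch.cuda.is_available(), "CUDA is required for the chat script below"

# FlexAttention requires PyTorch 2.5 or newer.
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 5), f"PyTorch >= 2.5 required, found {torch.__version__}"

# The FlexAttention entry point lives here in PyTorch 2.5+;
# an ImportError means the environment setup did not complete.
from torch.nn.attention.flex_attention import flex_attention

print("FlexAttention available; environment looks good.")
```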
````diff
@@ -88,7 +96,7 @@ model = model.cuda().to(torch.bfloat16)
 # Chat with Hymba
 prompt = input()
 inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
-outputs = model.generate(**inputs, max_length=64, do_sample=
+outputs = model.generate(**inputs, max_length=64, do_sample=False, temperature=0.7, use_cache=True)
 response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
 
 print(f"Model response: {response}")
````
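For reference, here is the Step 2 script as it reads after this change, assembled into one runnable file. The diff only shows the script from `model = model.cuda().to(torch.bfloat16)` onward, so the loading calls below (`AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True`, pointed at `nvidia/Hymba-1.5B-Base`) are assumptions, not part of the commit:

```python
# Step 2 chat script, reconstructed (model-loading lines are assumed;
# only the lines from `model = model.cuda()...` onward appear in the diff).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"  # assumed: the repo setup.sh is fetched from
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model = model.cuda().to(torch.bfloat16)

# Chat with Hymba
prompt = input()
inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
# Note: with do_sample=False generation is greedy, so the temperature=0.7
# setting added by this commit has no effect (transformers will warn).
outputs = model.generate(**inputs, max_length=64, do_sample=False, temperature=0.7, use_cache=True)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

print(f"Model response: {response}")
```

If the warning is unwanted, dropping `temperature` (or setting `do_sample=True`) makes the `generate` call self-consistent.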