YongganFu committed
Commit 936bb7f (verified) · Parent(s): 2e0e6a3

Update README.md

Files changed (1):
  1. README.md +10 -2
README.md CHANGED
@@ -63,13 +63,21 @@ Features of this architecture:
 
 ### Step 1: Environment Setup
 
-Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, please use the provided `setup.sh` (supports CUDA 12.1/12.4) to install the related packages:
+Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:
+
+- **[Local install]** Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):
 
 ```
 wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/nvidia/Hymba-1.5B-Base/resolve/main/setup.sh
 bash setup.sh
 ```
 
+- **[Docker]** A Docker image with all of Hymba's dependencies installed is also provided. Download the image and start a container with the following commands:
+```
+docker pull ghcr.io/tilmto/hymba:v1
+docker run --gpus all -v /home/$USER:/home/$USER -it ghcr.io/tilmto/hymba:v1 bash
+```
+
 
 ### Step 2: Chat with Hymba-1.5B-Base
 After setting up the environment, you can use the following script to chat with our model:
@@ -88,7 +96,7 @@ model = model.cuda().to(torch.bfloat16)
 # Chat with Hymba
 prompt = input()
 inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
-outputs = model.generate(**inputs, max_length=64, do_sample=True, temperature=0.7, use_cache=True)
+outputs = model.generate(**inputs, max_length=64, do_sample=False, temperature=0.7, use_cache=True)
 response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
 
 print(f"Model response: {response}")