Model Card for Model ID

Llama3-8B-1.58 Models

This model was converted to GGUF format from HF1BitLLM/Llama3-8B-1.58-100B-tokens using llama.cpp.

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo brunopio/Llama3-8B-1.58-100B-tokens-GGUF --hf-file Llama3-8B-1.58-100B-tokens-GGUF -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo brunopio/Llama3-8B-1.58-100B-tokens-GGUF --hf-file Llama3-8B-1.58-100B-tokens-GGUF -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo brunopio/Llama3-8B-1.58-100B-tokens-GGUF --hf-file Llama3-8B-1.58-100B-tokens-GGUF -p "The meaning to life and the universe is"

or

./llama-server --hf-repo brunopio/Llama3-8B-1.58-100B-tokens-GGUF --hf-file Llama3-8B-1.58-100B-tokens-GGUF -c 2048
Downloads last month
1,645,732
Safetensors
Model size
2.8B params
Tensor type
BF16
ยท
U8
ยท
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Model tree for brunopio/Llama3-8B-1.58-100B-tokens-GGUF

Spaces using brunopio/Llama3-8B-1.58-100B-tokens-GGUF 3