Better quants, based on the f16, are available here: https://huggingface.co/qwp4w3hyb/Cerebrum-1.0-8x7b-iMat-GGUF

Model Card for Cerebrum-1.0-8x7b-imatrix-GGUF

Quantized from https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b using llama.cpp commit 46acb3676718b983157058aecf729a2064fc7d34 utilizing an importance matrix.

Quants will be uploaded over a slow German internet connection, so they will appear one by one; stay tuned.

The imatrix was generated with:

./imatrix -ofreq 4 -b 512 -c 512 -t 14 --chunks 24 -m ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-Q8_0.gguf -f ./groups_merged.txt

with the dataset from here: https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

Sadly, this means the imatrix was generated from the Q8_0 quant instead of the unquantized f16, as it ideally should be; I can't get the f16 to work on my machine at the moment. It should still improve the quality of the quants, though.
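
As a rough sketch of how the imatrix is then consumed (file names and the IQ2_XS target are illustrative, not the exact commands used for this upload), llama.cpp's quantize tool accepts it via --imatrix when producing the individual quants:

# illustrative example: quantize the f16 model to IQ2_XS using the generated imatrix
./quantize --imatrix imatrix.dat ../models/Cerebrum-1.0-8x7b-GGUF/cerebrum-1.0-8x7b-f16.gguf ./cerebrum-1.0-8x7b-IQ2_XS.gguf IQ2_XS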
