OSError with Llama3.2-3B-Instruct-QLORA_INT4_EO8
When trying to run Llama3.2-3B-Instruct-QLORA_INT4_EO8, I'm getting the error:
OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
I've tried using the transformers pipeline as well as AutoModelForCausalLM to pull the model, but I get the same error in both cases (minimal repro below).
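For reference, this is roughly what I'm running; both variants raise the OSError above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"

# Variant 1: pipeline
pipe = pipeline("text-generation", model=model_id)

# Variant 2: explicit model + tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # raises the OSError above
```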
The weights were uploaded in their "original" (Meta) format, and they need to be converted to the Hugging Face format before they can be used with the pipelines. I'm sure they will upload the reformatted version soon.
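You can confirm this by listing what the repo actually ships; a minimal check (the expected file names are my assumption based on Meta's usual "original" layout):

```python
from huggingface_hub import list_repo_files

# For an original-format upload you should see consolidated.*.pth / params.json /
# tokenizer.model, and no HF-format *.safetensors or pytorch_model.bin
# (the exact file names are an assumption, not confirmed for this repo).
print(list_repo_files("meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"))
```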
Following this. (I thought I was missing something esoteric and GGUF-related at first :D)
Anything new on this, or examples of a library that can work with this?
Any solution?
There is an ExecuTorch run sample here: https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md.
But it only runs after the model is converted to a .pte binary file, which makes it hard to look into how the model actually works. (A sketch for peeking at the raw checkpoint directly is below.)
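If you just want to inspect the weights without going through ExecuTorch, something like this should work; I'm assuming the repo ships a consolidated.00.pth in Meta's original layout:

```python
import torch
from huggingface_hub import hf_hub_download

# Download one shard of the original-format checkpoint and inspect it.
# "consolidated.00.pth" is an assumption about the file name in this repo.
ckpt_path = hf_hub_download(
    "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8",
    "consolidated.00.pth",
)
state_dict = torch.load(ckpt_path, map_location="cpu", weights_only=True)

# Print the first few tensors to see the parameter layout and dtypes.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape), tensor.dtype)
```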
I still don't know how to run the model on a GPU the way an HF-format model runs.
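The closest workaround I've found is loading the regular HF-format repo with on-the-fly 4-bit quantization via bitsandbytes; to be clear, these are not the official QLoRA/QAT weights, just an approximation that runs on GPU:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the standard HF checkpoint to 4-bit at load time (needs bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",  # the non-quantized HF repo
    quantization_config=bnb_config,
    device_map="auto",
)
```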