Idefics3-8B-Llama3-bnb_nf4

NF4 (4-bit NormalFloat) quantization of HuggingFaceM4/Idefics3-8B-Llama3, created with bitsandbytes.

Quantization

Quantization created with:

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # Matrix multiplications run in bfloat16 after dequantization.
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Also quantize the per-block scaling constants.
    bnb_4bit_use_double_quant=True,
    # Allow modules offloaded to the CPU to stay in fp32.
    llm_int8_enable_fp32_cpu_offload=True,
    # Leave the output head, vision encoder, and connector unquantized.
    llm_int8_skip_modules=["lm_head", "model.vision_model", "model.connector"],
)

model_nf4 = AutoModelForVision2Seq.from_pretrained(model_id, quantization_config=nf4_config)
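
As a rough illustration of what bnb_4bit_use_double_quant=True saves, the sketch below estimates storage bits per quantized weight. The block sizes (64 weights per scale, 256 scales per second-level block) and the 8-bit first-level scales are assumptions taken from the QLoRA defaults, not values read from this checkpoint. The helper bits_per_param is hypothetical and not part of transformers or bitsandbytes.

```python
def bits_per_param(weight_bits=4, block_size=64, scale_bits=32,
                   double_quant=False, dq_scale_bits=8, dq_block_size=256):
    """Approximate storage bits per quantized weight, including scale overhead.

    All default sizes are assumptions (QLoRA defaults), not checkpoint values.
    """
    if not double_quant:
        # One full-precision scale is shared by each block of weights.
        return weight_bits + scale_bits / block_size
    # With double quantization the first-level scales are stored in 8 bits,
    # plus one 32-bit second-level scale per dq_block_size first-level scales.
    overhead = dq_scale_bits / block_size + scale_bits / (block_size * dq_block_size)
    return weight_bits + overhead


plain = bits_per_param(double_quant=False)
nested = bits_per_param(double_quant=True)
print(f"without double quant: {plain:.3f} bits/param")
print(f"with double quant:    {nested:.3f} bits/param")
```

Under these assumptions the scale overhead drops from 0.5 to about 0.127 bits per weight. The modules listed in llm_int8_skip_modules are not quantized, so this estimate does not apply to them.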
Model size: 5.08B params (Safetensors)
Tensor types: F32, FP16, U8
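
The U8 tensor type comes from 4-bit weights being stored packed, two 4-bit codes per uint8 element. That is presumably also why the reported parameter count is lower than the base model's nominal 8B. A minimal sketch of such packing follows. The helpers pack_nibbles and unpack_nibbles are hypothetical, and the nibble order (first code in the high four bits) is an assumption for illustration, not a statement about the bitsandbytes storage layout.

```python
def pack_nibbles(codes):
    """Pack a list of 4-bit codes (0..15) into bytes, two codes per byte."""
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    # Assumed order: even-indexed code in the high nibble, odd-indexed in the low.
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Recover the list of 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble
        codes.append(byte & 0x0F)  # low nibble
    return codes


codes = [0, 15, 7, 8]
packed = pack_nibbles(codes)
restored = unpack_nibbles(packed)
print(len(codes), "codes stored in", len(packed), "bytes")
```

Each 4-bit code is an index into the 16-entry NF4 value table. The actual weight is recovered by looking up that entry and multiplying by the block's scale.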