An FP8-Dynamic quantization of Qwen2.5-VL-7B-Instruct, created with llm-compressor. It can run on GPUs with 16 GB of VRAM; a sketch of the quantization recipe is included at the end of this card. To use it, update vLLM and Transformers:
```bash
pip install "vllm>=0.7.2"
pip install git+https://github.com/huggingface/transformers
```
Then run with:
```bash
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
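Once the server is up, it exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). Below is a minimal client sketch for a vision-language request; the image URL is only an illustrative placeholder, and the default port is assumed:

```python
# Sketch: query the vLLM OpenAI-compatible endpoint started above.
# Assumes the server is running on the default port 8000; the image
# URL is a hypothetical placeholder, not part of this repo.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```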
Base model: Qwen/Qwen2.5-VL-7B-Instruct
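For reference, here is a minimal sketch of how an FP8-Dynamic checkpoint like this one can be produced with llm-compressor. The exact script used for this repo is not published here, so the model class, import path, and ignore list are assumptions based on llm-compressor's standard FP8_DYNAMIC examples:

```python
# Sketch only: a standard llm-compressor FP8_DYNAMIC recipe, not the
# exact script used to build this checkpoint.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8 dynamic quantization needs no calibration data: weights are
# converted to FP8 ahead of time, activation scales are computed at
# runtime. Assumption: the vision tower and lm_head stay in higher
# precision, as in llm-compressor's published Qwen2-VL examples.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:visual.*"],
)

oneshot(model=model, recipe=recipe)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)
```

The resulting directory can be served directly with the `vllm serve` command above, since vLLM reads the compressed-tensors quantization config saved alongside the weights.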