Host of the model
What host did you guys use to run this model? On-prem or cloud? What OS and hardware configuration? Thanks.
You can easily run the smaller 1.5B model on a Mac or Windows machine with 16 GB of memory.
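As a rough sanity check on that claim, the weight footprint of a 1.5B-parameter model can be estimated from the parameter count and bytes per parameter (the byte counts below are assumptions; activations and KV cache add some overhead on top):

```python
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 1e9

# 1.5B parameters at FP16 (2 bytes each) vs. 4-bit quantization (0.5 bytes each)
print(weight_gb(1.5e9, 2))    # FP16 weights
print(weight_gb(1.5e9, 0.5))  # 4-bit quantized weights
```

Either way the weights come in at a few GB at most, which is why a 16 GB laptop handles it comfortably.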
For the 671B version, you need Hopper cards such as the H20, H800, or H100, with at least one full node.
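The same back-of-the-envelope arithmetic shows why a full node is the floor here (the byte-per-parameter and per-card memory figures are assumptions; KV cache and activations push the real requirement higher):

```python
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    # Approximate weight memory in GB; ignores KV cache and activations.
    return n_params * bytes_per_param / 1e9

fp8_weights = weight_gb(671e9, 1)  # native FP8 weights: roughly 671 GB
h100_node = 8 * 80                 # one 8x H100-80GB node: 640 GB of HBM
h20_node = 8 * 96                  # one 8x H20-96GB node: 768 GB of HBM
print(fp8_weights, h100_node, h20_node)
```

Note that an 8x H100-80GB node is actually tight against the FP8 weights alone, which is why higher-memory cards like the H20 or multi-node setups are commonly used in practice.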
Can SGLang & vLLM run on Windows with an Intel Core Ultra i7 and no NVIDIA GPU?
I always get the errors below when installing sglang and vllm with these commands:
pip install "sglang[all]>=0.4.2.post4" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer/
ERROR: Could not find a version that satisfies the requirement sgl-kernel>=0.0.3.post3; extra == "srt" (from sglang[srt]) (from versions: 0.0.1)
ERROR: No matching distribution found for sgl-kernel>=0.0.3.post3; extra == "srt"
pip install vllm
copying build\lib\vllm\model_executor\layers\quantization\utils\configs\N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build\bdist.win-amd64\wheel\.\vllm\model_executor\layers\quantization\utils\configs
error: could not create 'build\bdist.win-amd64\wheel\.\vllm\model_executor\layers\quantization\utils\configs\N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json': No such file or directory
[end of output]
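Both failures are consistent with trying to install Linux-targeted packages on native Windows (my reading of the errors, not confirmed by the maintainers): pip only accepts binary wheels whose platform tag matches the running interpreter, so a Linux-only wheel like sgl-kernel produces "No matching distribution found", and vllm then falls back to a source build that trips over Windows path handling. You can check which platform pip resolves wheels for:

```python
import sysconfig

# The platform tag pip uses when matching binary wheels.
# 'win-amd64' means Linux-only wheels (e.g. CUDA-compiled kernels)
# will never satisfy the requirement on this machine.
print(sysconfig.get_platform())
```

If this prints win-amd64, a commonly suggested workaround is to run the install inside WSL2 or a Linux container, since both projects primarily target Linux.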