We are introducing multi-backend support in Hugging Face Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and the available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).
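Whichever backend ends up serving the model, TGI still exposes the same HTTP API, so existing clients keep working unchanged. A minimal sketch of querying a server, assuming a TGI instance is already running locally on port 8080 (the URL and prompt are placeholders):

```python
from huggingface_hub import InferenceClient

# Point the client at a locally running TGI server; the backend serving the
# model (default, TRT-LLM, llama.cpp, ...) is transparent to the caller.
client = InferenceClient(model="http://localhost:8080")

# Standard text-generation call against the TGI HTTP API.
output = client.text_generation(
    "What does multi-backend support in TGI change for users?",
    max_new_tokens=128,
)
print(output)
```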
We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗!
Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development. The release includes the Cosmos Tokenizers collection: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6
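As a rough sketch of getting started, the tokenizer checkpoints can be pulled from the Hub with huggingface_hub; the repo id below is one example variant, so check the collection for the exact names you need:

```python
from huggingface_hub import snapshot_download

# Download one Cosmos tokenizer checkpoint from the Hub.
# The repo id is an example; the collection also lists other variants
# (continuous/discrete, image/video, different compression ratios).
local_dir = snapshot_download(repo_id="nvidia/Cosmos-Tokenizer-CV8x8x8")
print(f"Checkpoint files downloaded to: {local_dir}")
```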
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as an integrated AI Assistant, with contextual links to summarize or simplify any text - handy!
These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed across 150 datacenters globally - @rita3ko, @mchenco, @jtkipp, @nkothariCF, @philschmid
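As a hedged sketch of what calling one of these models looks like, here is a request against the Workers AI REST endpoint; the account id, API token and model slug are placeholders, so check the Workers AI docs for the exact model names:

```python
import os
import requests

# Placeholders: set these to your own Cloudflare account id and API token.
ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]
MODEL = "@cf/meta/llama-3.1-8b-instruct"  # example model slug

# Call the model through the Workers AI run endpoint.
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Hello from Workers AI!"}]},
    timeout=30,
)
print(response.json())
```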