Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference
Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive — Jan 15, 2024
AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU — Dec 5, 2023
Optimum-NVIDIA — Unlock blazingly fast LLM inference in just 1 line of code — Dec 5, 2023
Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs — Jan 13, 2022
Introducing Optimum: The Optimization Toolkit for Transformers at Scale — Sep 14, 2021