Hands-on workshop

Virtual

Large-scale distributed LLM inference with LLM-D and Kubernetes

LLM-D is a cloud-native, Kubernetes-based, high-performance distributed LLM inference framework. Its architecture focuses on a well-lit path for serving LLMs at scale, with competitive performance for most models across a wide set of accelerators.

Jun 26, 2026

13:00

CEST

Running large language models (LLMs) locally for experimentation is easy, but running them in large-scale architectures is not. Businesses looking to integrate LLMs into critical paths must deal with the high costs and scarcity of GPU and TPU accelerators. Striking the right balance between performance, availability, scalability, and cost efficiency is essential.

While Kubernetes is a ubiquitous runtime for modern workloads, deploying LLM inference effectively demands a specialized approach. LLM-D is a cloud-native, Kubernetes-based, high-performance distributed LLM inference framework. Its architecture provides a clear path for anyone looking to serve LLMs at scale, with fast time to value and competitive performance per dollar for most models across a diverse set of hardware accelerators.


Register for PlatformCon 2026