Hands-on workshop

Virtual

Designing high-throughput LLM inference pipelines with NVIDIA Dynamo

Explore how NVIDIA Dynamo enables disaggregated, high-performance inference using TensorRT-LLM, vLLM, and SGLang. This session covers core architectural patterns, KV cache strategies, and routing optimizations required to scale LLM workloads efficiently in production environments.

Jun 22, 2026

13:00

CEST

Deploying large language models is easy; scaling them efficiently is not. This session breaks down NVIDIA Dynamo into clear, practical concepts to help you build high-performance inference pipelines. We’ll explore how backends such as TensorRT-LLM, vLLM, and SGLang fit into modern serving architectures, and demystify advanced techniques like disaggregated inference, KV cache management, and smart routing. You’ll learn how each of these components affects latency, throughput, and GPU utilization in real-world deployments. Whether you’re experimenting or moving to production, this workshop equips you with the mental models and strategies needed to design efficient, scalable LLM inference systems.
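To make the smart-routing idea concrete before the session, here is a minimal, self-contained sketch of KV-cache-aware routing, the general pattern behind routers that steer requests toward workers already holding matching KV cache blocks. It is illustrative only: the `Worker` class, `BLOCK_SIZE`, and the scoring heuristic are assumptions made up for this example, not Dynamo's actual API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV cache block (assumed; real systems make this configurable)

@dataclass
class Worker:
    """Hypothetical inference worker tracking which prefix blocks it has cached."""
    name: str
    cached_blocks: set = field(default_factory=set)  # hashes of cached prefix blocks
    active_requests: int = 0

def prefix_block_hashes(tokens: list) -> list:
    """Hash each prefix-aligned block of tokens, chaining hashes so a block's
    identity depends on everything before it (prefix-caching semantics)."""
    hashes, acc = [], 0
    for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
        acc = hash((acc, tuple(tokens[i:i + BLOCK_SIZE])))
        hashes.append(acc)
    return hashes

def route(tokens: list, workers: list) -> Worker:
    """Pick the worker with the longest cached prefix match; break ties by load.
    Reusing cached KV blocks skips prefill work and cuts time-to-first-token."""
    def score(w: Worker):
        matched = 0
        for h in prefix_block_hashes(tokens):
            if h not in w.cached_blocks:
                break
            matched += 1
        return (matched, -w.active_requests)  # more cache hits first, then lower load
    best = max(workers, key=score)
    best.active_requests += 1
    best.cached_blocks.update(prefix_block_hashes(tokens))  # worker now caches this prefix
    return best
```

Production routers also weigh decode load, memory pressure, and prefill/decode placement in a disaggregated deployment; this sketch captures only the cache-overlap core of the idea that the workshop builds on.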


Register for PlatformCon 2026