Hands-on workshop

LiveDay NYC

Designing high-throughput LLM inference pipelines with NVIDIA Dynamo

Explore how NVIDIA Dynamo enables disaggregated, high-performance inference using TensorRT-LLM, vLLM, and SGLang. This session covers core architectural patterns, KV cache strategies, and routing optimizations required to scale LLM workloads efficiently in production environments.

Jun 25, 2026, 14:00 EDT

Meet the speakers

Mayank Debnath
Director, Developer Relations

Deploying large language models is easy; scaling them efficiently is not. This session breaks NVIDIA Dynamo down into clear, practical concepts to help you build high-performance inference pipelines. We’ll explore how backends such as TensorRT-LLM, vLLM, and SGLang fit into modern serving architectures, and unpack advanced ideas such as disaggregated inference, KV cache management, and smart routing. You’ll learn how these components affect latency, throughput, and GPU utilization in real-world deployments. Whether you’re experimenting or moving to production, this workshop gives you the mental models and strategies needed to design efficient, scalable LLM inference systems.

Register for the workshop