Hands-on workshop

LiveDay NYC

Designing high-throughput LLM inference pipelines with NVIDIA Dynamo

Explore how NVIDIA Dynamo enables disaggregated, high-performance inference using TensorRT-LLM, vLLM, and SGLang. This session covers core architectural patterns, KV cache strategies, and routing optimizations required to scale LLM workloads efficiently in production environments.

Jun 25, 2026, 11:30 EDT

Platform Pitt

Meet the speakers

Mayank Debnath

Director, Developer Relations

Deploying large language models is easy; scaling them efficiently is not. This session breaks NVIDIA Dynamo down into clear, practical concepts to help you build high-performance inference pipelines. We'll explore how backends such as TensorRT-LLM, vLLM, and SGLang fit into modern architectures, and simplify advanced ideas such as disaggregated inference, KV cache management, and smart routing. You'll learn how these components affect latency, throughput, and GPU utilization in real-world deployments. Whether you're experimenting or moving to production, this workshop equips you with the mental models and strategies needed to design efficient, scalable LLM inference systems.
