Hands-on workshop

Virtual

Designing high-throughput LLM inference pipelines with NVIDIA Dynamo

Explore how NVIDIA Dynamo enables disaggregated, high-performance inference using TensorRT-LLM, vLLM, and SGLang. This session covers core architectural patterns, KV cache strategies, and routing optimizations required to scale LLM workloads efficiently in production environments.

Jun 22, 2026

13:00

CEST

Deploying large language models is easy; scaling them efficiently is not. This session breaks down NVIDIA Dynamo into clear, practical concepts to help you build high-performance inference pipelines. We’ll explore how backends such as TensorRT-LLM, vLLM, and SGLang fit into modern serving architectures, and demystify advanced techniques like disaggregated inference, KV cache management, and smart routing. You’ll learn how each of these components affects latency, throughput, and GPU utilization in real-world deployments. Whether you’re experimenting or moving to production, this workshop equips you with the mental models and strategies needed to design efficient, scalable LLM inference systems.
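To make the smart-routing idea concrete before the session, here is a minimal, self-contained sketch of KV-cache-aware routing, the general pattern behind routers that steer requests toward workers already holding matching KV cache blocks. It is illustrative only: the `Worker` class, `BLOCK_SIZE`, and the scoring heuristic are assumptions made up for this example, not Dynamo's actual API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV cache block (assumed; real systems make this configurable)

@dataclass
class Worker:
    """Hypothetical inference worker tracking which prefix blocks it has cached."""
    name: str
    cached_blocks: set = field(default_factory=set)  # hashes of cached prefix blocks
    active_requests: int = 0

def prefix_block_hashes(tokens: list) -> list:
    """Hash each prefix-aligned block of tokens, chaining hashes so a block's
    identity depends on everything before it (prefix-caching semantics)."""
    hashes, acc = [], 0
    for i in range(0, len(tokens) - len(tokens) % BLOCK_SIZE, BLOCK_SIZE):
        acc = hash((acc, tuple(tokens[i:i + BLOCK_SIZE])))
        hashes.append(acc)
    return hashes

def route(tokens: list, workers: list) -> Worker:
    """Pick the worker with the longest cached prefix match; break ties by load.
    Reusing cached KV blocks skips prefill work and cuts time-to-first-token."""
    def score(w: Worker):
        matched = 0
        for h in prefix_block_hashes(tokens):
            if h not in w.cached_blocks:
                break
            matched += 1
        return (matched, -w.active_requests)  # more cache hits first, then lower load
    best = max(workers, key=score)
    best.active_requests += 1
    best.cached_blocks.update(prefix_block_hashes(tokens))  # worker now caches this prefix
    return best
```

Production routers also weigh decode load, memory pressure, and prefill/decode placement in a disaggregated deployment; this sketch captures only the cache-overlap core of the idea that the workshop builds on.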


Register for PlatformCon 2026