Talk

Virtual

GPU as a product: Fair-share GPUs with preemption on Kubernetes

Most AI platforms fail at the boring part: fair GPU access. This talk shows how to turn a GPU fleet into a self-serve product with SKUs, fair-share queues, safe preemption, and cost accounting using Kubernetes + Kueue.

Zaid Fakhruddin presents a practical blueprint for treating GPUs as a product, aimed at platform teams supporting GPU-backed AI/ML workloads: predictable access, high utilization, and explainable spend.

He walks through how to build a fair-share GPU platform on Kubernetes:
• Define three GPU SKUs (interactive, batch, training), each with an SLA covering TTL, checkpoint requirements, and preemptibility
• Model tenants with Kueue ClusterQueues, cohorts, weights, nominal quotas, and fair-sharing preemption
• Add hard guardrails: PriorityClasses, admission policies, and per-team quota and budget caps
• Reduce fragmentation with NVIDIA MIG where it helps, and measure utilization and fragmentation with DCGM metrics
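The queueing model above can be sketched with Kueue objects. A minimal sketch, assuming the Kueue `v1beta1` API; the queue, cohort, and flavor names, the quota, and the weight are all illustrative:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100                # illustrative flavor name
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-batch        # illustrative tenant queue
spec:
  cohort: gpu-fleet         # queues in a cohort can borrow idle quota from peers
  fairSharing:
    weight: "2"             # relative fair share within the cohort
  preemption:
    reclaimWithinCohort: Any          # reclaim GPUs lent to cohort peers
    withinClusterQueue: LowerPriority # preempt lower-priority jobs in-queue
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8     # guaranteed share; borrowing can exceed it
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: batch
  namespace: team-a
spec:
  clusterQueue: team-a-batch
```

Workloads in the `team-a` namespace then submit to the `batch` LocalQueue, and Kueue admits, borrows, and preempts according to the cohort's fair-share weights.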

Takeaways include the exact Kubernetes objects and policies to create, the five metrics to track (queue time, utilization, fragmentation, job completion time, and cost per team), and a rollout plan platform teams can ship.
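One such object, sketched here under the assumption that the SKU tiers map to Kueue `WorkloadPriorityClass` resources (names and values are illustrative): a priority tier per SKU, so interactive work can safely preempt preemptible batch work:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sku-interactive     # illustrative: highest tier, short TTL
value: 10000
description: "Interactive SKU: may preempt batch and training workloads"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sku-batch           # illustrative: preemptible tier
value: 1000
description: "Batch SKU: preemptible; jobs are expected to checkpoint"
```

Jobs select a tier with the `kueue.x-k8s.io/priority-class` label, which Kueue uses for queue ordering and preemption without affecting pod-level kube-scheduler priority.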

Register for PlatformCon 2026