Talk
Sponsored
Virtual
On demand

Optimizing AI workloads in Kubernetes: Pruning for efficiency and scale

This session will explore model pruning techniques and Kubernetes-native strategies for optimizing resource-intensive AI workloads, focusing on efficient scheduling, autoscaling, and inference serving in cloud environments.
As AI adoption continues to grow, managing resource efficiency and costs in cloud-native environments becomes increasingly critical. Shashidhar Shenoy and Achyut Sarma Boggaram will discuss the potential of model pruning as an optimization technique and its integration with Kubernetes-native tools. They will cover strategies for resource scheduling, autoscaling configurations, and best practices for deploying pruned AI models in Kubernetes environments. While model pruning is still an emerging practice for AI inference in the cloud, this session will examine its benefits, trade-offs, and technical considerations, providing valuable insights for platform teams seeking to optimize AI workloads. Attendees will gain practical knowledge on how to scale AI applications more efficiently while reducing resource usage and associated costs.
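The listing itself contains no code, but as a rough illustration of the magnitude-based pruning the abstract alludes to, the sketch below zeroes out the smallest-magnitude weights of a layer. This is a minimal sketch in pure NumPy; the 50% sparsity target and the 512×512 layer shape are illustrative assumptions, not details from the talk.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(weights.size * sparsity)       # number of weights to drop
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value across the whole tensor
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold     # keep only larger-magnitude weights
    return weights * mask

# Illustrative layer: a 512x512 dense weight matrix pruned to 50% sparsity
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))
w_pruned = magnitude_prune(w, sparsity=0.5)
print(f"sparsity: {np.mean(w_pruned == 0):.2%}")
```

A sparser model of this kind can shrink memory and compute per inference request, which is what makes smaller resource requests and tighter autoscaling targets possible in Kubernetes; structured pruning (removing whole channels or heads) is usually needed to realize the speedup in practice.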
Thu 26 June
Presented by
Panelist
Achyut Sarma Boggaram
Senior Machine Learning Engineer, Torc AI
Moderator
Shashidhar Shenoy
Tech Lead, Google
Duration: 60min
Sign up now