Optimizing AI workloads in Kubernetes: Pruning for efficiency and scale
This session will explore model pruning techniques and Kubernetes-native strategies for optimizing resource-intensive AI workloads, focusing on efficient scheduling, autoscaling, and inference serving in cloud environments.
As AI adoption continues to grow, managing resource efficiency and costs in cloud-native environments becomes increasingly critical. Shashidhar Shenoy and Achyut Sarma Boggaram will discuss the potential of model pruning as an optimization technique and its integration with Kubernetes-native tools. They will cover strategies for resource scheduling, autoscaling configurations, and best practices for deploying pruned AI models in Kubernetes environments. While model pruning is still an emerging practice for AI inference in the cloud, this session will examine its benefits, trade-offs, and technical considerations, providing valuable insights for platform teams seeking to optimize AI workloads. Attendees will gain practical knowledge on how to scale AI applications more efficiently while reducing resource usage and associated costs.
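While the session itself is framework-agnostic, a minimal sketch of what magnitude-based pruning looks like in practice can ground the discussion. The example below uses PyTorch's built-in torch.nn.utils.prune utilities; the toy model, the choice of Linear layers, and the 50% sparsity target are illustrative assumptions rather than anything prescribed by the speakers.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative model; any torch.nn.Module with Linear/Conv layers would work.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero out the 50% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        # Make the pruning permanent by removing the re-parametrization.
        prune.remove(module, "weight")

# Report the resulting per-layer sparsity.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        sparsity = (module.weight == 0).float().mean().item()
        print(f"{name}: {sparsity:.0%} of weights pruned")
```

Note that unstructured pruning like this reduces the number of nonzero parameters, but it only translates into lower memory and latency footprints when paired with sparse-aware runtimes or structured pruning; those trade-offs are among the considerations the session promises to examine.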
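On the Kubernetes side, "autoscaling configuration" typically means a HorizontalPodAutoscaler attached to the inference Deployment. The sketch below, kept in Python for consistency with the example above, emits an illustrative autoscaling/v2 HPA manifest; the deployment name, namespace, replica bounds, and 70% CPU target are hypothetical values, not settings from the session.

```python
import yaml  # PyYAML

# Hypothetical names and targets, for illustration only.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "pruned-model-hpa", "namespace": "inference"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "pruned-model-server",  # hypothetical Deployment
        },
        "minReplicas": 1,
        "maxReplicas": 8,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

# Write the manifest so it can be applied with `kubectl apply -f hpa.yaml`.
with open("hpa.yaml", "w") as f:
    yaml.safe_dump(hpa, f, sort_keys=False)
```

Because a pruned model consumes fewer cycles per request, its Deployment can usually declare smaller resource requests, which lets the scheduler bin-pack more replicas per node before the HPA ever needs to scale out.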
Panelist
Achyut Sarma Boggaram
Senior Machine Learning Engineer, Torc AI

Moderator
Shashidhar Shenoy
Tech Lead, Google
Sign up now

