Talk
Sponsored
Virtual
LiveDay NYC
LiveDay LDN
On demand

Resilient on-premises AI workloads on Kubernetes with hyperconverged infrastructure

This session will explore how platform engineers can build resilient on-premises infrastructure for AI workloads on OpenShift. It will cover best practices in networking, storage, and compute, as well as strategies for backup, disaster recovery, and automation to ensure high availability and operational efficiency.
As AI workloads grow in complexity and demand, platform engineers are tasked with building resilient, scalable infrastructure. This talk focuses on deploying OpenShift clusters on hyperconverged infrastructure (HCI) to keep workloads highly available and operations efficient. HCI integrates compute, storage, and networking into a single system, simplifying management and improving performance.
Shajeer Mohammed will discuss how to design a fault-tolerant system with multiple servers and networks, eliminating single points of failure. In particular, the session will explore the role of software-defined storage (SDS) in providing scalability, resilience, and seamless data access for AI workloads.
Beyond infrastructure design, business continuity is crucial. The session will cover implementing backup policies and disaster recovery (DR) plans that minimize downtime and protect data, including DR protections that safeguard against data loss in the event of a disaster.
Attendees will also compare the benefits and trade-offs of running workloads on bare metal versus virtual machines, with an emphasis on performance and reliability. The talk will include guidance on using automated monitoring and alerting tools, firmware upgrades, auto-scaling, and proactive issue resolution to streamline day-two operations.
Fr 27 June
Presented by
Shajeer Mohammed
Lead Architect-STSM, Spectrum Fusion
Duration:
90min
60min