Lessons learned when taming Kubernetes for on-demand environments
At OLX Autos, we run a homegrown Internal Developer Platform that leverages Kubernetes to spin up on-demand environments, each housing more than 180 microservices. Managing anywhere from 10 to 200 environments, which adds up to 20k+ pods, has been an interesting journey, filled with unique challenges and opportunities. I'd like to share what we learned running on-demand environments at scale on Kubernetes.
Here are the challenges we encountered along the way, and how we addressed them:
- The need for speed: Unlocking lightning-fast environment provisioning times
- Handling high pod churn: Keeping the cluster healthy as thousands of pods are created and destroyed
- No weak links: Protecting environments from single points of failure
- Guardrails for k8s controllers: Managing numerous Kubernetes objects across many environments reliably, since any controller downtime could render an entire environment inaccessible
- Optimize Kubernetes for faster on-demand environment provisioning
- Identify and resolve potential points of failure while running on-demand environments on Kubernetes at scale
- Run on-demand environments on spot instances with confidence
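To make the spot-instance point concrete, here is a minimal sketch of the two Kubernetes primitives that make spot usage safer: a toleration so pods can land on tainted spot nodes, and a PodDisruptionBudget so spot reclaims never evict every replica of a service at once. The service name `checkout` and the registry URL are hypothetical, and the taint key shown is GKE's; other clouds use different keys.

```yaml
# Hypothetical example: run a service on spot nodes while capping
# voluntary disruptions with a PodDisruptionBudget.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb          # hypothetical service name
spec:
  minAvailable: 1             # keep at least one replica serving during evictions
  selector:
    matchLabels:
      app: checkout
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      tolerations:
        - key: "cloud.google.com/gke-spot"  # GKE's spot taint; key varies by cloud
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      terminationGracePeriodSeconds: 25     # spot reclaim notice is short, so shut down fast
      containers:
        - name: checkout
          image: registry.example.com/checkout:latest  # hypothetical image
```

With two replicas and `minAvailable: 1`, the scheduler can drain a reclaimed spot node without taking the service fully offline, which is the basic confidence mechanism the last point refers to.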