In-person + Virtual
October 24-28

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Friday, October 28 • 2:55pm - 3:30pm
Preventing Controller Sprawl From Taking Down Your Cluster - When a Scalable Pattern Stops Being Scalable - Madhu C.S., Robinhood Markets



The vast majority of Kubernetes controllers use a WATCH and UPDATE pattern, a highly scalable, client-pull-based pattern. But “highly” does not mean “infinitely”, and the spread of this pattern has led to a number of implicit design guarantees that operators build on. In this talk, the Container Orchestration team at Robinhood will explore the boundaries of this pattern, show how second-order effects result in service degradation in production, and share best practices for monitoring, detecting, debugging, and addressing these issues. With examples drawn from real outages, the team will present lessons learned for organizations of all sizes.

Speakers

Madhu C.S.

Software Engineer, Robinhood Markets
Madhu is the Tech Lead for the Software Platform - Container Orchestration team at Robinhood, which is responsible for developing and operating the Kubernetes-based compute infrastructure for all of Robinhood. Prior to Robinhood, Madhu worked at Google for 8 years where he developed...



Friday October 28, 2022 2:55pm - 3:30pm EDT
250 ABC
  Reliability + Operational Continuity