In-person + Virtual
October 24-28

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Eastern Daylight Time (UTC -4). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
Thursday, October 27 • 3:25pm - 4:00pm
Kubernetes For GPU Powered Machine Learning Workloads In Academia - Camille Rodriguez, Canonical & John-Paul Robinson, University of Alabama at Birmingham

This talk aims to show architects and users of Kubernetes, as well as teams planning to transition to Kubernetes for research purposes, how we designed a high-performing Kubernetes cluster specifically geared towards machine learning and AI workloads. On the architectural side, the use of NVIDIA DGX A100 machines provides unprecedented compute density and performance for those workloads. Those nodes are integrated into the cluster with open-source software. We will also cover our challenges and successes in integrating with other components, such as external Ceph storage, the GitLab registry and runners, and SAML authentication. The University of Alabama at Birmingham team will cover how they leverage container-enabled GPUs for their research and development workloads. Research workloads increasingly demand access to ad hoc, GPU-enabled compute capacity, with complex software environments to power cloud-native workflows. K8s helps address needs ranging from regular ML training runs to supporting software development via CI pipelines.

John-Paul Robinson

HPC Architect, Manager, University of Alabama at Birmingham
John-Paul Robinson has been developing distributed systems and helping organizations adopt them for more than a decade. A long-term proponent of open solutions and their ability to empower users, he is working to enhance the functionality of campus high performance computing environments...

Camille Rodriguez

Field Engineering Manager, Canonical
Camille works as a Field Engineering Manager at Canonical, where she designs and deploys private clouds with technologies such as Kubernetes, OpenStack, Juju Charms, MAAS, and other open-source integrations. She also leads the Women’s Resource Group at Canonical and is a hiring...

Thursday October 27, 2022 3:25pm - 4:00pm EDT
140 ABC
  Research + Academia