In-person + Virtual
October 24-28

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Friday, October 28 • 4:00pm - 4:35pm
Machine Learning Using Various GPU Technology With Kubeflow. - Jihye Choi, SAMSUNG SDS


People who work in MLOps tend to see limited budgets and limited GPU capacity as critical constraints. Kubeflow is a great open source project, but it offers few features for efficient distributed training through tight coupling with GPUs or for maximizing GPU utilization. This presentation covers two GPU technologies that help:

1. A small model uses only a fraction of a GPU, so dedicating an entire GPU to it wastes resources. Multi-Instance GPU (MIG), available on the NVIDIA A100, splits one GPU into up to 7 instances, and this presentation shows how to combine it with Kubeflow.
2. As models grow, distributed training across multiple GPU servers becomes necessary for efficiency. GPUDirect RDMA is a high-performance networking technology that transfers data directly to and from GPU memory without CPU or system memory intervention.

The session shares hands-on experience using these technologies to improve GPU utilization and performance in Kubeflow.
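
To make the first point concrete, here is a minimal sketch of how a workload might request a single MIG slice rather than a whole GPU. It is not taken from the talk. It assumes the cluster runs the NVIDIA Kubernetes device plugin with the "mixed" MIG strategy on an A100 40GB, which exposes slices under resource names such as nvidia.com/mig-1g.5gb. The helper name mig_pod_manifest, the pod name, and the image URL are hypothetical.

```python
import json


def mig_pod_manifest(name, image, mig_resource="nvidia.com/mig-1g.5gb", count=1):
    """Build a Pod manifest that requests MIG slices instead of a full GPU.

    Hypothetical helper for illustration. The default resource name assumes
    the NVIDIA device plugin's "mixed" MIG strategy on an A100 40GB.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # Extended resources such as GPUs are requested via limits.
                    "resources": {"limits": {mig_resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder pod name and image; substitute real values before use.
    manifest = mig_pod_manifest("small-model-train", "example.com/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with kubectl. The same limits entry could be set on the container spec of a Kubeflow notebook or training job, though how the talk wires this into Kubeflow is not stated in the abstract.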

Jihye Choi

Cloud Architect, SAMSUNG SDS
I have been an infrastructure architect and cloud software architect at Samsung SDS for over 10 years and have leveraged technology to match two major tech trends. Prior to 2019, I ran the Samsung-Card system for about 7 years as a system engineer. Now I'm in charge of developing...

Friday October 28, 2022 4:00pm - 4:35pm EDT
252 AB Huntington Place: 1 Washington Blvd, Detroit, MI 48226
  Machine Learning + Data