In-person + Virtual
October 24-28

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Eastern Daylight Time (UTC-4). The schedule is subject to change.
Wednesday, October 26 • 2:30pm - 3:05pm
Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano - Krzysztof Adamski & Tinco Boekestijn, ING



Three years ago, the ING Wholesale Banking Advanced Analytics team set an ambitious goal: to gather a curated portfolio of internal data sources in one place, together with a large-scale compute platform. At its core was the idea of giving internal projects access to a rich toolset of open-source and industry-standard frameworks, along with preprocessed data, so they could validate business ideas in a secure exploration environment. Extensive growth, with over 300 internal projects so far and more than 2,000 internal users, proves that advanced analytics capabilities (e.g. ML, AI, NLP) should be easily consumable not only by specialized, dedicated teams but also by people close to the subject matter, the domain experts themselves. In this session we would like to shed more light on how a specialized cloud native Kubernetes scheduler (Volcano) enables us to deliver multi-tenant, large-scale processing capabilities. Optimal resource usage and the stability of core services are key for our cloud native platform. To enable dynamic allocation and HDRF (hierarchical dominant resource fairness), we have created an extension to the Apache Spark binaries. This allows users to use Volcano with Spark in interactive mode from a Jupyter notebook. Additionally, we have created interfaces to visualize all the scheduling metrics, similar to the YARN UI.
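The abstract names HDRF (hierarchical dominant resource fairness) without describing it. As a rough illustration only, the sketch below shows the core idea in a simplified form, assuming a tree of queues where only leaf queues hold allocations: a queue's dominant share is its largest fraction of any single cluster resource, and the scheduler walks down the hierarchy, at each level favoring the child with the lowest dominant share. This is not Volcano's implementation or ING's Spark extension; the `Queue` class and the `dominant_share` and `pick_leaf` functions are hypothetical names, and the full H-DRF algorithm handles cases (such as queues with no pending demand) that this sketch ignores.

```python
from dataclasses import dataclass, field

# Hypothetical resource vector: resource name -> amount (e.g. cores, GiB).
Resources = dict[str, float]


@dataclass
class Queue:
    """Hypothetical queue node; only leaf queues carry their own allocations."""
    name: str
    allocated: Resources = field(default_factory=dict)
    children: list["Queue"] = field(default_factory=list)

    def total_allocated(self) -> Resources:
        # A leaf reports its own allocation; an inner node sums its subtree.
        if not self.children:
            return dict(self.allocated)
        total: Resources = {}
        for child in self.children:
            for res, amount in child.total_allocated().items():
                total[res] = total.get(res, 0.0) + amount
        return total


def dominant_share(allocated: Resources, capacity: Resources) -> float:
    # Largest fraction of any single cluster resource held by this allocation.
    return max((allocated.get(r, 0.0) / capacity[r] for r in capacity), default=0.0)


def pick_leaf(root: Queue, capacity: Resources) -> Queue:
    # Descend the hierarchy, choosing the least-served child at every level;
    # the returned leaf is the one that should receive the next resources.
    node = root
    while node.children:
        node = min(
            node.children,
            key=lambda q: dominant_share(q.total_allocated(), capacity),
        )
    return node


if __name__ == "__main__":
    capacity = {"cpu": 100.0, "memory": 400.0}
    team_a = Queue("team-a", children=[
        Queue("a-etl", {"cpu": 20.0, "memory": 40.0}),
        Queue("a-ml", {"cpu": 5.0, "memory": 100.0}),
    ])
    team_b = Queue("team-b", children=[
        Queue("b-notebooks", {"cpu": 10.0, "memory": 20.0}),
    ])
    root = Queue("root", children=[team_a, team_b])
    print(pick_leaf(root, capacity).name)  # prints b-notebooks
```

In this example `team-a` has a dominant share of 0.35 (memory) and `team-b` only 0.1 (CPU), so the next allocation goes down the `team-b` branch even though `team-a` contains a leaf with a smaller individual share. That per-level comparison is what distinguishes the hierarchical variant from flat DRF across all leaf queues.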

Speakers

Krzysztof Adamski

Tech Lead at ING Data Analytics Platform, ING
Enables companies to make use of data to fuel their product strategies, closing the gap between IT infrastructure teams and business initiatives. With experience in the high-frequency trading business, migrating Spotify's biggest analytics cluster to the public cloud and rebellious...

Tinco Boekestijn

Data Engineer, ING
Full-time data engineer at ING for 2.5 years with a passion for distributed computing and scheduling. Studied at the University of Groningen. Living in the Netherlands. Active and devoted climber. As a fun side project, I solve (difficult) puzzles with Prolog every year, since Prolog is...



Wednesday October 26, 2022 2:30pm - 3:05pm EDT
Portside Ballroom (Room 260)
  Customizing + Extending Kubernetes