12-07, 16:00–17:30 (UTC), General Track
High performance computing has been a key tool for computational researchers for decades. More recently, cloud economics and the intense demand for running AI workloads have driven a convergence: established standards like MPI are increasingly run on modern cloud frameworks like Kubernetes. In this tutorial, we will trace the historical arc of massively parallel computation, focusing on how frameworks like Kubernetes can both serve data scientists building production-grade applications and run HPC-style jobs such as MPI programs and distributed AI training. We will also show practical examples of submitting these jobs in a few lines of Python code.
This tutorial explores the intersection of HPC and the cloud. We will walk through a brief history of parallel computation, emphasizing the relationships between established standards like MPI, newer trends like distributed AI training, and recent innovations in cloud infrastructure. Data scientists will see how tools like AWS Batch and Kubernetes can help them build robust, production-ready applications on an infrastructure stack that also handles serious HPC workloads and distributed AI training in the cloud. Engineers familiar with cloud infrastructure will get a crash course in the patterns that have powered computational science for decades, and will learn how to enable those patterns with tools they already know well.
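As a taste of the submission flow described above, a Kubernetes batch Job can be described and submitted in a few lines of Python. The sketch below builds the Job manifest as a plain dictionary; the container image name and command are hypothetical placeholders, and the final submission step (via the official `kubernetes` Python client) is shown in comments because it assumes a configured cluster.

```python
# Sketch: describing an HPC-style job for Kubernetes from Python.
# The container image and command below are hypothetical placeholders.

def make_job_manifest(name, image, command, completions=1):
    """Build a Kubernetes batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": completions,
            "template": {
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                    }],
                    "restartPolicy": "Never",
                },
            },
        },
    }

manifest = make_job_manifest(
    name="pi-estimate",
    image="example.com/mpi-demo:latest",  # hypothetical image
    command=["mpirun", "-n", "4", "python", "pi.py"],
)

# With a configured cluster, the official `kubernetes` client submits it:
#   from kubernetes import client, config
#   config.load_kube_config()
#   client.BatchV1Api().create_namespaced_job(namespace="default", body=manifest)
```

Building the manifest as plain data keeps the example dependency-free; the same dictionary is exactly what the client library (or `kubectl apply -f` on its YAML form) accepts.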
No previous knowledge expected
I am a data scientist with a background in applied math and experience working in a variety of customer-facing and R&D roles. Over the last six years I have worked at startups and at Intel, helping customers and open-source communities use machine learning software.