PyData Global 2023

Full-stack Machine Learning and Generative AI for Data Scientists
12-07, 21:00–22:30 (UTC), Machine Learning Track

One of the key questions in modern data science and machine learning, for businesses and practitioners alike, is how to move machine learning projects from prototype and experiment to production as a repeatable process. In this tutorial, we introduce the landscape of production-grade tools, techniques, and workflows that bridge the gap between laptop data science and production ML. We’ll cover a wide range of applications, including the business-critical ML and data pipelines of today, as well as the state-of-the-art generative AI and LLM use cases of tomorrow.


We’ll give a high-level overview of the eight layers of the ML stack: data, compute, versioning, orchestration, software architecture, model operations, feature engineering, and model development. We’ll show a schematic of which layers data scientists need to think about and work with, and then introduce attendees to the tooling and workflow landscape. In doing so, we’ll present a widely applicable stack that aims to provide the best possible user experience for data scientists: they can focus on the parts they like (modeling with their favorite off-the-shelf libraries) while relying on robust built-in solutions for the foundational infrastructure.

Expected background knowledge:

  • programming fundamentals and the basics of the Python programming language (e.g., variables, for loops);
  • a bit about the PyData stack: numpy, pandas, scikit-learn, for example;
  • a bit about Jupyter Notebooks and Jupyter Lab;
  • basic navigation of the terminal/shell.

Resources will be provided in a GitHub repository and an interactive online sandbox.


Prior Knowledge Expected: Previous knowledge expected

Hugo Bowne-Anderson is Head of Developer Relations at Outerbounds and the host of the industry podcast Vanishing Gradients. Hugo is a data scientist, educator, evangelist, content marketer, and data strategy consultant, with extensive experience at Coiled, a company that makes it simple for organizations to scale their data science, and at DataCamp, the online education platform for all things data. He has also taught basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, at conferences such as SciPy, PyCon, and ODSC, and with organizations such as Data Carpentry.
