PyData Global 2023

Who needs ChatGPT? Rock solid AI pipelines with Hugging Face and Kedro
12-07, 16:00–17:30 (UTC), Machine Learning Track

Artificial Intelligence is all the rage, largely thanks to generative systems like ChatGPT, Midjourney, and the like. These commercial systems are very sophisticated and powerful, but also a bit opaque if you want to learn how they work or adapt them to your needs. What happens inside the 'black box'?

Luckily, there are open AI models that you can download easily, study without restrictions, and adjust so that they do what you want. This requires some technical knowledge, but thanks to Hugging Face's models and their ecosystem of Python libraries, delving into AI is easier than ever.

You will soon find yourself combining different models, performing different tasks, and creating complex systems. But this complexity can grow very quickly, and if you are not careful you will end up with spaghetti code. The Kedro catalog and Kedro pipelines help you keep that code organized.

In this tutorial you will learn how to create a complex AI pipeline using Hugging Face transformers, turn it into a Kedro project that cleanly separates code from configuration and data, and deploy it to production so it starts delivering value.

To that end, we will build a system that summarizes and classifies social media posts using several Hugging Face pre-trained models.

The outline will be as follows:

  1. Introduction (5m)
  2. Who needs ChatGPT? Commercial vs open-source AI (5m)
  3. Fighting spaghetti data science with Kedro (15m)
  4. Using Hugging Face models (15m)
  5. Separating code from data using the Kedro catalog (10m)
  6. Refactoring the code using Kedro pipelines (20m)
  7. Deploying to production (15m)
  8. Conclusions

Prior Knowledge Expected

Previous knowledge expected

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He has a decade of experience as a developer advocate, software engineer, and Python trainer in several industries, and he currently works as Principal Product Manager for Kedro, an open-source Python framework for data science, at QuantumBlack, AI by McKinsey.

He has made significant contributions to the PyData stack and published several open-source packages, the most important one being poliastro, an open-source Python library for orbital mechanics used at space agencies, satellite companies, and universities.

After founding the Python España non-profit and co-organizing the first seven PyCons in Spain, he became a Python Software Foundation Fellow in 2017. Nowadays he is the lead organizer of the PyData Madrid monthly meetups.