PyData Global 2023

Elijah ben Izzy

Elijah built large components of the simulation and trading infrastructure at Two Sigma and led a team responsible for testing and ensuring the reliability of its quantitative code. He then built out the ML platform at Stitch Fix, which was used by 100+ data scientists (see https://multithreaded.stitchfix.com/blog/2022/07/14/deployment-for-free/). Most recently, he co-authored the open source library Hamilton, a general-purpose, lightweight framework for building dataflows in Python. Encouraged by Hamilton's success and the possibilities it opened up, he left Stitch Fix and started DAGWorks, with the goal of making it easy for data scientists to build and manage machine learning ETLs.


Sessions

12-07
20:00
30min
Bridging Classic ML Pipelines with the World of LLMs
Elijah ben Izzy, Stefan Krawczyk

You probably don’t need a fancy new tool to take advantage of LLMs. While the explosion of inventive AI applications feels like a massive leap forward, the core challenges in plugging them into the business represent an incremental step from the discipline of MLOps.

The challenges are largely equivalent. Retrieval augmented generation is effectively a recommendation system. Agents are the control flow of your program. Chains of LLM calls are simple DAGs. And you’re still stuck trying to monitor quantitatively unclear predictions, wrestle expensive, unstable APIs into submission, and build out and manage complex dataflows.

The toolbox, as well, remains similar. In this talk we present the library Hamilton, an open source microframework for expressing dataflows in Python. We show how it can help you build observable, stable, context-independent pipelines that run the gamut from classical ML to LLMs/RAG, enabling you to maintain sanity and keep up with the pace of change as everyone steps into the fascinating new world of AI.
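To make the "chains of LLM calls are simple DAGs" idea concrete, here is a minimal sketch in plain Python. It is not Hamilton's API: the tiny `execute` resolver, the `NODES` registry, and the `fake_llm` stub are all hypothetical names invented for illustration. The sketch only borrows the general idea that each function is a node and its parameter names declare what it depends on.

```python
import inspect

# Hypothetical stand-in for a real LLM API call; it just echoes its prompt.
def fake_llm(prompt: str) -> str:
    return f"<answer to: {prompt}>"

# Each function below is one node in the dataflow. The function name is the
# node name, and its parameter names are the upstream nodes (or external
# inputs) that it depends on.
def retrieved_context(question: str) -> str:
    # Stubbed retrieval step (the "recommendation system" part of RAG).
    return f"docs about {question}"

def prompt(question: str, retrieved_context: str) -> str:
    return f"Context: {retrieved_context}\nQuestion: {question}"

def answer(prompt: str) -> str:
    return fake_llm(prompt)

# Hypothetical registry of nodes, keyed by function name.
NODES = {f.__name__: f for f in (retrieved_context, prompt, answer)}

def execute(target: str, inputs: dict) -> object:
    """Compute `target` by recursively resolving its dependencies."""
    cache = dict(inputs)  # external inputs count as already-computed nodes

    def resolve(name: str) -> object:
        if name not in cache:
            fn = NODES[name]
            # Parameter names tell us which upstream nodes to compute first.
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(target)

result = execute("answer", {"question": "What is a DAG?"})
```

Because every intermediate value is a named node, any step (for example `prompt`) can be requested and inspected on its own, which is the kind of observability the abstract refers to. A real framework would add features this sketch omits, such as cycle detection and validation.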

Machine Learning Track