PyData Global 2023

Production Data to the Model: “Are You Getting My Drift?”
12-08, 14:00–14:30 (UTC), Data Track

A shift is a poetic word for uncertainty. Winds shift, rivers and sands drift, and people change. Coming to the not-so-poetic world of data science, what about the data? Data comes from systems and the people using them, so it is natural that data, too, will see the rigors of shift. A model that was trained and tested for particular dynamics may account for the expected uncertainty in the data, such as a shift in user behavior. But what happens when the shift goes beyond expectations? How do teams detect the different types of data drift? More importantly, how do they tackle the detected drift? In this talk, I will gently introduce you to data drift and how the industry tackles this issue.


“Everything you see has its roots in the unseen world. The forms may change, yet the essence remains the same.”

When Rumi wrote this quote, machine learning was not even an idea. Therefore, we can safely assume that these soothing words do not apply to the probability distribution of predictors and targets. That distribution can change for reasons ranging from something as trivial as a switch in the measurement units used during data collection to something as disruptive as a pandemic. Keeping track of data drift has become an essential part of industrializing the machine-learning process. A simple mix of understanding the kind of data the model will encounter, mathematics, and a suitable detection strategy can help teams watch out for model performance decay. Additionally, it is important for data science practitioners to understand the types of drift in order to devise the best detection strategy.

My talk will focus on the following:
✦ Introduction to data drift and the cost of ignoring it
✦ Types of data drift
✦ Commonly used tests to detect drift in numerical and categorical data
✦ A short Python-based walkthrough of detection methods
✦ How is drift detected for unstructured data like text?
✦ Drift happened and we caught it, now what?
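To give a flavor of what detecting drift in numerical and categorical data can look like, here is a minimal sketch. The abstract does not name specific tests, so the choices below are assumptions made for illustration: a two-sample Kolmogorov-Smirnov test for a numerical feature and a chi-squared test on category counts for a categorical feature, both from SciPy. The synthetic data, the category names, and the 0.05 significance threshold are likewise hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

rng = np.random.default_rng(0)

# Numerical feature: a reference (training-time) sample versus a
# production sample whose mean has moved. Both are synthetic.
reference_num = rng.normal(loc=0.0, scale=1.0, size=2000)
production_num = rng.normal(loc=0.5, scale=1.0, size=2000)

# Two-sample Kolmogorov-Smirnov test: compares the two empirical CDFs.
ks_stat, ks_p = ks_2samp(reference_num, production_num)

# Categorical feature: hypothetical device categories whose mix changes.
categories = ["web", "mobile", "tablet"]
reference_cat = rng.choice(categories, size=2000, p=[0.60, 0.30, 0.10])
production_cat = rng.choice(categories, size=2000, p=[0.40, 0.45, 0.15])

# Build a 2 x 3 contingency table of counts (rows: reference, production).
counts = np.array([
    [(reference_cat == c).sum() for c in categories],
    [(production_cat == c).sum() for c in categories],
])

# Chi-squared test of homogeneity on the contingency table.
chi2_stat, chi2_p, dof, _expected = chi2_contingency(counts)

ALPHA = 0.05  # assumed significance threshold, not taken from the talk
numerical_drift = ks_p < ALPHA
categorical_drift = chi2_p < ALPHA

print(f"KS statistic={ks_stat:.3f}, drift flagged: {numerical_drift}")
print(f"Chi-squared statistic={chi2_stat:.1f}, drift flagged: {categorical_drift}")
```

With samples this large, even small differences can come out statistically significant, so in practice a p-value threshold is often paired with an effect-size measure or a minimum-difference rule before raising an alert.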

The intended audience is broad, since anyone who has deployed or wishes to deploy a model needs to be aware of this issue.


Prior Knowledge Expected

No previous knowledge expected

Hello there! I am currently working as a Senior Data Scientist at Censius Inc.

My typical day at work involves:
✦ Research, prototyping and discussions on product features
✦ Product roadmap documentation
✦ Reviewing media content and resources
✦ Pre-sales pitches

I will be defending my Ph.D. thesis soon 🤞🏼
Specialising in data privacy, my Ph.D. work dabbled in differential privacy and synthetic data.

I'm a mediocre runner who's a mom to two rescued dogs and one non-rescued human.