PyData Global 2023

Patrick Deziel

Patrick Deziel is a distributed systems engineer and machine learning specialist. Patrick has extensive experience building and maintaining mission-critical systems in the private sector, as well as integrating modern ML solutions into existing applications. At Rotational, he designs and builds intelligent distributed systems to enable global use cases. In his free time, Patrick enjoys rock climbing and consuming science fiction.


Sessions

12-07
14:30
30min
Event-Driven Data Science: Reconceptualizing Machine Learning for the Real-time World
Prema Roman, Patrick Deziel

Did you know that 87% of data science projects never make it into production? While open source libraries like scikit-learn and TensorFlow have gone a long way toward democratizing data science, they are also unintentionally limited by the assumptions and research focus of academia at the time they were released. One such assumption is that a model must be trained on batches of data and that all machine learning models need more data in order to perform well. This introduces a gap between training and inference, because enough instances must accumulate before training can begin. For real-time use cases such as anomaly detection, models can become stale even before they are deployed to production.

Fortunately, there has been a trend toward building machine learning models that learn from streams of data and can react immediately to changes in that data. This form of learning is usually referred to as real-time machine learning, online learning, or incremental learning.
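
To make the idea of incremental learning concrete, here is a minimal sketch of an anomaly detector that updates on every instance instead of waiting for a batch. It is not taken from the talk and does not use River or PyEnsign; it relies only on the Python standard library. The class name `RunningZScore`, the helper `detect`, the threshold of 3.0, and the sample readings are all illustrative assumptions. The method names `score_one` and `learn_one` are chosen here to suggest the one-instance-at-a-time style of online learning.

```python
import math


class RunningZScore:
    """Hypothetical incremental anomaly scorer (illustration only).

    Keeps a running mean and variance with Welford's algorithm, so it
    never needs to store or re-read past instances.
    """

    def __init__(self):
        self.n = 0        # number of instances seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def score_one(self, x):
        """Return the absolute z-score of x under the statistics seen so far."""
        if self.n < 2:
            return 0.0  # not enough history to estimate a spread
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0  # all past values identical; no scale to compare against
        return abs(x - self.mean) / std

    def learn_one(self, x):
        """Update the running statistics with a single new instance."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def detect(stream, threshold=3.0):
    """Score each value before learning from it; return flagged indices."""
    model = RunningZScore()
    anomalies = []
    for i, x in enumerate(stream):
        if model.score_one(x) > threshold:
            anomalies.append(i)
        model.learn_one(x)  # the model is current after every instance
    return anomalies


# Made-up sensor readings with one obvious spike at index 5.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8]
flagged = detect(readings)
print(flagged)
```

With these sample readings, only the spike at index 5 is flagged. Scoring before learning means each prediction uses only data the model had already seen, which mirrors how a deployed streaming model would behave.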

In this talk, we will compare the two approaches to machine learning, provide a brief overview of River, a library for building online learning models, and demo a real-time application using PyEnsign, a real-time data streaming client.

Machine Learning Track