PyData Global 2023

Prema Roman

Prema Roman is a distributed systems engineer at Rotational Labs. She is an experienced software, data, and machine learning engineer with a proven track record of building high-quality software applications and data products. Her passion for continuous learning has taken her a long way from her start as a data analyst, as she takes on new challenges at Rotational Labs building globally distributed systems and machine learning data products.

Sessions

12-07
14:30
30min
Event-Driven Data Science: Reconceptualizing Machine Learning for the Real-time World
Prema Roman, Patrick Deziel

Did you know that 87% of data science projects never make it into production? While open source libraries like scikit-learn and TensorFlow have gone a long way to democratize data science, they are also unintentionally limited by the assumptions and research focus of academia at the time they were released. One such assumption is that a model must be trained on batches of data and that all machine learning models need more data in order to perform well. This introduces a gap between training and inference, since enough instances must accumulate before training can begin. For real-time use cases such as anomaly detection, models can become stale even before they are deployed to production.

Fortunately, there has been a trend towards building machine learning models that learn from streams of data and can react immediately to changes in that data. This form of learning is usually referred to as real-time machine learning, online learning, or incremental learning.
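
To make the contrast with batch training concrete, here is a minimal sketch of incremental learning applied to anomaly detection. It uses only the Python standard library and is not taken from the talk or from any library's API: the class `RunningAnomalyDetector`, its methods `score_one` and `learn_one`, the helper `flag_anomalies`, and the z-score threshold of 3.0 are all hypothetical names and values chosen for illustration. The model scores each observation as it arrives and then updates its running mean and variance (Welford's method), so it never waits for a batch to accumulate.

```python
import math


class RunningAnomalyDetector:
    """Hypothetical online z-score detector that learns one value at a time."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def score_one(self, x):
        """Return the z-score of x against everything seen so far."""
        if self.n < 2:
            return 0.0  # not enough history to estimate the spread
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        """Update the running statistics with a single observation (Welford)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def flag_anomalies(stream, threshold=3.0):
    """Score each value before learning from it, as a streaming model would."""
    detector = RunningAnomalyDetector(threshold)
    flagged = []
    for x in stream:
        if detector.score_one(x) > detector.threshold:
            flagged.append(x)
        detector.learn_one(x)
    return flagged


readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0, 10.1]
print(flag_anomalies(readings))
```

Scoring before learning mirrors how a deployed streaming model behaves: each prediction relies only on data seen so far, and the model is up to date again immediately afterwards.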

In this talk, we will compare the two approaches to machine learning, provide a brief overview of River, a library for building online learning models, and demo a real-time application using PyEnsign, a real-time data streaming client.

Machine Learning Track