PyData Global 2023

Paradoxes in model training and evaluation under constraints
12-06, 12:30–13:00 (UTC), Machine Learning Track

In many domains, machine learning methods predict the future demand for some physical good or virtual service that comes with finite capacity. Those predictions are then typically used to plan an appropriate level of supply. Often, it is not possible to directly measure (and train on) the actual demand, but only the fraction of it that could be fulfilled under the constraints in place in the past – such as finite stocks or limited capacity. That is, one predicts a different quantity than one measures. This talk explores the surprising aspects of the distinction between demand and sales that can arise in data science projects. We explore the paradoxes and the most dramatic problems one encounters and find out how to avoid them. This talk will sharpen your thinking when dealing with such intricate settings, and allow you to create and utilize demand forecasts in the best possible way.


This talk will equip you – a Data Scientist or a person working with Data Scientists – with the background and tooling necessary to deal with models that predict demand in situations where only censored demand (e.g. sales that are constrained by the available stock level) is known. You will learn why the distinction between unconstrained demand and constrained sales is not just splitting hairs, but important for avoiding a strong and systematic bias in your forecast. You will be equipped to ask the right questions and to avoid falling into certain training and evaluation traps.

Constraints kick in when the actual demand for some product can’t be fulfilled due to its finite capacity. For example, when 10 customers want to buy a cake, but there are only 6 left. This needs to be accounted for in training: Training on a demand of 6 will give rise to a biased model that under-forecasts and prevents the vendor from stocking up properly next time. Finite capacity also needs to be accounted for in the evaluation: A forecast of 11 for that situation has certainly been better than one predicting 5, even though the latter is much closer to the observed actual. Using standard Python libraries (numpy, scipy, pandas), we simulate various relevant situations, illustrate what can go wrong, and show how to avoid it.
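The training bias can be seen in a minimal simulation (a sketch with assumed numbers, not material from the talk): suppose true daily demand for the cake is Poisson-distributed with a mean of 10, but only 6 are ever in stock, so the recorded sales are censored at 6.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed setup: true daily demand ~ Poisson(10), but only 6 cakes
# are in stock each day, so observed sales are capped at 6.
true_demand = rng.poisson(lam=10, size=100_000)
stock = 6
observed_sales = np.minimum(true_demand, stock)

# Naively fitting the mean of the sales data under-estimates demand:
print(true_demand.mean())     # close to the true mean of 10
print(observed_sales.mean())  # systematically lower than 10
```

A model trained on `observed_sales` learns the censored quantity, so its forecasts stay below the stock level and the vendor never discovers the unmet demand. Note also the evaluation trap from the example above: on a day where sales hit the stock limit of 6, the observation is only a lower bound on demand, so a forecast of 5 contradicts the data while a forecast of 11 may be exactly right.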


Prior Knowledge Expected: Previous knowledge expected

After pursuing his PhD and postdoc research in theoretical quantum physics, Malte joined Blue Yonder as a Data Scientist in 2015. Since then, he has led numerous external and internal projects, all of which involved programming in Python, creating, working with, and evaluating probabilistic predictions, and communicating the achieved results.