PyData Global 2023

Quan Nguyen

Hi, I'm Quan Nguyen


Sessions

12-06
17:30
30min
But what is a Gaussian process? Regression while knowing how certain you are
Quan Nguyen

Given a test data point similar to the training points, we should expect the prediction of a machine learning model to be accurate.
However, we have no such guarantee for a test point far away from the training data, and many models offer no quantification of this uncertainty in their predictions.
These models, including the increasingly popular neural networks, produce a single number as the prediction for a test point of interest, making it difficult to judge how much the user should trust that prediction.

Gaussian processes (GPs) address this concern: instead of a single number, a GP outputs as its prediction for a given test point a probability distribution representing the range that the value we're predicting is likely to fall into.
By looking at the mean of this distribution, we obtain the most likely predicted value; by inspecting the variance of the distribution, we can quantify how uncertain we are about this prediction.
This ability to produce well-calibrated uncertainty quantification gives GPs an edge in high-stakes machine learning use cases such as oil drilling, drug discovery, and product recommendation.
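As a rough illustration of reading a prediction as a mean plus a variance, here is a minimal NumPy sketch of GP regression. It is not the speaker's code: the helper names `rbf_kernel` and `gp_predict`, the squared-exponential kernel, the zero prior mean, and the fixed `lengthscale`, `variance`, and `noise` values are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-4):
    # Posterior mean and variance of a zero-mean GP at each test input.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)

    # A Cholesky factorization avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical toy data: five noiseless observations of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One test point inside the training range, one far outside it.
x_test = np.array([0.5, 10.0])
mean, var = gp_predict(x_train, y_train, x_test)

for x, m, s2 in zip(x_test, mean, var):
    print(f"x = {x:5.1f}   mean = {m: .3f}   variance = {s2:.3f}")
```

Under these assumptions, the variance at the point near the training data is small, while at the distant point the prediction falls back to the prior (mean close to zero, variance close to the prior variance), which is the behavior described above.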

While GPs are widely used in academic research on Bayesian inference and active learning, many ML practitioners still shy away from them, believing that they need a highly technical background to understand and use GPs.
This talk aims to dispel that belief and offers a friendly introduction to GPs, including their fundamentals, how to implement them in Python, and common practices.
Data scientists and ML practitioners who are interested in uncertainty quantification and probabilistic ML will benefit from this talk.
While most of the background knowledge needed to follow the talk will be covered, the audience should be familiar with common ML concepts such as training data, predictive models, and multivariate normal distributions.

Machine Learning Track