Version Control and Beyond: Leveraging Git for ML Experiment Management PyData Global 2023

Version Control and Beyond: Leveraging Git for ML Experiment Management

12-06, 12:00–12:30 (UTC), Machine Learning Track

Before finalizing a machine learning model, data scientists conduct dozens, if not hundreds, of experiments. To keep track of these experiments, they employ setups of varying complexity, including physical notebooks, spreadsheets, or even complex configurations using various libraries and dedicated infrastructure. In this practical presentation, I will demonstrate how you and your team can start tracking experiments right away using a very simple setup, with most of the ingredients you are probably already using.

Currently, we often run dozens of machine learning experiments every day. While doing so, our primary focus naturally lies on performance metrics. However, ideally, we should also be able to track various aspects such as input data, code, model hyperparameters, and more. Without these, it could be extremely difficult, if not impossible, to fully reproduce these experiments.
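As a rough illustration of what "tracking more than metrics" can mean, the sketch below records the code version, a fingerprint of the input data, the hyperparameters, and the metrics for a single run. It is not the Git + DVC setup shown in the talk; it uses only the Python standard library, and the function names, the `experiments.jsonl` log file, and the record layout are hypothetical choices made for this example.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_commit():
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or we are not inside a repository.
        return None
    return result.stdout.strip()


def build_record(data_path, params, metrics):
    """Collect the pieces needed to reproduce one experiment (hypothetical layout)."""
    return {
        "commit": current_commit(),              # which code produced the result
        "data_sha256": file_sha256(data_path),   # which input data was used
        "params": params,                        # model hyperparameters
        "metrics": metrics,                      # performance metrics
    }


def append_record(record, log_path="experiments.jsonl"):
    """Append one experiment record as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

A run would then call something like `append_record(build_record("train.csv", {"max_depth": 3}, {"accuracy": 0.91}))`, where the file name and values are placeholders. Dedicated tools handle the same bookkeeping with far less manual effort; the point here is only to show which pieces of information belong together.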

In this practical presentation, I will demonstrate how you and your team can immediately start tracking experiments using a straightforward setup. All you need for this are Git, the VS Code IDE, and DVC.

This talk is intended for all data scientists, regardless of experience level, and for anyone interested in machine learning workflows. To follow the talk, you don't need in-depth ML expertise; basic familiarity with Python and scikit-learn will suffice. The main advantage of this setup is its minimal impact on the project's codebase: you can start tracking and evaluating your experiments without a steep learning curve or extensive refactoring of existing code.

Prior Knowledge Expected: No previous knowledge expected

Eryk Lewinson

Eryk is an experienced data scientist who specializes in practical applications of data science methods. Outside of work, he has written over a hundred articles on topics related to data science, which have been viewed more than 4 million times. Additionally, he has authored two books on the application of Python in the financial context, both of which were published by Packt.
