12-06, 16:00–16:30 (UTC), Data Track
Everybody knows Polars revolutionised the dataframe landscape, yet fewer realise that machine learning is next. Its raw speed can accelerate feature engineering by one to two orders of magnitude. The true gains, however, span the whole ML lifecycle, with significantly faster batch inference and effortless scaling (no PySpark required!).
Add a best-in-class set of tools for feature extraction, model evaluation and diagnostic visualisations and you get functime: a next-generation library for ML forecasting. Though time-series practitioners are the primary audience, there's something here for every data scientist. It's not just about forecasting: it's about building the next generation of machine learning libraries.
Polars is mature, production-ready, intuitive to write and pleasant to read. And it's fast: thanks to Rust and Rayon, it can outpace even Numba-compiled code. Though not a tensor library, Polars can still speed up your machine learning workflows dramatically.
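As a taste of that expression API, here is a minimal sketch with made-up data: a lazy query that groups and aggregates a dataframe, letting Polars optimise the plan and execute it across threads.

```python
import polars as pl

# Toy data: sales per store and day (illustrative values only).
df = pl.DataFrame({
    "store": ["a", "a", "b", "b"],
    "day": [1, 2, 1, 2],
    "sales": [10.0, 12.0, 7.0, 9.5],
})

# Lazy query: Polars builds a plan, optimises it, and runs it in
# parallel when `.collect()` is called.
summary = (
    df.lazy()
    .group_by("store")
    .agg(
        pl.col("sales").mean().alias("mean_sales"),
        pl.col("sales").max().alias("max_sales"),
    )
    .collect()
)
print(summary)
```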
We chose to write a time-series library first because forecasting with panel datasets has usually required fitting thousands of univariate time series at a time on distributed systems. However, recent advances in the literature and in forecasting competitions have shown that, with sufficiently large panel datasets, global forecasting models can outperform local ones.
Thanks to Polars' truly multithreaded query engine, we can quickly perform embarrassingly parallel operations on big panel datasets, such as feature extraction and time-series cross-validation. All of this without a multi-node cluster.
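A hedged sketch of what such an embarrassingly parallel step can look like in plain Polars: the panel layout and column names below are assumptions for illustration, while the per-group window expressions (`.over()`) are standard Polars.

```python
import polars as pl

# Assumed panel layout: one row per (entity, timestamp) pair.
panel = pl.DataFrame({
    "entity": ["a"] * 6 + ["b"] * 6,
    "timestamp": list(range(6)) * 2,
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0],
})

# Per-entity lag and rolling-mean features: each window is computed
# independently per entity, so Polars can parallelise across groups.
features = panel.with_columns(
    pl.col("y").shift(1).over("entity").alias("lag_1"),
    pl.col("y").rolling_mean(window_size=3).over("entity").alias("rolling_mean_3"),
)
print(features)
```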
Intended audience. This talk is not math-heavy and is designed for forecasting practitioners and data scientists alike. We aim to showcase a forecasting library with a modern, functional API that time-series experts can use to forecast thousands of time series without distributed systems.
However, the principles behind functime can be grasped by every machine learning practitioner: forecasting is just a use case that shows off Polars' potential. With Polars, we can improve the current state of machine learning modelling, with modern APIs and truly parallel engines that work at reasonable scale without multi-node clusters.
Talk outline
• minutes 0-3. Present the goal of the talk.
• minutes 3-7. Why is Polars so fast?
• minutes 7-12. Problem setting: forecasting at scale. Panel data, local and global forecasting.
• minutes 12-25. functime overview: the functional API for the extract-fit-evaluate modelling workflow (a sketch follows this outline).
• minutes 25-30. Benchmarks against available libraries (sklearn, StatsForecast and MLForecast).
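For orientation, here is a hypothetical sketch of the extract-fit-evaluate workflow in the functional style the talk describes. The import paths, function names (`train_test_split`, `linear_model`, `mase`) and their signatures are assumptions for illustration and may not match functime's exact API.

```python
import polars as pl
from functime.cross_validation import train_test_split  # assumed import path
from functime.forecasting import linear_model            # assumed import path
from functime.metrics import mase                        # assumed import path

# Assumed panel layout: (entity, timestamp, value) columns.
y = pl.read_parquet("panel.parquet")  # hypothetical file

# Split every series in the panel into train and test in one call.
y_train, y_test = y.pipe(train_test_split(test_size=3))

# A global model: one forecaster fitted across all entities at once.
forecaster = linear_model(freq="1mo", lags=24)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=3)

# Evaluate per-entity accuracy with a scaled error metric.
scores = mase(y_true=y_test, y_pred=y_pred, y_train=y_train)
print(scores)
```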
No previous knowledge expected
ML Engineer interested in forecasting