Why we rewrote tsfresh in Polars, and why you should too
tsfresh is a popular time-series feature extraction library with over 7,500 stars and thousands of downloads per day. It extracts features that describe key characteristics of a time series using algorithms from statistics, econometrics, signal processing, and non-linear dynamics. tsfresh, however, is over 6 years old and suffers from slow performance and an outdated API.
That's why we open-sourced functime, a new high-performance time-series machine-learning library. What makes functime special is that it is written from the ground up with Polars, currently the world's fastest dataframe library, built on Apache Arrow and Rust.
functime recently rewrote hundreds of features from tsfresh in Polars. The result? Up to a 50x improvement in speed and memory efficiency compared to existing pandas / NumPy implementations. functime is now the world's fastest time-series feature extraction library. Moreover, functime effortlessly parallelizes work across thousands of time series using Polars' highly optimized Rayon backend. No distributed cluster (e.g. Spark) needed!
This talk begins with a brief introduction to time-series feature extraction and its use cases. We then dive deep into the reasons why Polars is an optimal query engine for time-series feature engineering, and discuss the challenges and lessons from our rewrite. In particular, we will demonstrate, through code and benchmarks, lesser-known Polars tips and tricks to squeeze 10x speedups out of your data engineering workflows.