PyData Global 2023

Matthew Rocklin

Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.

Matthew holds a bachelors degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago.

The speaker's profile picture

Sessions

12-06
16:30
30min
Arrow revolution in pandas and Dask
Matthew Rocklin, Patrick Hoefler

The pandas library for data manipulation and data analysis is the most widely used open source data science software library. Dask is the natural extension for scaling pandas workloads to more than a single machine. The continuing integration and adoption of Apache Arrow accelerates historical bottlenecks in both libraries.

Data Track
Data Track