PyData Global 2023

Blazing fast I/O of data in the cloud with Daft Dataframes
12-06, 18:30–19:00 (UTC), Data Track

Daft (www.getdaft.io) is an open-source distributed Dataframe library, written in Rust with a Python API. Its Rust I/O layer delivers blazing-fast cloud storage I/O, all accessible through a familiar Python Dataframe interface. Load tens of thousands of CSV and Parquet files in seconds, all from the comfort of Python!


I/O from remote storage is a consistent bottleneck for large-scale data processing workloads. When you have hundreds of thousands of files in S3, even listing them can take several minutes. Reading those files can be even more painful than actually processing them.
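
To make the bottleneck concrete, here is a minimal sketch of why many small remote requests are slow when issued one at a time and much faster when overlapped. It is not Daft's implementation: `fetch_object`, the 20 ms latency, and the object keys are all hypothetical, and the "request" is simulated with a sleep so the example runs without any cloud credentials.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical round-trip time for a single remote request (assumption).
SIMULATED_LATENCY_S = 0.02


def fetch_object(key: str) -> int:
    """Stand-in for one remote GET: sleeps instead of touching the network."""
    time.sleep(SIMULATED_LATENCY_S)
    return len(key)


def fetch_sequential(keys):
    """Issue requests one after another; total time grows with len(keys)."""
    return [fetch_object(k) for k in keys]


def fetch_concurrent(keys, max_workers=16):
    """Overlap requests with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_object, keys))


# Made-up object keys, for illustration only.
keys = [f"data/part-{i:05d}.parquet" for i in range(32)]

start = time.perf_counter()
seq = fetch_sequential(keys)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
conc = fetch_concurrent(keys)
concurrent_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

Because each simulated request spends its time waiting rather than computing, overlapping the waits shortens the total wall-clock time, which is the general idea behind a highly concurrent I/O layer.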

Daft Dataframes are built for the cloud and feature many optimizations that make them extremely efficient at reading and working with cloud storage. In this talk, we will showcase and explain some of the optimizations built into Daft's Rust I/O layer and exposed to users through a familiar Python Dataframe interface.


Prior Knowledge Expected

No previous knowledge expected

Jay is a cofounder of Eventual and a primary contributor to the Daft open-source project. Prior to Eventual, he was a software engineer building large-scale ML data systems for computational biology at Freenome and self-driving cars at Lyft. He hails from the sunny island nation of Singapore, and used to command a platoon of tanks in the Singapore military.