PyData Global 2023

Abubakar Abid completed his PhD at Stanford in applied machine learning. During his PhD, he founded Gradio (www.gradio.dev), an open-source Python library that is used by more than 500,000 users every month to build machine learning demos. Gradio was acquired by Hugging Face, where Abubakar now serves as a machine learning team lead.

Abstract: In this talk, we will cover practical tools for modern machine learning: datasets, models, and demos. First, we will talk about How to Use the Hugging Face Hub, covering how to easily find the right models and datasets for your machine learning tasks. Then, we will walk through Building and Sharing ML Demos, covering how to quickly demo ML models for class presentations, portfolios, etc., using the Gradio (www.gradio.dev) library.

Building machine learning demos and web apps has traditionally required significant knowledge of web development (CSS, JS) and web hosting. We will discuss the Gradio library (www.gradio.dev), an alternative that allows you to build machine learning demos entirely in Python. This tutorial will be hands-on: we'll go through a Colab notebook and end by hosting the demo on Hugging Face Spaces, so be ready to code!

Keynote - Building Machine Learning Apps in Python with Gradio

Alejandro Saucedo

Alejandro is Director of Engineering, Science, Product & Analytics at Zalando where he leads a cross-functional technology organisation consisting of department heads, managers, principals and ICs responsible for the development of a large portfolio of (10+) products, the management of one of Zalando's large-scale central data platforms, and the productionisation of SOTA machine learning systems powering high-value & critical use-cases across the organisation. Alejandro is also the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he contributes to policy and industry standards on the responsible design, development and operation of AI, and has led policy contributions to, among others, the EU's AI Regulatory Proposal and the Data Act. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and tech giants, with a strong track record of building cross-functional R&D and Product organisations. He is currently appointed as governing council Member-at-Large at the Association for Computing Machinery (ACM), and is currently the Chairperson of the ML Security Committee at the Linux Foundation.

Linkedin: https://linkedin.com/in/axsaucedo
Twitter: https://twitter.com/axsaucedo
Github: https://github.com/axsaucedo
Website: https://ethical.institute/

The State of Production Machine Learning in 2023

Allen Downey

Allen Downey is a curriculum designer at Brilliant.org and professor emeritus at Olin College.
He is the author of several books -- including Think Python, Think Bayes, and Probably Overthinking It -- and a blog about data science and Bayesian statistics. He received a Ph.D. in computer science from the University of California, Berkeley, and Bachelor's and Master's degrees from MIT.

Extremes, outliers, and GOATS: on life in a lognormal world

Andrew Huang

Panel Sprint

Andrey Cheptsov

Andrey is the core contributor to dstack, an open-source framework for running LLM workloads across any cloud. Before dstack, Andrey worked at JetBrains as a PM for PyCharm and other dev tools.

Leveraging open-source LLMs for production

Anirban Ray

I completed my master's in Statistics, with a specialisation in Computational Statistics, at the Indian Statistical Institute in 2019. I am currently working as a Data Scientist at Publicis Sapient.

sktime - python toolbox for time series: new features 2023 – advanced pipelines, probabilistic forecasting, parallelism support, composable classifiers and distances, reproducibility features

Arthur Andres

After graduating with an engineering degree in 2009, I’ve worked in all four corners of the City of London, for various financial institutions, big and small.
As a software engineer, I specialise in data intensive applications.
I've worked with both real-time systems, and batch jobs.
I have a keen interest in how we can get the two to interact seamlessly.

Unified batch and stream processing in python

Ashwin Srinath

Ashwin Srinath is a senior software engineer at NVIDIA, and part of the team developing RAPIDS. Prior to joining NVIDIA, he was a computational scientist at Clemson University, helping researchers develop and optimize HPC applications.

cudf.pandas: The Zero Code Change GPU Accelerator for Pandas

Avrahami

Abraham (Avrahami) Israeli is a research fellow at the Data Science Institute at Reichman University, focusing on research associated with NLP and social media. He leads the social analytics vertical and serves as project leader for the Arabic NLP projects.
In recent years Avrahami has taught machine learning courses in the computer science department at Reichman University.
Avrahami received his master's degree from Ben-Gurion University and his bachelor's degree from the Technion. Currently, Avrahami is a Ph.D. candidate in the Department of Software and Information Systems Engineering at Ben-Gurion University. His Ph.D. research deals with the behavior of communities and users over the web.
Before his academic path, Avrahami worked for ten years in different machine-learning groups at IBM Research and Intel.

The Internet's Best Experiment Yet

Basel Alebdi

Basel Alebdi is the Lead Data Scientist at Jahez International, where he specializes in propelling business growth through data.

Data-Driven F&B Delivery: Jahez as a Leading Example

Benedikt Heidrich

I completed a Master of Science degree in informatics in 2019 at the Karlsruhe Institute of Technology. I am working towards a PhD in Informatics at the Karlsruhe Institute of Technology. My research focuses on using deep generative models in energy systems and coping with concept drift in energy time series forecasting. Additionally, I investigate how a general pipeline architecture should be designed for time series analysis tasks.

sktime - python toolbox for time series: new features 2023 – advanced pipelines, probabilistic forecasting, parallelism support, composable classifiers and distances, reproducibility features

Bernice Waweru

Tricking Neural Networks : Explore Adversarial Attacks

Bobur Umurzokov

Bobur is a developer advocate and speaker specializing in software and data engineering. With over 10 years of experience in IT, he blogs about open-source technologies and the community around them. Nowadays he is contributing to Pathway's LLM App for the future of AI app development.

Build AI-powered data pipeline without vector databases

Cainã Max Couto da Silva

Hi, my name is Cainã,
I'm the father of a human, a dog, and a cat.
I love traveling and gathering around with family and friends.

Professionally speaking, as a data scientist with a PhD in bioinformatics and over ten years of working on relevant projects, I developed a strong data science and analytics foundation. I have spent the last few years working at world‑renowned companies, developing end‑to‑end machine learning applications. Additionally, driven by my passion for knowledge, I've taught specialized courses in various data science topics. I am always eager to apply my expertise and create meaningful impacts.

Introduction to Machine Learning Pipelines: How to Prevent Data Leakage and Build Efficient Workflows

Cesar Garcia

César García Sáez. Computer Systems Engineer and MSc in Data Science at Universitat Oberta de Catalunya (UOC). His Master's Thesis focused on how to improve the data quality of open data datasets.

Speaker and researcher specialized in digital fabrication, Internet of Things, and open data. 14+ years working in IT as System Administrator at Madrid City Council IT department, before becoming an independent researcher. Passionate about the potential of open source technologies and open data to improve how we relate to our cities, in a more profound, meaningful way.

Graduated from the FabAcademy 2013 digital fabrication course. Co-founder of Makespace Madrid. In recent years he has been documenting and sharing stories about the maker movement at La Hora Maker, a podcast/YouTube channel.

Improving Open Data Quality using Python

Chang She

Chang is the CEO / Co-founder of LanceDB and has been building data science / machine learning tooling for almost two decades. Previously he was VP of Eng at TubiTV where he focused on recommender systems, MLOps, and experimentation. A long long time ago, in a galaxy far far away, he was one of the original co-authors of pandas.

LanceDB: lightweight billion-scale vector search for multimodal AI

Chris Lo

Chris Lo is the co-founder of Tracecat (YC W24): an AI-native monitoring platform for cyber threat hunters and detection engineers.

https://tracecat.com

We rewrote tsfresh in Polars and why you should too

Chris Rackauckas

Dr. Rackauckas is a Research Affiliate and Co-PI of the Julia Lab at the Massachusetts Institute of Technology, VP of Modeling and Simulation at JuliaHub and Creator / Lead Developer of JuliaSim. He's also the Director of Scientific Research at Pumas-AI and Creator / Lead Developer of Pumas, and Lead Developer of the SciML Open Source Software Organization.

Dr. Rackauckas's research and software is focused on Scientific Machine Learning (SciML): the integration of domain models with artificial intelligence techniques like machine learning. By utilizing the structured scientific (differential equation) models together with the unstructured data-driven models of machine learning, our simulators can be accelerated, our science can better approximate the true systems, all while enjoying the robustness and explainability of mechanistic dynamical models.

NonlinearSolve.jl: how compiler smarts can help improve the performance of numerical methods

Christian Luhmann

PyMC / ArviZ / PyTensor Sprint

Colleen Farrelly

Colleen Farrelly is a principal data scientist whose expertise spans network science, topological data analysis, quantum computing, and natural language processing. She and Dr. Gaba have a first book, The Shape of Data, which covers many network science tools.

Dr. Yae Gaba is a researcher at Quantum Leap Africa whose expertise includes computational geometry, topology, and graph learning algorithms.

Dr. Franck Kalala Mutombo is a researcher at the African Institute for Mathematical Sciences and the University of Lubumbashi. He is an expert in network science and has trained many African network science researchers. He and Ms. Farrelly have a network science book coming out in 2024.

Hands-On Network Science

Daniel Beutel

Daniel is one of the creators of Flower, the first fully agnostic federated learning framework, which is now being used at many Fortune 500 companies and most top universities worldwide. He previously held roles as Head of AI and CTO and has considerable experience in running and scaling engineering teams. Daniel is a CS PhD candidate at the University of Cambridge and has an MSc (with distinction) in Software Engineering from the University of Oxford.

Keynote - Federated Learning with Flower: AI's Next Frontier

David Nicholson

https://nicholdav.info/

VocalPy: a core Python package for acoustic communication research

Dean Pleban

With a background combining Machine Learning, Software Engineering, Physics, and design – Dean applies a multi-disciplinary approach to building products for machine learning and AI teams.

Dean is the CEO and co-founder of DagsHub, a platform for machine learning & AI teams that lets them build better models and manage their project's data, models, experiments, and code effectively—combining popular open-source tools and formats to create a central source of truth for AI projects.

Dean is also the host of the MLOps Podcast, where he speaks with industry experts about getting ML models to production.

Customizing and Evaluating LLMs, an Ops Perspective

Eddie

I am a data scientist with a background in applied math, and experience working in a variety of customer-facing and R&D roles. Over the last six years I have worked at startups and at Intel helping customers and open-source communities use machine learning software.

HPC in the cloud

Eitan Netzer

Chief Scientist at DataHeroes.

Real Time Machine Learning

Elijah ben Izzy

Elijah built large components of the simulation/trading infrastructure at Two Sigma, and led a team to test/ensure the reliability of their quantitative code. He then built out the ML platform at Stitch Fix that was used by 100+ data scientists (see https://multithreaded.stitchfix.com/blog/2022/07/14/deployment-for-free/). Most recently he co-authored the open source library Hamilton, a general-purpose lightweight framework for building dataflows in Python. Due to the success/possibilities presented by Hamilton, he left his job at Stitch Fix and started DAGWorks, with the goal of making it easy for Data Scientists to build and manage machine learning ETLs.

Bridging Classic ML Pipelines with the World of LLMs

Emeli Dral

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to evaluate, test, and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries - from banking to manufacturing. Emeli is a data science lecturer at GSOM SpBU and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students.

More like this: monitoring recommender systems in production

Eryk Lewinson

Eryk is an experienced data scientist who specializes in practical applications of data science methods. Outside of work, he has written over a hundred articles on topics related to data science, which have been viewed more than 4 million times. Additionally, he has authored two books on the application of Python in the financial context, both of which were published by Packt.

Version Control and Beyond: Leveraging Git for ML Experiment Management

Francesc Alted

I am a curious person who studied Physics and Math when I was young. Through the years, I developed a passion for handling large datasets and using compression to enable their analysis using regular hardware that is accessible to everyone.

I am leading the Blosc Development Team, and currently interested in determining, ahead of time, which combinations of codecs and filters can provide a personalized compression experience. This way, users can choose whether they prefer a higher compression ratio, faster compression speed, or a balance between both.

Last, but not least, I was recently awarded the "2023 Project Sustainability Award" by NumFOCUS.

You can learn more about what I am working on by reading my latest blog posts.

Btune: Making Compression Better
Blosc2: Fast And Flexible Handling Of N-Dimensional and Sparse Datasets

Franck Kalala Mutombo

Franck Kalala Mutombo is a Professor of Mathematics at Lubumbashi University and former Academic Director of AIMS-Senegal. He previously worked in a research position at Strathclyde University and at AIMS-South Africa in a joint appointment with the University of Cape Town. He holds a Ph.D. in Mathematical Sciences from the University of Strathclyde, Glasgow, Scotland. He is an expert in the study and analysis of complex network structure and applications. His most recent study considers the impact of network structure on long-range interactions applied to epidemics, diffusion, and object clustering. His research interests include Differential Geometry of Manifolds, Finite Element Methods for PDEs, Networks, and Data Science.

Hands-On Network Science

Franz Kiraly

Core developer and founder of sktime, an open source Python library for ML with time series, and an openly governed community with a charitable mission.

sktime - python toolbox for time series: new features 2023 – advanced pipelines, probabilistic forecasting, parallelism support, composable classifiers and distances, reproducibility features

Franz Kiraly

Founder and core developer of sktime.

sktime – the saga. Trials and tribulations of a charitable, openly governed open source project

Gajendra Deshpande

I am Gajendra Deshpande and I have been using Python since 2013 for academic research and development activities. I develop prototypes and applications in Natural Language Processing, Machine Learning, Cyber Security, and Web applications using Python and its ecosystem. I work as a Computer Science faculty member and run a start-up in cyber security. I am an active member of the PyCon India community and served as program committee lead for PyCon India 2021. I have presented approximately 80 talks, 20 workshops, and 15 posters across the globe at prestigious conferences like PyData Global, PyCon APAC, PyCon AU, EuroPython, DjangoCon US and Europe, SciPy India, SciPy USA, PyCon USA, JuliaCon, FOSDEM, and several other Python and FOSS conferences. I have helped Python and FOSS conferences by reviewing talk and tutorial proposals, mentoring first-time speakers, participating in discussions, and organizing events.

Fighting Money Laundering with Python and Open Source Software

Gatha Varma

Hello there! I am currently working as a Senior Data Scientist at Censius Inc.

My typical day at work involves:
✦ Research, prototyping and discussions on product features
✦ Product roadmap documentation
✦ Review media content and resources
✦ Pre-sales pitches

I will be defending my Ph.D. thesis soon 🤞🏼
Specialising in data privacy, my Ph.D. work dabbled with differential privacy and synthetic data.

I'm a mediocre runner who's a mom to two rescued dogs and one non-rescued human.

Production Data to the Model: “Are You Getting My Drift?”

Gilberto Hernandez

Gilberto is a Developer Advocate at Snowflake. Prior to Snowflake, he built engaging developer education experiences at MongoDB, Codecademy, Plaid, and Domino Data Lab. He's been educating developers for about a decade, and loves all things developer experience and education. He'd love to connect with you on LinkedIn: https://www.linkedin.com/in/gilberto-hernandez/

Build and deploy a Snowflake Native Application using Python

Giles Weaver

Data scientist. Domain expertise in maritime shipping (AIS). User of PySpark & Dask for over five years. Formerly a bioinformatician. Available for contract work.

Pandas 2, Dask or Polars? Quickly tackling larger data on a single machine

Giuditta Parolini

The Hell, According to a Data Scientist

Gordon Shotwell

Gordon is a Software Engineer at Posit PBC where he works on Shiny for Python. He was previously a Data Scientist and Product Manager at Socure where he built fraud models and developed data science tools.

Understanding reactive execution in Shiny

Guillaume Lemaitre

I am currently an engineer working on the maintenance of scikit-learn, a machine learning package in Python, at the scikit-learn consortium of the Inria Foundation.

Get the best from your scikit-learn classifier: trusted probabilities and optimal binary decision

Guodong Jin

Postdoc at University of Waterloo. Working on the open source project Kùzu, which is a highly scalable, extremely fast, and very easy-to-use embeddable graph database.

Kùzu: A Graph Database Management System for Python Graph Data Science

Ian Ozsvald

Ian is a Chief Data Scientist who has helped co-organise the annual PyDataLondon conference, raising $100k+ annually for the open source movement, along with the associated 12,000+ member monthly meetup. Using data science he's helped clients find $2M in recoverable fraud, created the core IP which opened funding rounds for automated recruitment start-ups and diagnosed how major media companies can better supply recommendations to viewers. He gives conference talks internationally, often as keynote speaker, and is the author of the bestselling O'Reilly book High Performance Python (2nd edition). He has over 25 years of experience as a senior data science leader, trainer and team coach. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee. Past talks and articles can be found at:

https://ianozsvald.com/
https://notanumber.email/
https://github.com/ianozsvald/
https://twitter.com/ianozsvald
https://fosstodon.org/@ianozsvald
https://www.linkedin.com/in/ianozsvald/

Pandas 2, Dask or Polars? Quickly tackling larger data on a single machine

Itamar Turner-Trauring

Itamar is the creator of Sciagraph, a performance and memory profiler for Python data science processing. He is working on a book for data scientists and scientists who use Python about how to speed up low-level code. He writes about Python performance, Docker packaging, and more at https://pythonspeed.com.

Optimize first, parallelize second: a better path to faster data processing

Jacob Tomlinson

Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.

Accelerating fuzzy document deduplication to improve LLM training with RAPIDS and Dask

Jaime Rodríguez-Guerra

Ensuring Runtime Reproducibility in the Python Ecosystem

James Bourbeau

I'm a software engineer and open source community member. Most of my coding activities center around the Python data science stack. In particular, I'm a core maintainer of Dask and I work at Coiled where I focus on scaling Python. Previously I studied at the University of Wisconsin-Madison where I received my PhD in Physics.

Cloud UX for Data People

Jay Chia

Jay is a cofounder of Eventual and a primary contributor to the Daft open-source project. Prior to Eventual, he was a software engineer building large scale ML data systems for computational biology at Freenome and self-driving cars at Lyft. He hails from the sunny island nation of Singapore, and used to command a platoon of tanks in the Singapore military.

Blazing fast I/O of data in the cloud with Daft Dataframes

Jean Carlo Machado

Jean Carlo Machado is a Brazilian Data Science Manager at GetYourGuide for the Growth Data Products team and the Machine Learning Platform Team. From this vantage point he is able to collaborate with amazing people in turning business opportunities into data science products, from inception to large scale production deployments of multiple data products. Jean values community building and bringing communities together; he is currently one of the organizers of the MLOps.community Berlin. Jean spends a significant part of his ever shrinking free time building open-source tools; his focus right now is building social good tech.

DDataflow: An open-source end to end testing from machine learning pipelines

Jerry Liu

Jerry is the co-founder/CEO of LlamaIndex, a data framework for building LLM applications. Before this, he spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, worked on recommendation systems at Quora, and graduated from Princeton.

Keynote - Building and Productionizing RAG

Jim Dowling

High speed data from the Lakehouse to DataFrames with Apache Arrow

Johan Herland

Johan is a Developer Productivity Engineer at Tweag. Johan has almost twenty years of industry experience, mostly working with Python, Linux and open source software. He has a passion for designing and implementing elegant and useful solutions.

FawltyDeps: Finding undeclared and unused dependencies in your notebooks and projects

Jon Wang

Jon Wang possesses a deep understanding of large model inference systems, relevant ecosystems like LangChain, and their practical applications. With over 4 years of experience in distributed system design and development, Wang has a proven track record in the creation, development, testing, and delivery of products from scratch. He is well-acquainted with the open-source ecosystem and has been an active contributor to Apache IoTDB, making significant contributions in terms of key features and bug fixes. Reliable and adept at communication, Wang is a team player with a strong passion for technology.

Xorbits Inference: Model Serving Made Easy

Jonathan Starr

Program Manager, NumFOCUS

Map of Open-Source Science (MOSS)

Jorn Mossel

Jorn Mossel works as a Data Scientist in Energy demand forecasting. Prior to that, he worked for a decade on Wall Street as a quant, building systematic trading strategies and risk models. Jorn holds a Ph.D. in Theoretical Physics.

Modeling Extreme Events with PyMC

Juan De Dios Santos

Juan is a Trust & Safety Software Engineer working at Bitly. His role at the company is to develop solutions to ensure the quality of the links created or extended by the users. Besides fighting spam, fraud, and scams, Juan's an avid writer. In 2021, he published Practical TensorFlow.js and has written countless articles covering topics such as machine learning, quantified self, and quirky ones such as data analysis done using Pokémon data. Juan holds a BSc in Computer Science from the University of Puerto Rico - Rio Piedras Campus and an MSc in Computer Science from Uppsala University in Sweden.

Getting better at Pokémon using data, Python, and ChatGPT.

Juan Luis Cano Rodríguez

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He has a decade of experience as developer advocate, software engineer, and Python trainer in several industries, and currently he works as Principal Product Manager for Kedro, an open source Python framework for data science, at QuantumBlack, AI by McKinsey.

He has made significant contributions to the PyData stack and published several open-source packages, the most important one being poliastro, an open-source Python library for orbital mechanics used at space agencies, satellite companies, and universities.

After founding the Python España non-profit and co-organizing the first seven PyCons in Spain, he became a Python Software Foundation Fellow in 2017. Nowadays he is the lead organizer of the PyData Madrid monthly meetups.

Who needs ChatGPT? Rock solid AI pipelines with Hugging Face and Kedro

Jérémy Ravenel

Naas Sprint

Kalyan Prasad

A self-taught data scientist/analytics manager, open-source enthusiast, speaker & community-first person. Kalyan has presented talks at prestigious conferences and educational institutions such as PyData Global, Data Observability Conference, Data Science Global Summit 2022, JupyterCon, PyCon India, Devfest Hyderabad, PyCon APAC, PyCon Hong Kong, PyCon JP, PyCon ZA, Pyjamas, Conf42, Developer Conference Telangana 2021, BelPy & KLS Gogte Institute of Technology, Belagavi, Karnataka, India.
He has also worked as a reviewer and mentor for reputed conferences & hackathons including PyData, PyData Seattle, SciPy, EuroPython, JupyterCon, PyCon US, PyCon India, PyConf Hyderabad, and many others.
Kalyan also contributes to various open-source communities. He enjoys being involved with these communities and helping them grow. He is currently associated with the following organizations:
NumFOCUS: Small Development Grants Review Committee
PyCon India: Co-Chair
PyConf Hyderabad: Co-Chair
KaggleX BIPOC Mentorship Program (Cohort 3): Mentor
PyData Global Impact Mentoring Program: Mentor
Hyderabad Python Users Group: Core Member/Co-Organizer
Humans for AI: Program Manager for AI Learning Community

Python-Driven Portfolios: Bridging Theory and Practice for Efficient Investments

Kim Pevey

Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets

Kyle Sunden

Matplotlib Sprint

Lu Qiu

Lu Qiu is a machine learning engineer at Alluxio and is a PMC maintainer of the open source project Alluxio. Lu develops big data solutions for AI/ML training. Before that, Lu was responsible for core Alluxio components including leader election, journal management, and metrics management. Lu received an M.S. degree in Data Science from George Washington University.

Maximize GPU Utilization for Model Training

Luca Baggi

ML Engineer interested in forecasting

How I used Polars to build functime, a next gen ML forecasting library

Lucas Durand

Lucas Durand (he/him/his) is the Director of Data Science Engineering at TD Securities and the Product Owner for TDS Notebooks, the TD Securities "Data Platform as a Service". Lucas has been with TD for upwards of 7 years as a Quant, Software Engineer, and Data Scientist.

Lucas holds a Master of Science in Theoretical Physics from York University as well as an Honours Bachelor of Science from the University of Toronto. He is a passionate teacher, avid musician, and big advocate for Python as a first-class language in banking.

Building an Interactive Network Graph to Understand Communities

Maarten

Maarten Breddels is an entrepreneur and ex-scientist mainly working with Python, C++, and JavaScript in the Jupyter ecosystem. He is the creator of Solara, ipyvolume, and Vaex and Co-founder of Widgetti. His expertise includes fast numerical computation, API design, 3D visualization, and building data apps. He has a Bachelor's in ICT, a Master's, and Ph.D. in Astronomy, and he likes to solve real problems.

Solara simplifies building complex dashboards.

Malte Tichy

After pursuing his PhD and postdoc research in theoretical quantum physics, Malte joined Blue Yonder as a Data Scientist in 2015. Since then, he has led numerous external and internal projects, all of which involved programming in Python, creating, working with and evaluating probabilistic predictions, and communicating the achieved results.

Paradoxes in model training and evaluation under constraints

Marco Gorelli

Marco Gorelli is a Senior Software Engineer at Quansight Labs, primarily working on DataFrame APIs. He is also a volunteer maintainer of Polars and paid maintainer of pandas.

Polars and time zones: everything you need to know

Martin Durant

Staff Software Engineer at Anaconda, Inc. Creator of fastparquet, fsspec, intake and kerchunk.

Intake 2

Martin Y. Xie

Martin completed a PhD in Computer Science (interdisciplinary with Economics) at Oxford University, and studied in Singapore and at MIT for his undergraduate and master's degrees. Martin has published papers on parallel and distributed computing, supercomputing, Grid computing, etc., won a best paper award, and served as a reviewer and session chair at international conferences in related areas.

After his PhD, Martin worked as a quant at a quantitative hedge fund in London for about 3 years, and also started a FinTech startup in China focusing on quant and blockchain-related technology/trading (including high-frequency trading). Martin moved to Dubai, UAE at the beginning of the Covid pandemic, and co-founded a technology-based general trading company focusing on live-streaming. Martin is a passionate and true believer in decentralized collaboration and partnership using technology.

Introduction to Using Julia for Decentralization by a Quant

María Cruz

Data Tales from an Open Source Research Team

Mathieu Cayssol

Software engineer

We rewrote tsfresh in Polars and why you should too

Matt Harrison

Matt Harrison spends most of his time teaching Python and Data Science. He has a CS degree from Stanford University. He is a best-selling author on Python and Data subjects. His books Effective XGBoost, Effective Pandas, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm have all been best-selling books on Amazon. He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), Universities (Stanford, University of Utah, BYU), and small companies. He has been using Python since 2000 and has taught thousands through live training, both online and in person.

An Introduction to Pandas 2, Polars, and DuckDB

Matthew Rocklin

Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.

Matthew holds a bachelor's degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago.

Website: https://matthewrocklin.com
Dask: https://dask.org/
Coiled: https://coiled.io

Arrow revolution in pandas and Dask

Megan Lieu

Hi, I am Megan Lieu. I work at Deepnote.

Collaborate with your team using data science notebooks

Mine Cetinkaya-Rundel

Mine Çetinkaya-Rundel is Professor of the Practice at Duke University and Developer Educator at Posit. Mine's work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education as well as pedagogical approaches for enhancing the retention of women and underrepresented minorities in STEM. Mine works on the OpenIntro project, whose mission is to make educational products that are free, transparent, and lower barriers to education. As part of this project, she co-authored four open-source introductory statistics textbooks. She is also the creator and maintainer of datasciencebox.org, co-author on R for Data Science (2nd Edition), and she teaches the popular Statistics with R MOOC on Coursera.

Dashing through the snow (or Sharing your data), in a Quarto Dashboard

Mustafa Zengin

Data Scientist at Walmart

Using Large Language Models to improve your Search Engine

Nabanita Roy

Building Contextual ChatBot using LLMs, Vector Databases and Python

Ngesa Marvin

Ngesa is an Electrical Engineer specializing in Signal Processing and Computer Vision. He started his ML journey as an Intel AI Innovator and later joined Liquid Intelligent Technologies as an IoT Solutions Engineer. He is currently a Device Manager at Safaricom PLC, focusing on Cloud & AIoT use cases. He is also an Arm AI Ambassador.

When not building products, he loves to teach and share knowledge with others. He believes ML education should be accessible to all. Recently, he has been working on sensor data analysis from IoT devices to help manage Device lifecycles and drive the right customer experience.

He founded Nairobi AI and leads Machine Learning efforts in communities such as GDG Nairobi and Tiny ML Kenya.

Keras (3) for the Curious and Creative

Nidhin Pattaniyil

Machine Learning Engineer working on Search

Using Large Language Models to improve your Search Engine

Nir Barazida

Nir Barazida, MLOps Team Lead at DagsHub.

Always pushing the envelope and exploring the frontiers of technology. Nir combines a unique background of computer vision engineering, MLOps research, and public speaking - to give a fascinating session on topics he lives and breathes.

Nir is the MLOps Team Lead at DagsHub. He focuses his research on improving workflows for data science teams that work in a production-oriented environment.

Nir graduated with honors from BGU University, majored in Structural Analysis and Finite Element Simulations, and is currently pursuing his Master's in Data Science from Reichman University.

Unlock the Full Potential of Jupyter Notebooks

Nouf Alroqi

Data Scientist at Jahez International Company

Data-Driven F&B Delivery: Jahez as a Leading Example

Olivier Grisel

Machine Learning software engineer at Inria and member of the maintainers' team of the scikit-learn open source project.

Predictive survival analysis with scikit-learn, scikit-survival and lifelines

Oren Netzer

Oren is Co-Founder and CEO of DataHeroes. Prior to starting DataHeroes, Oren was Co-Founder and CEO of cClearly and Co-Founder and CEO of DoubleVerify (NYSE: DV). Oren was named to the Silicon Alley 100 list and is a winner of the Technology Pioneers Award from the World Economic Forum in Davos.

Real Time Machine Learning

Pascal Bourgault

Physical oceanography grad who became a climate science specialist and scientific developer.

Xclim: Climate Data Processing and Analysis for Everyone

Patrick Deziel

Patrick Deziel is a distributed systems engineer and machine learning specialist. Patrick has extensive experience building and maintaining mission-critical systems in the private sector, as well as integrating modern ML solutions into existing applications. At Rotational, he designs and builds intelligent distributed systems to enable global use cases. In his free time, Patrick enjoys rock climbing and consuming science fiction.

Event-Driven Data Science: Reconceptualizing Machine Learning for the Real-time World

Patrick Hoefler

Patrick Hoefler is a member of the pandas core team and a Dask maintainer. He is currently working at Coiled where he focuses on Dask development and the integration of a logical query planning layer into Dask. He holds an MSc degree in Mathematics and is working towards an MSc in Software Engineering at the University of Oxford.

Arrow revolution in pandas and Dask

Pavithra Eswaramoorthy

Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets
Ensuring Runtime Reproducibility in the Python Ecosystem

Peter Vidos

Peter, the CEO, and Co-Founder of Vizzu, is on a mission to redefine how we perceive and interact with data. His passion lies in uncovering innovative solutions to the challenges faced by data professionals when it comes to chart creation and presentation.

With over 15 years of experience in digital product development, Peter's career has spanned a wide array of projects, from mobile app testing to online analytics, decision support systems, and e-learning solutions.

In his current role at Vizzu, Peter is dedicated to driving innovation in data visualization and empowering data professionals to effortlessly convey their insights through interactive and animated data stories.

Building Interactive, Animated Reports and Dashboards in Streamlit with ipyvizzu
Empowering Data Exploration: Creating Interactive, Animated Reports in Streamlit with ipyvizzu

Philip Meier

Philip is a Senior Software Engineer at Quansight.

From RAGs to riches: Build an AI document interrogation app in 30 mins

Philipp Rudiger

Panel Sprint

Prema Roman

Prema Roman is a distributed systems engineer at Rotational Labs. She is an experienced software, data, and machine learning engineer with a proven track record of building high quality software applications and data products. Her passion for continuous learning has taken her a long way from her start as a data analyst, as she takes on new challenges at Rotational Labs building globally distributed systems and machine learning data products.

Event-Driven Data Science: Reconceptualizing Machine Learning for the Real-time World

Quan Nguyen

Hi, I'm Quan Nguyen

But what is a Gaussian process? Regression while knowing how certain you are

Ramon Perez

Ramon is currently a developer advocate at Seldon. Before joining Seldon, he worked as an independent freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Going a bit further back, Ramon used to wear different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.

Architecting Data Tools: A Roadmap for Turning Theory and Data Projects into Python Packages

Ramona Sartipi

Ramona is a self-taught designer at IBM Watsonx, who has a passion for creating experiences that are intuitive and impactful.
She believes that highly innovative and inclusive products begin with diverse and collaborative teams. As an advocate of Design Thinking, she actively creates spaces to bring people together to ship human-centered and creative solutions. With a background in Computer Science, Ramona is passionate about creating opportunities for the underrepresented in tech. She co-founded Ellehacks, a women-led, women-focused yearly hackathon that helps over 1,000 beginners experiment with technology. Ellehacks is amongst the largest all-female hackathons in Canada.

When Design Thinking Meets Opensource

Ravi

AI engineer.

Using Large Language Models to improve your Search Engine

Ravi Singh

Data-driven leader with over 13 years of experience in consulting and in-house roles across diverse industries like wealth management, mortgage lending, telecom, and streaming services. Adept at using a product-focused approach to deliver business value. Currently leading Data and Analytics at Schibsted, I am passionate about leveraging data to drive informed decisions and business growth. Open to opportunities where I can bring my expertise to create impactful data strategies.

Unravelling Hidden Technical Debt in ML: A Pythonic Approach to Robust Systems

Ruan Pretorius

🖥 I am a data scientist
🧠 I love machine learning and AI
☕ I turn coffee into AI
🌱 I’m currently playing with GenAI and LangChain
Check out my GitHub page for more

How to build a data pipeline without data: Synthetic data generation and testing with Python

Russell Keith-Magee

Dr Russell Keith-Magee is the founder of the BeeWare project, a project developing GUI tools and libraries to support the development of Python software on desktop and mobile platforms. He joined the Django core team in 2006, and for 5 years, was President of the Django Software Foundation. He is a frequent speaker at Python and Django conferences around the globe, sharing his experience as a FLOSS developer, community maintainer, and (unsuccessful) startup founder. In his day job, he is a Principal Engineer at Anaconda, working on BeeWare in the OSS team.

Build a Data Visualization App For Your Phone

Ryan O'Neil

Ryan is CTO and Co-founder of Nextmv (nextmv.io). Before that, he was an optimization wizard and led the Decision Engineering departments at Zoomer and Grubhub. He studied Operations Research at George Mason University and has a cat.

Order up! How do I deliver it? Build on-demand logistics apps with Python, OR-Tools, and DecisionOps

Sankalp Gilda

Sankalp Gilda is an MLE by trade and an astronomer by training, obsessed with uncertainty quantification, causality, and open source software. When not building production ML pipelines or working on statistical modeling research, you'll find him jumping from planes, boats, or cliffs -- seeking the thrill of the outdoors.

IID Got You Down? Resample Time Series Like A Pro

Sara Iris Garcia

Sara is a seasoned software developer and a data science enthusiast. She holds a master's degree in Data Science and her main research interest is the application of artificial intelligence in health care. When she is not analyzing data, she spends her free time learning how to grow vegetables and become self-sustainable.

API development for data analysts/scientists with FastApi

Saradindu Sengupta

I am working at Nunam, an energy analytics startup based in Bangalore, India, where my primary area of work is building health and lifecycle forecasting of Li-ion batteries in EV and energy storage. I have over 4 years of professional experience in building ML systems from the ground up after finishing my master's from IIITM, Kerala. I have spoken at both physical and virtual conferences where my primary area of focus has been on Computer Vision, MLOps, model interpretability and model compression and quantization.

Previous Talks

"Managing data quality issues in ML production, especially for time-series" - Link Slides at Google Developer Group Community Day, 2022
"Things I learned while running neural networks on microcontroller" - Link Slides at PyData Global 2022
"Bessel's Correction: Effects of (n-1) as the denominator in Standard deviation" - Link Slides at PyData Global 2022
"Interpretable ML in production" - [Slides](https://docs.google.com/presentation

How can a learnt ML model unlearn something: Framework for "Machine Unlearning"

Sean Sheng

Sean currently serves as the Head of Engineering at BentoML. He has led the team to successfully release multiple open-source projects, including BentoML and OpenLLM, aimed to help facilitate AI application development. Additionally, Sean has also led the launch of the AI deployment platform BentoCloud, designed for deploying and scaling AI applications in production. Prior to his role at BentoML, he led engineering teams at LinkedIn, where he supported the service infrastructure powering all of LinkedIn's backend services.

Productionizing Open Source LLMs

Shagun Sodhani

I am a tech lead at Meta. My research focuses on developing foundation models for multimodal data. Outside of tech, I am interested in economics and finance.

Training large scale models using PyTorch

Shashank Shekhar

Shashank is a Data Sciences leader with diverse experience across verticals including the Telecom, CPG, Retail, Hitech and E-commerce domains. He is the founder of the Gen AI focused startup AIOrdinate and is building a state-of-the-art, industry-first LLM Ops platform which will enable secure deployment of LLM applications and products at a fraction of the cost of individual LLMs, with very low latency and high accuracy. In the past, he has worked at VMware, Amazon, Flipkart, Subex and Target and has been involved in solving various complex business problems using Machine Learning and Deep Learning. He has been part of the program committee of several international conferences like ICDM and MLDM and was selected as a mentor in Global Datathon 2018 organized by Data Sciences Society. He has multiple patents and publications to his credit in the field of artificial intelligence, machine learning, deep learning and image recognition in several international journals of repute. He has spoken at many summits and conferences like PyData Global, APAC Data Innovation Summit, Big Data Lake Summit, PlugIn etc. He has also published three open-source libraries on Python and is an active contributor to the global Python community.

LLMs: Beyond the Hype - A Practical Journey to Scale

Shaurya Agarwal

Shaurya Agarwal, Deputy Head - Engineering, at Barnes and Noble (BNED LoudCloud).

With 20+ years of experience in Analytics & ML, Big Data and Cloud Computing, Shaurya is leading the engineering teams at BNED that are working on building the next generation of data products for the company.

All Them Data Engines: Pandas, Spark, Dask, Polars and more - Data Munging with Python circa 2023

Shivay Lamba

Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor and has also been an MLH Fellow. He is actively involved in community work as well. He is a TensorflowJS SIG member, Mentor in OpenMined and CNCF Service Mesh Community, SODA Foundation and has given talks at various conferences like Github Satellite, Voice Global, Fossasia Tech Summit, TensorflowJS Show & Tell.

Enhancing your JupyterLab Developer Experience with Local LLMs and Code Snippets

Simon Hansen

Panel Sprint

Soham Butala

I'm Soham, a Data Science Master's student at the University of Washington. With three years of diverse experience at Deloitte, I've delved into software engineering, data engineering, and application security. I'm deeply passionate about Data Engineering and always eager to embrace new technologies. Beyond the screen and code, I find solace in the great outdoors; hiking is not just an activity for me but a way to rejuvenate my spirit. And when it comes to mental exercises, who can resist the allure of a thrilling game of chess? Looking forward to connecting and exploring the vast horizons of technology and beyond.

Prefect Workflows for Scaling Acoustic Fisheries Survey Pipelines

Sophia Vargas

Sophia Vargas is a Researcher and Program Manager in contributor experience within Google’s Open Source Programs Office. In this role she leads research efforts that span project health, contributor experience, and open source economics. She is also on the Governing Board and an active contributor to the CHAOSS community. Prior to Google, Sophia was an analyst at Forrester Research, covering data center infrastructure and cloud strategy.

Data Tales from an Open Source Research Team

Stefan Krawczyk

A hands-on leader and Silicon Valley veteran, Stefan has spent over 15 years thinking about data and machine learning systems, building product applications and infrastructure at places like Stanford, Honda Research, LinkedIn, Nextdoor, Idibon, and Stitch Fix. A regular conference speaker, Stefan has guest lectured at Stanford’s Machine Learning Systems Design course and is an author of a popular open source framework called Hamilton. Stefan is currently CEO of DAGWorks, an open source startup that gives teams a standardized way to build and maintain data, ML and LLM pipelines without the coding nightmares.

Bridging Classic ML Pipelines with the World of LLMs

Stephen Macke

I'm an engineer at Databricks where I work on tools and infrastructure for machine learning and data science. I'm passionate about pushing the limits of Python for data science use cases, and would love to chat with other tool developers to learn about the exciting developments in this area. In my free time, besides maintaining a few open source projects, I enjoy spending time with my wife and our cat in our vegetable garden.

Python as a Hackable Language for Interactive Data Science

Sujit Pal

Sujit Pal is an applied data scientist at Elsevier Health, where he spends his time applying ML and NLP techniques to improve the quality of search results in various clinical applications. His areas of interest include Semantic Search, Natural Language Processing, Machine Learning and Deep Learning.

Building Learning to Rank models for search using Large Language Models

Theodore Meynard

Theodore Meynard is a data science manager at GetYourGuide, where he leads the evolution of their ranking algorithm, helping customers find the best activities to book and locations to explore. Beyond work, he is one of the co-organizers of the PyData Berlin meetup and conference.
When he is not programming, he loves riding his bike looking for the best bakery-patisserie in town.

DDataflow: An open-source end to end testing from machine learning pipelines

Tim Bonnemann

Community Lead for Open-Source Science (OSSci) at IBM Research.

Map of Open-Source Science (MOSS)

Trevor James Smith

Trevor James Smith is a climate change research software developer in Montreal, Quebec. He holds a B.Sc. Hons. in Environmental Sciences and Political Science and an M.Sc. in Geography, Planning and Environmental Studies from Concordia University. Trevor has worked in various fields as a contract researcher examining Canadian telecommunication infrastructure, Indigenous-led GIS/mapping initiatives, and environmental disaster hazard and risk mapping. For the past 5 years, he has been a member of the climate platforms, data, and operations team at Ouranos Inc.

Xclim: Climate Data Processing and Analysis for Everyone

Ville Tuulos

Ville has been developing infrastructure for machine learning and AI for over two decades. He has worked as an ML researcher in academia and as a leader at a number of companies, including Netflix where he led the ML infrastructure team that created Metaflow, a popular open-source framework for ML infrastructure. He is the co-founder and CEO of Outerbounds, a company developing modern human-centric ML. He is also the author of a book, Effective Data Science Infrastructure, published by Manning.

Compute anything with Metaflow

Vino Duraisamy

Vino is a Developer Advocate, focussing on Data engineering and LLM workloads at Snowflake. She started as a software engineer at NetApp, and worked on data management applications for NetApp data centers when on-prem data centers were still a cool thing. She then hopped into the cloud and big data world and landed on the data teams of Nike and Apple. There she worked mainly on batch processing workloads as a data engineer, built custom NLP models as an ML engineer and even touched upon MLOps a bit for model deployments. When she is not working with data, you can find her doing yoga or strolling through Golden Gate Park and Ocean Beach.

From raw data to interactive data app in an hour: Powered by Python

William Dealtry

William Dealtry has been working in both Python and C++ for many years, and has been a member of the C++ standardization committee for more than a decade. Having previously worked with financial data at places like the New York Stock Exchange and Goldman Sachs, he is currently the Architect of a new open-source Dataframe database, ArcticDB, which is backed by long-time Python enthusiasts Man Group and Bloomberg.

Data persistence with consistency and performance in a truly serverless system

Yae U. Gaba

As a mathematician, I have a keen interest in both theoretical concepts and practical applications, particularly in modeling real-world problems that are relevant to education and industry.

Hands-On Network Science

Yuliia Barabash

I have lived in Germany for the past five years, during which I have gained a diverse range of experiences in the tech industry. My expertise spans from developing web applications in Python to constructing AWS cloud solutions. I have a good understanding of design patterns, Object-Oriented Programming (OOP), event-driven architecture, and microservices architectures. Additionally, I have hands-on experience with REST API design and database technologies. I am continuously committed to enhancing my skills and ensuring that I use tools in line with best practices.

Data Harvest: Unlocking Insights with Python Web Scraping

Zachary Blackwood

Once a teacher, then a web developer, now a Data Something. Father of 5 beautiful daughters. Working at Snowflake to make Streamlit even more amazing than it already is.

Empowering Data Exploration: Creating Interactive, Animated Reports in Streamlit with ipyvizzu

Zander Matheson

Zander is the CEO and Founder of Bytewax. Before starting Bytewax he worked as a data scientist and machine learning engineer at Heroku and GitHub. He is passionate about building data tools and before starting Bytewax, built data tools internally. Outside of work, Zander enjoys all the outdoor activities that his home in Santa Cruz has to offer - namely surfing, biking and hiking.

Real-Time Revolution: Kickstarting Your Journey in Streaming Data

amanda casari

amanda casari is a developer relations engineer in the Open Source Programs Office at Google, where she is co-leading research and engineering to better understand risk and resilience in open source ecosystems. She was named an External Faculty member of the Vermont Complex Systems Center in 2021. amanda is persistently fascinated by the difference between the systems we aim to create and the ones that emerge, and pie.

Data Tales from an Open Source Research Team

hugo bowne-anderson

Hugo Bowne-Anderson is Head of Developer Relations at Outerbounds. He is also the host of the industry podcast Vanishing Gradients. Hugo is a data scientist, educator, evangelist, content marketer, and data strategy consultant, with extensive experience at Coiled, a company that makes it simple for organizations to scale their data science seamlessly, and DataCamp, the online education platform for all things data. He also has experience teaching basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, conferences such as SciPy, PyCon, and ODSC and with organizations such as Data Carpentry.

Orchestrating Generative AI Workflows to Deliver Business Value
Full-stack Machine Learning and Generative AI for Data Scientists

sktime community

three members of the community will co-present

sktime - python toolbox for time series: new features 2023 – advanced pipelines, probabilistic forecasting, parallelism support, composable classifiers and distances, reproducibility features

sonam

Sonam is working as the developer advocate for the Qdrant engine, and previously for Rasa. She has previously worked as an AI researcher at Saama Technologies on various AI projects for Pfizer and NIH for drug trials and inclusion in medicine. She is passionate about language models and has made various videos explaining generative AI for Rasa developers. She is a published author in ACL and loves to give talks at conferences and meet developers.

Let chatGPT decide and run the function!