PyData Global 2023

The Hell, According to a Data Scientist
12-06, 18:00–18:30 (UTC), General Track

The talk takes inspiration from a famous literary work, Dante Alighieri's "Inferno" (Italian for "Hell"), to offer data scientists a moral revenge on the data sinners they constantly encounter in their professional lives. While Dante populates his Hell with political enemies and even former Popes, I redraw the map of Dante's Inferno, finding a place and an adequate punishment for data sinners. With the help of the audience, I will make sure that creators of invalid CSV files, users of identifiers so unique that they are even longer than the recommended PEP 8 line length, and all other data sinners find their well-deserved place in Hell. The bottom line of the talk is that data scientists' lives will not improve until organisations begin to manage their data properly and realise that data products and infrastructures can be developed only when data satisfy minimal usability criteria, such as machine-readability.


'Inferno' is the Italian word for 'Hell', but it is also the title of the first part of Dante Alighieri's "Divine Comedy", a masterpiece of early Italian poetry famous worldwide for its emblematic characters and vivid narration. Dante's Inferno is especially popular for its powerful display of sinners and their harsh punishments.

In my talk, I will trace a new map of the nine circles in Dante's Inferno and populate Hell with the people who make data scientists' lives a misery. In agreement with Dante's Contrappasso Law, I will also choose a punishment that either resembles or contrasts with the data sin committed. I will come to the talk with a map of Hell that is the result of my frustrating experiences, but I wish to engage the audience in conversation and transform the talk into a collective effort against data sins.

The talk does not require any previous knowledge, although a certain baggage of exasperation can definitely help you contribute to the discussion. It is aimed at anyone who has ever been asked to build an API by people who had never heard of machine-readability and were still managing their data in spreadsheets, or who has spent hours hunting down and removing the countless different placeholders used for missing values in a database.

The tone of the conversation will be light, but the message behind the discussion is serious and urgent. While business executives and organisation leaders are fascinated by buzzwords like generative AI and Deep Learning, they still overlook the ABC of working with data, which is the prerequisite for every serious data product and infrastructure.


Prior Knowledge Expected

No previous knowledge expected