PyData Global 2023

How can a learnt ML model unlearn something: Framework for "Machine Unlearning"
12-07, 18:00–18:30 (UTC), Machine Learning Track

With the recent explosion of large language and vision models, training models on new data has become very costly. At the same time, new and upcoming data privacy legislation makes honoring the "right to be forgotten" costly and time-consuming. In this talk, we will go through the current state of research on "machine unlearning", how a learnt model can forget something without retraining, and a general demonstration of a machine unlearning framework.

While it is becoming easier by the day to access large language models (open-source or proprietary), new data privacy regulations, such as the EU’s General Data Protection Regulation (Mantelero, 2013) and Canada’s Personal Information Protection and Electronic Documents Act, stipulate that individuals have the “right to be forgotten”. Exercising that right is becoming quite difficult, as data leaks are prevalent among all the major publicly available large models (language, vision or medical). Contemporary adversarial attacks on trained models have shown that it is possible to infer whether an instance or an attribute belonged to the training data. The "right to be forgotten" therefore means not just deleting the user data from the datastore but also removing any influence of that data on the model's weights.
This gives rise to a new paradigm, "machine unlearning", as evidenced at NeurIPS 2023, where Google launched a "Machine Unlearning" challenge. The talk will briefly go through prior research in this domain, the current challenges, and a usable framework for running a "machine unlearning" pipeline.
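
To make "removing any influence of that data on the model's weights" concrete, here is a minimal sketch in Python. It is not the framework presented in the talk. It assumes a toy one-dimensional least-squares model whose parameters depend only on additive running sums, so a data point can be forgotten exactly by subtracting its contribution. The class name `RunningLeastSquares` and its methods are hypothetical, chosen for illustration only. Deep networks do not decompose this way, which is why unlearning for large models remains an open research problem.

```python
from dataclasses import dataclass


@dataclass
class RunningLeastSquares:
    """Toy 1-D fit y ~ slope * x + intercept, stored as additive running sums.

    Hypothetical illustration: exact unlearning works here only because the
    fitted parameters are a function of sums over the training points.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def learn(self, x: float, y: float) -> None:
        # Add one training point's contribution to every running sum.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def unlearn(self, x: float, y: float) -> None:
        # Subtract the same contribution; no pass over the remaining data.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def params(self) -> tuple[float, float]:
        # Closed-form ordinary least squares from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def fit(points):
    """Train a fresh model on the given (x, y) pairs."""
    model = RunningLeastSquares()
    for x, y in points:
        model.learn(x, y)
    return model


# Made-up data; the last point is the one whose owner asks to be forgotten.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 20.0)]
forget = (4.0, 20.0)

model = fit(data)
model.unlearn(*forget)  # forget without retraining

# Reference: retrain from scratch on the data that remains.
retrained = fit([p for p in data if p != forget])
```

After `unlearn`, `model.params()` matches `retrained.params()` up to floating-point rounding, which is the usual yardstick for unlearning: the updated model should be indistinguishable from one that never saw the forgotten data.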

Prior Knowledge Expected

Previous knowledge expected

I am working at Nunam, an energy analytics startup based in Bangalore, India, where my primary area of work is building health and lifecycle forecasting of Li-ion batteries in EV and energy storage. I have over 4 years of professional experience in building ML systems from the ground up after finishing my master's from IIITM, Kerala. I have spoken at both physical and virtual conferences where my primary area of focus has been on Computer Vision, MLOps, model interpretability and model compression and quantization.

Previous Talks

  1. "Managing data quality issues in ML production, especially for time-series" - Link Slides at Google Developer Group Community Day, 2022

  2. "Things I learned while running neural networks on microcontroller" - Link Slides at PyData Global 2022

  3. "Bessel's Correction: Effects of (n-1) as the denominator in Standard deviation" - Link Slides at PyData Global 2022

  4. "Interpretable ML in production" - [Slides](