PyData Global 2023

FawltyDeps: Finding undeclared and unused dependencies in your notebooks and projects
12-07, 12:00–12:30 (UTC), General Track

Reproducibility is a cornerstone of science. However, most data science projects and notebooks struggle at the most basic level of declaring dependencies correctly. A recent study showed that 42% of the notebooks executed failed due to missing dependencies.

FawltyDeps is a dependency checker that finds imports you forgot to declare (undeclared dependencies) and packages you declared but never import in your code (unused dependencies).

This talk will show you how to integrate FawltyDeps into your manual or automated workflows, and how doing so can improve the reproducibility of your notebooks and projects.


When your Python notebooks or projects depend on third-party libraries, it is not enough to just pip install them. Rather, you should declare them in a way that allows other developers and users of your project to get the same dependencies and thus re-run your code. Typically you do this by declaring your dependencies in a pyproject.toml or a requirements.txt.

But how can you make sure that you have declared everything that you actually need? Maybe your system or project environment already contained a dependency that you forgot to declare? Or maybe you have declared too much, and now the project ends up installing libraries that you are not actually using?

What if there was a tool to check that what you declare and what you use in your project actually match?

FawltyDeps:
- reads your Python files and Jupyter notebooks to find imports that come from external sources.
- extracts dependencies that are declared in project requirements (pyproject.toml, requirements.txt, setup.py, setup.cfg).
- compares your imports to your declared dependencies and reports mismatches.

Fixing these issues will make your projects more reproducible, but also leaner and more lightweight. In short, it will help you combat the "works on my machine" syndrome!

FawltyDeps supports Python 3.7+ and is available via PyPI.

For more information:
- Announcement: https://www.tweag.io/blog/2023-03-14-announcing-fawltydeps/
- Followup: https://www.tweag.io/blog/2023-09-21-fawltydeps-mapping-strategy/
- PyPI package: https://pypi.org/project/fawltydeps/
- Documentation: https://github.com/tweag/FawltyDeps/
- Real Python podcast coverage: https://www.youtube.com/watch?v=E06DuwV1yxI&t=1307s


Prior Knowledge Expected

No previous knowledge expected

Johan is a Developer Productivity Engineer at Tweag. Johan has almost twenty years of industry experience, mostly working with Python, Linux and open source software. He has a passion for designing and implementing elegant and useful solutions.