The Challenges of Version Controlling Notebooks
Notebooks are complex objects: each .ipynb file is a JSON document that stores code, outputs, and metadata together, which makes notebooks hard for traditional version control systems like Git to manage. Specific issues include:
- Output cells: Outputs can be large (embedded images, long tables) and change every time the notebook is re-run, which clutters diffs and makes it difficult to see what actually changed.
- Dependencies: Notebooks often rely on external libraries and packages, and without a record of which packages they need, collaborators may not be able to rerun them.
- Inconsistent code formatting: Formatting differences between collaborators produce changes that carry no meaning and make real changes harder to review.
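To make the output problem concrete, here is a minimal sketch using only the Python standard library. It builds two hand-written, simplified notebook dictionaries (a stand-in for real .ipynb files, not files saved by Jupyter; `make_notebook` is a hypothetical helper) with identical source code that differ only in execution count, as happens when the same notebook is run in two kernel sessions, and then diffs their JSON.

```python
import difflib
import json


def make_notebook(execution_count: int, output_text: str) -> dict:
    """Build a minimal, hand-written notebook dict with a single code cell.

    This is a simplified stand-in for a real .ipynb file, not one saved by Jupyter.
    """
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["print(sum([1, 2, 3, 4]))"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


# Same source code, run once in a fresh kernel (count 1) and again later (count 7).
before = json.dumps(make_notebook(1, "10\n"), indent=1).splitlines()
after = json.dumps(make_notebook(7, "10\n"), indent=1).splitlines()

# Line-based diff of the two JSON documents, as Git would show it.
diff = list(
    difflib.unified_diff(before, after, "before.ipynb", "after.ipynb", lineterm="")
)
print("\n".join(diff))
```

The only changed line is the execution count; the source line never appears among the changes, yet Git would still report the notebook as modified.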
To address these issues, we’ll explore three tools that make notebooks easier to keep under version control:
- nbstripout
- nbqa
- pipreqsnb
These tools aim to provide a more effective way to manage notebook versions and simplify collaboration.
nbstripout: A Tool for Stripping Output Cells
nbstripout is a tool that strips outputs from notebooks. By removing outputs, nbstripout reduces the noise in diffs so you can focus on code changes.
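The core idea can be sketched in a few lines of standard-library Python. This is a simplified illustration, not nbstripout's actual code: the hypothetical `strip_outputs` function clears the outputs and execution count of every code cell, so two runs of the same code become identical JSON.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of `notebook` with code-cell outputs and execution counts cleared.

    Simplified illustration of the idea behind nbstripout, not its actual code.
    """
    stripped = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":  # markdown/raw cells have no outputs
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def notebook_after_run(execution_count: int) -> dict:
    """Hypothetical notebook: one markdown cell and one executed code cell."""
    return {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Totals"]},
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [{"name": "stdout", "output_type": "stream", "text": ["10\n"]}],
                "source": ["print(sum([1, 2, 3, 4]))"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


run_a = notebook_after_run(1)
run_b = notebook_after_run(7)

print(run_a == run_b)                                # False: execution counts differ
print(strip_outputs(run_a) == strip_outputs(run_b))  # True: only the code remains
```

The real tool handles more cases (it may also clean up some cell and notebook metadata) and writes the result back to the .ipynb file.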
To install nbstripout, run:
pip install nbstripout
To use nbstripout, run it on your notebook:
nbstripout my_notebook.ipynb
This rewrites my_notebook.ipynb in place with its outputs removed.
nbqa: A Tool for Quality Assurance and Version Control
nbQA lets you run standard Python code-quality tools such as isort, black, and flake8 on Jupyter Notebooks, so you can check the quality of the code in your notebooks and format it automatically.
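Tools like flake8 expect plain Python files, so a wrapper has to turn notebook cells into a script and translate reported line numbers back into cell locations such as cell_1:1:1. The sketch below illustrates that idea only; it is not nbQA's implementation. The functions `cells_to_script` and `report_imports` are hypothetical, the "linter" simply reports import statements, cell numbers count code cells only, and IPython magics such as %matplotlib are not handled.

```python
import ast


def cells_to_script(notebook: dict) -> tuple[str, list[tuple[int, int]]]:
    """Concatenate code cells into one script and remember each line's origin.

    mapping[i] is (cell_number, line_in_cell) for script line i + 1. In this
    sketch, cell numbers count code cells only, starting at 1.
    """
    lines: list[str] = []
    mapping: list[tuple[int, int]] = []
    code_cells = [c for c in notebook["cells"] if c["cell_type"] == "code"]
    for cell_number, cell in enumerate(code_cells, start=1):
        source = "".join(cell["source"])
        for line_in_cell, line in enumerate(source.splitlines(), start=1):
            lines.append(line)
            mapping.append((cell_number, line_in_cell))
    return "\n".join(lines) + "\n", mapping


def report_imports(path: str, notebook: dict) -> list[str]:
    """Toy 'linter': report every import, with locations mapped back to cells."""
    script, mapping = cells_to_script(notebook)
    tree = ast.parse(script)
    imports = sorted(
        (n for n in ast.walk(tree) if isinstance(n, (ast.Import, ast.ImportFrom))),
        key=lambda n: n.lineno,
    )
    messages = []
    for node in imports:
        cell_number, line_in_cell = mapping[node.lineno - 1]  # script line -> cell line
        messages.append(
            f"{path}:cell_{cell_number}:{line_in_cell}:{node.col_offset + 1}: import statement"
        )
    return messages


notebook = {
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n", "import numpy as np"]},
        {"cell_type": "markdown", "source": ["Some notes"]},
        {"cell_type": "code", "source": ["a = [1, 2, 3, 4]\n", "import os"]},
    ]
}

for message in report_imports("example_notebook.ipynb", notebook):
    print(message)
# example_notebook.ipynb:cell_1:1:1: import statement
# example_notebook.ipynb:cell_1:2:1: import statement
# example_notebook.ipynb:cell_2:2:1: import statement
```

Keeping a per-line mapping is what lets a tool that only understands one flat file report problems in terms of notebook cells.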
To install nbQA, run:
pip install nbqa
Let’s take an example notebook, example_notebook.ipynb, that looks like this:
import pandas as pd
import numpy as np
a = [1,2,3,4]
Format the code with black using nbqa:
nbqa black example_notebook.ipynb
Output:
All done! ✨ 🍰 ✨
1 file reformatted.
Check the style and quality of the code using nbqa:
nbqa flake8 example_notebook.ipynb
Output:
example_notebook.ipynb:cell_1:1:1: F401 'pandas as pd' imported but unused
example_notebook.ipynb:cell_1:3:1: F401 'numpy as np' imported but unused
Sort the imports in the notebook using nbqa:
nbqa isort example_notebook.ipynb
Output:
Fixing /home/khuyen/book/book/Chapter7/example_notebook.ipynb
After running all of these commands, the notebook looks much cleaner:
import numpy as np
import pandas as pd
a = [1, 2, 3, 4]
To automate the process, you can use pre-commit to run nbQA automatically every time you commit a Jupyter Notebook. Here’s an example .pre-commit-config.yaml file:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 0.10.0
    hooks:
      - id: nbqa-flake8
      - id: nbqa-isort
      - id: nbqa-black
Then run pre-commit install once in the repository so that these hooks run on every git commit.
pipreqsnb: A Tool for Managing Dependencies
pipreqsnb is a tool that generates a requirements.txt file based on the imports in your Jupyter Notebooks. This helps you manage dependencies and makes it easier for others to reproduce the environment your notebooks need.
To install pipreqsnb, run:
pip install pipreqsnb
To use pipreqsnb, run it on your notebook directory:
pipreqsnb .
Output:
pipreqs .
INFO: Successfully saved requirements file in ./requirements.txt
The resulting requirements.txt file will look something like this:
pandas==1.3.4
numpy==1.20.3
ipython==7.30.1
scikit_learn==1.0.2
Conclusion
Version controlling notebooks can be complex, but tools like nbstripout, nbqa, and pipreqsnb simplify the process. These tools help you:
- Get more accurate and informative diffs
- Filter out output cells
- Format code consistently
- Manage dependencies effectively
By handling these tasks for you, the tools let you focus on code changes and collaborate with others more efficiently.


