Jupytext: Transform Notebooks into Version Control-Friendly Text

Motivation

Data scientists and analysts often struggle with messy Git diffs when collaborating on Jupyter notebooks, as even small changes can result in large, hard-to-read differences due to cell outputs and metadata changes.

Example:

# Trying to understand changes in a .ipynb file diff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {"data": {"text/plain": ["<large output blob>"]}, "execution_count": 1}
   ],
   "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]
  }
 ],
 "metadata": {"kernelspec": {...}, "language_info": {...}}
}
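
To make the problem concrete, here is a minimal sketch that uses only the Python standard library. It builds two versions of a simplified, hypothetical one-cell notebook in which the source code is identical and only the execution count and the output differ, then lists the lines a diff tool would flag. The helper names `make_notebook` and `changed_lines` are made up for this illustration, and the dictionary is a reduced stand-in for a real .ipynb file.

```python
import difflib
import json


def make_notebook(execution_count, output_text):
    """Build a simplified, hypothetical one-cell notebook as a dict."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "execute_result",
                        "data": {"text/plain": [output_text]},
                        "execution_count": execution_count,
                        "metadata": {},
                    }
                ],
                "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def changed_lines(before, after):
    """Return the added and removed lines of a unified diff of two JSON dumps."""
    a = json.dumps(before, indent=1).splitlines()
    b = json.dumps(after, indent=1).splitlines()
    return [
        line
        for line in difflib.unified_diff(a, b, lineterm="")
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Same code, executed twice: only the counters and the output differ.
first_run = make_notebook(1, "<output from first run>")
second_run = make_notebook(2, "<output from second run>")

noise = changed_lines(first_run, second_run)
source_changed = first_run["cells"][0]["source"] != second_run["cells"][0]["source"]

print(f"source changed: {source_changed}")
print(f"diff lines in the notebook JSON: {len(noise)}")
for line in noise:
    print(line)
```

Every line printed by the loop comes from re-running the cell, not from editing the code, which is the kind of noise a reviewer has to read past in a raw .ipynb diff.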

Introduction to Jupytext

Jupytext is a tool that converts Jupyter notebooks to text formats such as Markdown documents or Python scripts, which makes them easier to track in version control and to edit in an IDE. It can be installed using pip:

pip install jupytext

or conda:

conda install jupytext -c conda-forge
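
Once Jupytext is installed, the conversion can also be scripted. The sketch below assumes Jupytext's Python API (`jupytext.read`, `jupytext.write`, and `jupytext.writes`); the wrapper names `notebook_to_text` and `convert_notebook` are hypothetical helpers for this post, not part of the library.

```python
def notebook_to_text(ipynb_path, fmt="md"):
    """Return the text representation of a notebook file.

    fmt is a Jupytext format name such as "md" or "py:percent".
    """
    import jupytext  # imported lazily so this module loads without Jupytext

    notebook = jupytext.read(ipynb_path)
    return jupytext.writes(notebook, fmt=fmt)


def convert_notebook(ipynb_path, out_path, fmt=None):
    """Write a text version of the notebook at ipynb_path to out_path.

    When fmt is None, the format is left for Jupytext to infer from
    the extension of out_path.
    """
    import jupytext

    notebook = jupytext.read(ipynb_path)
    jupytext.write(notebook, out_path, fmt=fmt)
```

For example, assuming a file named analysis.ipynb exists, `notebook_to_text("analysis.ipynb", fmt="py:percent")` would return the notebook as a Python script with `# %%` cell markers. This is a one-off conversion; it does not pair the files.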

Paired Notebooks Feature

Jupytext solves the version control challenge by allowing you to pair your .ipynb files with text-based formats (.py or .md), maintaining both versions in sync automatically.

Let’s demonstrate how Jupytext automatically synchronizes paired files:

First, set up a paired notebook called analysis.ipynb. In JupyterLab, open the notebook and pair it with one or more text formats using the Jupytext commands.

Pairing with Markdown automatically creates a Markdown version of the notebook called analysis.md:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```
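
The `formats: ipynb,md` entry in the header above is what records the pairing. Here is a minimal sketch for checking it from Python, assuming the same entry is stored in the .ipynb file's JSON under metadata, then jupytext, then formats, as a comma-separated string. The helper name `paired_formats` and the stripped-down notebook are hypothetical.

```python
import json


def paired_formats(notebook):
    """Return the list of paired formats recorded in a notebook dict.

    Assumes the pairing is a comma-separated string stored under
    metadata -> jupytext -> formats, mirroring the YAML header.
    Returns an empty list when no such entry is found.
    """
    formats = notebook.get("metadata", {}).get("jupytext", {}).get("formats", "")
    if not isinstance(formats, str):
        return []  # other representations are out of scope for this sketch
    return [fmt.strip() for fmt in formats.split(",") if fmt.strip()]


# Hypothetical, stripped-down contents of analysis.ipynb after pairing
raw = json.dumps(
    {
        "cells": [],
        "metadata": {"jupytext": {"formats": "ipynb,md"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
)
notebook = json.loads(raw)
print(paired_formats(notebook))
```

A check like this could be useful in a pre-commit script that warns when a notebook has not been paired yet.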

Now, let’s modify the notebook in Jupyter (analysis.ipynb) by adding a new code cell that computes and prints the mean of column A.

After saving the notebook in Jupyter, analysis.md is automatically updated:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

You can also edit the Markdown file directly, for example by appending a new heading at the end of analysis.md, and see the change reflected in the notebook:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

## New Section

After reloading the notebook in Jupyter, analysis.ipynb will contain the new "New Section" heading as a Markdown cell after the existing code cells.

Key points about the synchronization:

  • Changes in either file format are automatically reflected in the paired file when saving
  • The synchronization preserves the notebook structure and cell types (code/markdown)
  • Cell outputs are only stored in the .ipynb file
  • You need to reload the notebook in Jupyter to see changes made to the text file
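
To illustrate how cell types can survive a round trip through plain text, here is a deliberately simplified sketch that splits Markdown text into code and Markdown cells based on the code fences. It is not Jupytext's actual parser: it ignores the YAML header, cell metadata, and Jupytext's rules for separating adjacent Markdown cells, and the function name `split_markdown_cells` is hypothetical.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def split_markdown_cells(text):
    """Split Markdown text into (cell_type, source) tuples.

    Simplification: a line equal to the fence followed by "python" opens
    a code cell, a bare fence line closes it, and everything else is
    treated as Markdown.
    """
    cells = []
    buffer = []
    in_code = False

    def flush(cell_type):
        # Store the buffered lines as one cell, skipping empty cells.
        source = "\n".join(buffer).strip("\n")
        if source:
            cells.append((cell_type, source))
        buffer.clear()

    for line in text.splitlines():
        if not in_code and line.strip() == FENCE + "python":
            flush("markdown")
            in_code = True
        elif in_code and line.strip() == FENCE:
            flush("code")
            in_code = False
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")
    return cells


sample = "\n".join(
    [
        "## My Analysis",
        "",
        FENCE + "python",
        "mean_value = data['A'].mean()",
        FENCE,
        "",
        "## New Section",
    ]
)

for cell_type, source in split_markdown_cells(sample):
    print(cell_type, "|", source)
```

Note that nothing in the text carries cell outputs, which is consistent with outputs living only in the .ipynb file.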

This automatic synchronization makes it easy to:

  • Edit notebooks in your preferred text editor
  • Track changes in version control
  • Collaborate with team members
  • Keep documentation and code in sync

Conclusion

Jupytext provides an elegant solution to version controlling Jupyter notebooks by separating the content from the outputs, making collaboration and code review much more manageable for data science teams. Its flexibility in supporting both Python and Markdown formats allows teams to choose the most appropriate format for their specific needs.

Link to Jupytext

Work with Khuyen Tran
