Jupytext: Transform Notebooks into Version Control-Friendly Text

Khuyen Tran

Motivation

Data scientists and analysts often struggle with messy Git diffs when collaborating on Jupyter notebooks, as even small changes can result in large, hard-to-read differences due to cell outputs and metadata changes.

Example:

# Trying to understand changes in a .ipynb file diff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {"data": {"text/plain": ["<large output blob>"]}, "execution_count": 1}
   ],
   "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]
  }
 ],
 "metadata": {"kernelspec": {...}, "language_info": {...}}
}

Introduction to Jupytext

Jupytext is a tool that converts Jupyter notebooks to various text formats like Markdown documents or Python scripts, making it easier to control and edit versions in IDEs. It can be installed using pip:

pip install jupytext

or conda:

conda install jupytext -c conda-forge

Paired Notebooks Feature

Jupytext solves the version control challenge by allowing you to pair your .ipynb files with text-based formats (.py or .md), maintaining both versions in sync automatically.

Let’s demonstrate how Jupytext automatically synchronizes paired files:

First, set up a paired notebook called analysis.ipynb:

In JupyterLab, pair your notebook to one or more text formats with the Jupytext commands:

This will automatically create a markdown version of the notebook called analysis.md:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

Now, let’s modify the notebook in Jupyter (analysis.ipynb):

After saving the notebook in Jupyter, analysis.md is automatically updated:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

You can also edit the markdown file directly and see changes reflected in the notebook:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

## New Section

After reloading the notebook in Jupyter, analysis.ipynb will contain:

Key points about the synchronization:

Changes in either file format are automatically reflected in the paired file when saving
The synchronization preserves the notebook structure and cell types (code/markdown)
Cell outputs are only stored in the .ipynb file
You need to reload the notebook in Jupyter to see changes made to the text file

This automatic synchronization makes it easy to:

Edit notebooks in your preferred text editor
Track changes in version control
Collaborate with team members
Maintain documentation and code in sync

Conclusion

Jupytext provides an elegant solution to version controlling Jupyter notebooks by separating the content from the outputs, making collaboration and code review much more manageable for data science teams. Its flexibility in supporting both Python and Markdown formats allows teams to choose the most appropriate format for their specific needs.

Link to Jupytext

marimo: Reactive Notebooks for Effortless Visualizations

March 7, 2025

IPyflow: Automatic Dependency Tracking in Jupyter Notebooks

February 2, 2025

Automate Jupyter Notebooks with Papermill

November 15, 2024

Jupytext: Transform Notebooks into Version Control-Friendly Text

Table of Contents

Jupytext: Transform Notebooks into Version Control-Friendly Text

Khuyen Tran

Motivation

Introduction to Jupytext

Paired Notebooks Feature

Conclusion

Related Posts

Leave a Comment Cancel Reply

Stay up-to-date with
data skills using
CodeCut

Drop a line

Get in touch

Follow Us on Social Media

Jupytext: Transform Notebooks into Version Control-Friendly Text

Table of Contents

Jupytext: Transform Notebooks into Version Control-Friendly Text

Khuyen Tran

Motivation

Introduction to Jupytext

Paired Notebooks Feature

Conclusion

Related Posts

Leave a Comment Cancel Reply

Stay up-to-date with data skills using CodeCut

Drop a line

Get in touch

Follow Us on Social Media

Work with Khuyen Tran

Work with Khuyen Tran

Stay up-to-date with
data skills using
CodeCut