Jupytext: Transform Notebooks into Version Control-Friendly Text

Motivation

Data scientists and analysts often struggle with messy Git diffs when collaborating on Jupyter notebooks, as even small changes can result in large, hard-to-read differences due to cell outputs and metadata changes.

Example:

# Trying to understand changes in a .ipynb file diff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {"data": {"text/plain": ["<large output blob>"]}, "execution_count": 1}
   ],
   "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"]
  }
 ],
 "metadata": {"kernelspec": {...}, "language_info": {...}}
}

Introduction to Jupytext

Jupytext is a tool that converts Jupyter notebooks to text formats such as Markdown documents or Python scripts, which are easier to track in version control and to edit in an IDE. It can be installed using pip:

pip install jupytext

or conda:

conda install jupytext -c conda-forge

Paired Notebooks Feature

Jupytext solves the version control challenge by allowing you to pair your .ipynb files with text-based formats (.py or .md), maintaining both versions in sync automatically.

Let’s demonstrate how Jupytext automatically synchronizes paired files:

First, create a notebook called analysis.ipynb and, in JupyterLab, pair it with one or more text formats using the Jupytext commands.

Pairing with Markdown automatically creates a Markdown version of the notebook called analysis.md:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```
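
Outside JupyterLab, the Jupytext command line offers `jupytext --set-formats ipynb,md analysis.ipynb` for the same pairing. Either way, the pairing is recorded as a `formats` entry under the `jupytext` key of the notebook metadata, which is what the header above displays. The sketch below illustrates that idea with only the standard library. `add_pairing` is a hypothetical helper, not part of Jupytext, and it edits an in-memory notebook dictionary rather than a real file.

```python
import copy
import json


def add_pairing(notebook, formats="ipynb,md"):
    """Return a copy of a notebook dict with a Jupytext pairing entry.

    Hypothetical helper for illustration only; in practice the Jupytext
    commands write this metadata for you.
    """
    paired = copy.deepcopy(notebook)
    jupytext_meta = paired.setdefault("metadata", {}).setdefault("jupytext", {})
    jupytext_meta["formats"] = formats
    return paired


# A minimal notebook structure (nbformat 4) with no pairing yet.
notebook = {
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}

paired = add_pairing(notebook)
print(json.dumps(paired["metadata"], indent=2))
```

The helper copies the notebook before changing it, so the original dictionary stays unpaired and the two can be compared side by side.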

Now, let’s modify the notebook in Jupyter (analysis.ipynb) by adding a new code cell that computes the mean of column A.

After saving the notebook in Jupyter, analysis.md is automatically updated:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

You can also edit the Markdown file directly, for example by appending a new heading, and see the change reflected in the notebook:

---
jupyter:
  jupytext:
    formats: ipynb,md
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.7
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

## My Analysis

```python
# Import libraries
import pandas as pd
import numpy as np

# Load some data
data = pd.DataFrame({'A': [1, 2, 3]})
```

```python
# Add a new analysis section
mean_value = data['A'].mean()
print(f"Mean value: {mean_value}")
```

## New Section

After reloading the notebook in Jupyter, analysis.ipynb will contain the new "New Section" heading as an additional Markdown cell.

Key points about the synchronization:

  • Changes in either file format are automatically reflected in the paired file when saving
  • The synchronization preserves the notebook structure and cell types (code/markdown)
  • Cell outputs are only stored in the .ipynb file
  • You need to reload the notebook in Jupyter to see changes made to the text file

This automatic synchronization makes it easy to:

  • Edit notebooks in your preferred text editor
  • Track changes in version control
  • Collaborate with team members
  • Maintain documentation and code in sync
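
To make the point about outputs living only in the .ipynb file concrete, here is a minimal standard-library sketch. It is not how Jupytext is implemented. `text_view` and `make_notebook` are hypothetical helpers, and the output strings are made-up illustrative data. `text_view` keeps only each cell's type and source, which is roughly the information a paired text file carries. Two notebook versions that differ only in outputs and execution counts then compare as equal.

```python
def text_view(notebook):
    """Keep only (cell_type, source) per cell; drop outputs and execution counts."""
    view = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # The notebook format allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        view.append((cell["cell_type"], source))
    return view


def make_notebook(execution_count, output_text):
    """Build a one-cell notebook dict with the given run state (illustrative data)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [output_text]}
                ],
                "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"],
            }
        ],
        "metadata": {},
    }


first_run = make_notebook(1, "loaded 100 rows\n")
second_run = make_notebook(7, "loaded 250 rows\n")

print(first_run == second_run)                        # False: the .ipynb content differs
print(text_view(first_run) == text_view(second_run))  # True: the text view is unchanged
```

Re-running a notebook without touching its code changes the .ipynb file but leaves a text-only view identical, which is why diffs of the paired text file stay small.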

Conclusion

Jupytext provides an elegant solution to version controlling Jupyter notebooks by separating the content from the outputs, making collaboration and code review much more manageable for data science teams. Its flexibility in supporting both Python and Markdown formats allows teams to choose the most appropriate format for their specific needs.

Link to Jupytext

    Work with Khuyen Tran
