...

Marimo: A Modern Notebook for Reproducible Data Science

Marimo: A Modern Notebook for Reproducible Data Science

Have you ever struggled to create notebooks that are both interactive and reproducible? As a data scientist, it is common to rely on interactive workflows for exploration, but this often comes at the cost of consistency and repeatability.

In traditional environments like Jupyter, running cells out of order, depending on hidden state, and modifying variables mid-stream can easily lead to confusion and bugs.

For example, you might define a list data in cell 1, run a summary computation in cell 2, then redefine data in cell 3:

If you run cell 3 after cell 1, and then re-run cell 2 without realizing data has changed, you’ll get Sum: 60 instead of Sum: 6.

This type of out-of-order execution can be easily overlooked and may introduce subtle errors.

Marimo is a modern Python notebook that enforces reproducibility by design, supports interactive apps, and integrates seamlessly into your data science workflows. With Marimo, you no longer have to choose between interactivity and reproducibility—you get both in a single, consistent environment.

Why Marimo?

Marimo solves three major issues common in traditional notebooks:

  1. Hidden state and out-of-order execution
    Marimo enforces top-to-bottom execution with clear cell dependencies.
  2. Version control and diffing
    Marimo notebooks are just Python scripts—easy to diff, review, and manage in Git.
  3. Reusability and sharing
    You can export a Marimo notebook as a web app or module, making your analysis immediately interactive and reusable.

Getting Started

Install Marimo using pip:

pip install marimo

Create a new notebook:

marimo new my_notebook.py

Start the notebook:

marimo edit my_notebook.py

Your notebook opens in a browser, but stays a clean .py file under the hood.

Auto-Update Dependent Cells

Problem in Traditional Notebooks

When you change an input value or upstream logic, dependent cells will not automatically update, leading to inconsistent results unless you manually re-execute them.

For example, in a Jupyter Notebook ,you might have:

If you change the threshold to 30 in Cell 1 but forget to rerun Cell 2, the printed output will still reflect the old threshold:

This discrepancy can be subtle and lead to incorrect conclusions if you don’t carefully re-execute every dependent cell.

How Marimo Solves This

Another major strength of Marimo is automatic dependency tracking. When you change the output of a cell, all downstream cells that depend on its result are automatically re-executed.

This ensures your notebook is always in a consistent state, without requiring manual reruns of cells. For example:

Now if you change threshold from 50 to 30, the second cell will rerun automatically, updating the filtered output:

Marimo detects the change in threshold and ensures all dependent cells update in sync, avoiding stale results.

This live dependency resolution eliminates the need to track what to re-execute, making your workflow more robust.

Prevent Variable Redefinition

Problem in Traditional Notebooks

Variables can be silently redefined in any cell, which can cause logical errors or unintentional overwrites that are hard to trace.

For example, in a Jupyter Notebook:

If you redefine data in Cell 3 but forget that Cell 2 depends on it, re-running Cell 2 after Cell 3 will produce a different result than originally intended, without any warning.

How Marimo Solves This

Marimo prevents you from accidentally redefining a variable across different cells. This eliminates bugs caused by naming collisions or silent overwrites in long notebooks.

For example, if you try to redefine data in separate cells, Marimo will raise an error, warning you that data has already been defined.

To update a variable, you must explicitly change its original definition or rename it.

This enforces clarity, making the notebook’s logic more predictable and transparent.

Enable Clean Version Control

Problem in Traditional Notebooks

.ipynb files store output alongside code in a JSON format, making diffs unreadable and code reviews difficult.

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "82e65582-f723-4e3f-8098-b42044ae85ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [1, 2, 3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "f5661f3b-7b25-4a51-91f2-8f0feca3739e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sum: 60\n"
     ]
    }
   ],
   "source": [
    "summary = sum(data)\n",
    "print(\"Sum:\", summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9a54c42c-7a41-4a49-b334-6c4086784a84",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [10, 20, 30]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "233e800c-11aa-4240-a5b7-21a1543ec947",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

How Marimo Solves This

Since Marimo notebooks are plain Python files, they can be versioned just like any other source code.

For example, this notebook:

is actually a .py file under the hood:


import marimo

__generated_with = "0.13.0"
app = marimo.App()


@app.cell
def _():
    data = [1, 2, 3]
    return (data,)


@app.cell
def _(data):
    summary = sum(data)
    print("Sum:", summary)
    return


@app.cell
def _():
    data_1 = [10, 20, 30]
    return


@app.cell
def _():
    return


if __name__ == "__main__":
    app.run()

You can use Git to track changes, compare diffs, and conduct code reviews line-by-line.

Build Parameterized Interactive Workflows

Problem in Traditional Notebooks

Interactive widgets require additional setup and do not integrate seamlessly with the computation graph, making parameter-driven exploration brittle and difficult to reproduce.

How Marimo Solves This

Marimo treats interactive UI elements as native components of the notebook. They define explicit, trackable inputs that directly link to computation, ensuring that outputs are consistently and reproducibly tied to user-defined parameters.

Example:

Changing the slider updates multiplier.value, which triggers all dependent cells. This allows parameter-driven exploration while preserving a reproducible and traceable code path.

Isolate Execution with Sandbox Mode

Problem in Traditional Notebooks

Notebook environments are not isolated by default, meaning dependencies or variables from other projects or past sessions can interfere with your results. There is no guarantee that what is in memory or on disk reflects only what is in the notebook, making reproducibility fragile.

How Marimo Solves This

Marimo offers a --sandbox mode that runs your notebook in a fully isolated environment. This ensures your code doesn’t unintentionally rely on globally installed packages or cached variables from other projects. Everything it needs must be declared in the notebook itself.

To see this in action, consider a minimal Marimo notebook:

import pandas as pd
import numpy as np

# Simulate some data
np.random.seed(1)
df = pd.DataFrame({"value": np.random.randn(5)})
print(df)

If pandas and numpy happen to be installed globally, you may not notice any issues while editing or running the notebook normally.

marimo edit my_notebook.py

But when you launch the notebook in sandbox mode, Marimo verifies that all dependencies are declared.

To use --sandbox, Marimo requires the <a href="https://github.com/astral-sh/uv">uv</a> package manager to manage isolated environments.

To edit the notebook in sandbox mode, run:

marimo edit my_notebook.py --sandbox

If any are missing, Marimo will prompt you to install them:

And then automatically add them to the notebook’s script header:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "numpy==2.2.5",
#     "pandas==2.2.3",
# ]
# ///


import marimo

__generated_with = "0.13.0"
app = marimo.App(width="medium")


@app.cell
def _():
    import pandas as pd
    import numpy as np

    np.random.seed(1)
    df = pd.DataFrame({"value": np.random.randn(5)})
    print(df)
    return


if __name__ == "__main__":
    app.run() 

The next time you open the notebook in sandbox mode, Marimo will reinstall these packages in an isolated environment before launching.

This ensures your notebook runs consistently everywhere, with no surprises. It’s a lightweight but powerful step toward full reproducibility.

Export in Multiple Reusable Formats

Problem in Traditional Notebooks

Notebooks are often treated as single-use or exploratory, making it difficult to modularize or reuse code without copying and pasting between files.

How Marimo Solves This

Marimo notebooks are not only readable—they’re modular and exportable. You can convert your notebook into multiple formats depending on your use case in a data science project:

marimo export html my_notebook.py  
marimo export html-wasm my_notebook.py
marimo export ipynb my_notebook.py 
marimo export md my_notebook.py        
marimo export script my_notebook.py     # Clean Python script for production

Each export mode has a specific purpose:

  • HTML: Great for sharing static visualizations and reports with stakeholders.
  • HTML-WASM: Perfect for publishing interactive dashboards online without needing a backend server.
  • IPYNB: Enables compatibility with traditional Jupyter workflows.
  • Markdown: Ideal for documentation, blogs, or version-controlled notebooks in prose form.
  • Script: Useful for integrating analytical logic into larger Python projects or production pipelines.

This flexibility allows you to deliver your analysis in the correct format for each stage of the data science lifecycle—exploration, communication, automation, or deployment.

Example Workflow: From Local Editing to Online Sharing

Let’s put everything together and explore how to go from local editing to an online shareable dashboard hosted on GitHub Pages.

You can find the source code for this example here, and the final dashboard, hosted on a GitHub page, here.

Edit Your Notebook in a Clean Environment

    marimo edit dashboard.py --sandbox

    This ensures that all dependencies are declared, allowing the notebook to run independently of any system-level Python packages.

    Export Interactive Results as a Shareable Dashboard

      marimo export html-wasm dashboard.py -o build --sandbox

      To launch a local web server for the folder, run:

      python -m http.server --directory build

      You will see an interactive dashboard as shown below:

      Add a GitHub Actions Workflow

      Let’s create a GitHub Actions workflow to rerun and export your notebook on every push:

      Start by configuring GitHub Pages for GitHub Actions by following these steps:

      1. Go to your repository and click the Settings tab.
      2. In the left sidebar, scroll down and click on Pages.
      3. Under Build and deployment, set the Source to GitHub Actions.

      Add a GitHub Actions workflow file:

      name: Deploy Marimo Notebook
      
      on:
        push:
          branches: [main]
          paths:
            - dashboard.py
            - .github/workflows/deploy.yml
      
      jobs:
        build:
          runs-on: ubuntu-latest
          steps:
            - uses: actions/checkout@v4
      
            - name: Set up Python
              uses: actions/setup-python@v4
              with:
                python-version: '3.11'
      
            - name: Install uv
              uses: astral-sh/setup-uv@v5
      
            - name: Install dependencies
              run: uv add marimo
      
            - name: Export marimo notebook
              run: uv run marimo export html-wasm dashboard.py --output build --sandbox
      
            - name: Upload artifact
              uses: actions/upload-pages-artifact@v3
              with:
                path: build
      
        deploy:
          needs: build
          runs-on: ubuntu-latest
          environment:
            name: github-pages
            url: ${{ steps.deployment.outputs.page_url }}
      
          permissions:
            pages: write
            id-token: write
      
          steps:
            - name: 🌐 Deploy to GitHub Pages
              id: deployment
              uses: actions/deploy-pages@v4
              with:
                artifact_name: github-pages

      Add, commit, and push this workflow:

      git add .
      git commit -m 'add github workflow'
      git push origin main

      You should see something like this when visiting the website hosted on GitHub Pages.

      Now, when you edit your Marimo notebook and push the changes to the GitHub repository, the GitHub page will automatically update with the modified dashboard.

      Final Thoughts

      Marimo brings reproducibility, shareability, and structure to data science notebooks.

      If you’re tired of broken Jupyter workflows or want cleaner collaboration with Git, it’s worth exploring.

      Leave a Comment

      Your email address will not be published. Required fields are marked *

      Scroll to Top

      Work with Khuyen Tran

      Work with Khuyen Tran

      Seraphinite AcceleratorOptimized by Seraphinite Accelerator
      Turns on site high speed to be attractive for people and search engines.