Pytest for Data Scientists

Motivation

As a data scientist, one way to test your Python code is by using an interactive notebook to verify the accuracy of the outputs.

However, this approach does not guarantee that your code works as intended in all cases.

A better approach is to identify the expected behavior of the code in various scenarios, and then verify if the code executes accordingly.

For example, testing a function used to extract the sentiment of a text might include checking whether:

  • The function returns a value that is greater than 0 if the text is positive.
  • The function returns a value that is less than 0 if the text is negative.
#sentiment.py

def test_extract_sentiment_positive():

    text = "I think today will be a great day"

    sentiment = extract_sentiment(text)

    assert sentiment > 0

def test_extract_sentiment_negative():

    text = "I do not think this will turn out well"

    sentiment = extract_sentiment(text)

    assert sentiment < 0

Besides ensuring that your code works as intended, incorporating testing in a data science project also provides the following benefits:

  • Identifies edge cases.
  • Enables safe replacement of existing code with enhanced versions, without risking disruption of the entire process.
  • Makes it easier for your teammates to understand the behaviors of your functions.

While Python offers several testing tools, such as the built-in unittest module, pytest is among the most user-friendly options.

Key Takeaways

Here’s what you’ll learn:

  • Write comprehensive test suites with minimal code using pytest’s intuitive framework
  • Use parametrization to test multiple scenarios without duplicating test code
  • Implement fixtures for consistent test data across your entire test suite
  • Mock external dependencies to eliminate network calls and database dependencies
  • Apply advanced testing strategies for NumPy arrays and pandas DataFrames

📚 For comprehensive production-ready testing practices with pytest, check out Production-Ready Data Science.

Get Started with Pytest

Pytest is a framework that makes it easy to write small tests in Python. I like pytest because it helps me write tests with minimal code. If you are not familiar with testing, pytest is a great tool to get started with.

To install pytest, run

pip install -U pytest

To test the extract_sentiment function, create a function that starts with test_ followed by the name of the tested function.

#sentiment.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''

    text = TextBlob(text)

    return text.sentiment.polarity

def test_extract_sentiment():

    text = "I think today will be a great day"

    sentiment = extract_sentiment(text)

    assert sentiment > 0

That’s it! Now we are ready to run the test.

To test the sentiment.py file, run:

pytest sentiment.py

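Pytest collects and runs every function in sentiment.py whose name starts with test. If you run pytest without a file argument, it instead searches the current directory and its subdirectories for files named test_*.py or *_test.py. The output of the test above will look like this: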

========================================= test session starts ==========================================
process.py .                                                                                     [100%]
========================================= 1 passed in 0.68s ===========================================

If the test fails, pytest will produce the following output:

#sentiment.py

def test_extract_sentiment():

    text = "I think today will be a great day"

    sentiment = extract_sentiment(text)

    assert sentiment < 0

$ pytest sentiment.py
========================================= test session starts ==========================================
process.py F                                                                                     [100%]
=============================================== FAILURES ===============================================
________________________________________ test_extract_sentiment ________________________________________
def test_extract_sentiment():

        text = "I think today will be a great day"

        sentiment = extract_sentiment(text)

>       assert sentiment < 0
E       assert 0.8 < 0
process.py:17: AssertionError
======================================= short test summary info ========================================
FAILED process.py::test_extract_sentiment - assert 0.8 < 0
========================================== 1 failed in 0.84s ===========================================

The test failed because the function returned a sentiment of 0.8, which is not less than 0. Knowing why the test failed tells us where to look when fixing the code.

Multiple Tests for the Same Function

With pytest, we can also create multiple tests for the same function.

#sentiment.py

def test_extract_sentiment_positive():

    text = "I think today will be a great day"

    sentiment = extract_sentiment(text)

    assert sentiment > 0

def test_extract_sentiment_negative():

    text = "I do not think this will turn out well"

    sentiment = extract_sentiment(text)

    assert sentiment < 0

$ pytest sentiment.py
========================================= test session starts ==========================================
process.py .F                                                                                    [100%]
=============================================== FAILURES ===============================================
___________________________________ test_extract_sentiment_negative ____________________________________
def test_extract_sentiment_negative():

        text = "I do not think this will turn out well"

        sentiment = extract_sentiment(text)

>       assert sentiment < 0
E       assert 0.0 < 0
process.py:25: AssertionError
======================================= short test summary info ========================================
FAILED process.py::test_extract_sentiment_negative - assert 0.0 < 0
===================================== 1 failed, 1 passed in 0.80s ======================================

The first test passes, but the second fails: TextBlob assigns this sentence a polarity of 0.0 (neutral) rather than a negative value. Surfacing this kind of unexpected behavior is exactly what the tests are for.

Parametrization: Combining Tests

Since the two test functions above test the same function, we can combine them into one test function with parametrization.

Parametrize with a List of Samples

pytest.mark.parametrize() allows us to execute a test with different examples by providing a list of examples in the argument.

# sentiment.py

import pytest
from textblob import TextBlob

def extract_sentiment(text: str):
        '''Extract sentiment using textblob.
        Polarity is within range [-1, 1]'''

        text = TextBlob(text)

        return text.sentiment.polarity

testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize("sample", testdata)
def test_extract_sentiment(sample):

    sentiment = extract_sentiment(sample)

    assert sentiment > 0

========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 2 items
sentiment.py .F                                                    [100%]
================================ FAILURES ================================
_____ test_extract_sentiment[I do not think this will turn out well] _____
sample = "I do not think this will turn out well"
@pytest.mark.parametrize("sample", testdata)
    def test_extract_sentiment(sample):

        sentiment = extract_sentiment(sample)

>       assert sentiment > 0
E       assert 0.0 > 0
sentiment.py:19: AssertionError
======================== short test summary info =========================
FAILED sentiment.py::test_extract_sentiment[I do not think this will turn out well]
====================== 1 failed, 1 passed in 0.80s ===================

Parametrize with a List of Examples and Expected Outputs

What if we expect different examples to have different outputs?

For example, we might want to check if the function text_contain_word:

  • Returns True if word="duck" and text="There is a duck in this text"
  • Returns False if word="duck" and text="There is nothing here"

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''

    return word in text

To create a test for multiple examples with different expected outputs, we can use parametrize("sample, expected_output", testdata) with testdata = [(<sample1>, <output1>), (<sample2>, <output2>)].

# process.py
import pytest

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''

    return word in text

testdata = [
    ("There is a duck in this text", True),
    ("There is nothing here", False)
    ]

@pytest.mark.parametrize("sample, expected_output", testdata)
def test_text_contain_word(sample, expected_output):

    word = "duck"

    assert text_contain_word(word, sample) == expected_output

$ pytest process.py
========================================= test session starts ==========================================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
plugins: hydra-core-1.0.0, Faker-4.1.1
collected 2 items
process.py ..                                                                                    [100%]
========================================== 2 passed in 0.04s ===========================================

Awesome! Both tests passed!
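
As a possible extension of the example above, the word itself can become a parameter, and each case can carry a readable ID that appears in the test report. This is only a sketch: the third case and the ID strings are illustrative additions, not part of the original example.

```python
import pytest


def text_contain_word(word: str, text: str) -> bool:
    """Find whether the text contains a particular word."""
    return word in text


# Each pytest.param bundles the inputs and the expected result;
# `id` is the label pytest prints for that case
cases = [
    pytest.param("duck", "There is a duck in this text", True, id="word-present"),
    pytest.param("duck", "There is nothing here", False, id="word-absent"),
    pytest.param("duck", "", False, id="empty-text"),  # illustrative edge case
]


@pytest.mark.parametrize("word, text, expected", cases)
def test_text_contain_word(word, text, expected):
    assert text_contain_word(word, text) == expected
```

With IDs in place, a failing case is reported as test_text_contain_word[empty-text] instead of a label built from the full input strings.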

Test One Function at a Time

To test a specific function, run pytest file.py::function_name

testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize("sample", testdata)
def test_extract_sentiment(sample):

    sentiment = extract_sentiment(sample)

    assert sentiment > 0

testdata = [
    ("There is a duck in this text", True),
    ("There is nothing here", False)
    ]

@pytest.mark.parametrize("sample, expected_output", testdata)
def test_text_contain_word(sample, expected_output):

    word = "duck"

    assert text_contain_word(word, sample) == expected_output

For example, to run only test_text_contain_word, type:

pytest process.py::test_text_contain_word

Fixtures: Use the Same Data to Test Different Functions

We can also use the same data to test different functions with a pytest fixture.

In the code below, we use the @pytest.fixture decorator to turn the sentence “Today I found a duck and I am happy” into a reusable fixture and use it in multiple tests.

@pytest.fixture
def example_data():
    return "Today I found a duck and I am happy"

def test_extract_sentiment(example_data):

    sentiment = extract_sentiment(example_data)

    assert sentiment > 0

def test_text_contain_word(example_data):

    word = "duck"

    assert text_contain_word(word, example_data)

Advanced Fixtures: Optimize Your Test Setup

When working with data science projects, you often need to load expensive datasets or set up consistent environments across multiple tests. Basic fixtures work well for small examples, but advanced fixtures can optimize your test performance and ensure reproducibility.

Session-Scoped Fixtures

By default, pytest fixtures are function-scoped: the fixture function runs again for every test that requests it, which becomes inefficient with large datasets. Session-scoped fixtures solve this by loading the data once and reusing it across all tests in the session.

import numpy as np
import pandas as pd
import pytest

# This fixture runs once per test session
@pytest.fixture(scope="session")
def large_dataset():
    # Simulate loading an expensive dataset
    print("Loading large dataset...")
    return pd.DataFrame({
        "feature1": np.random.randn(10000),
        "feature2": np.random.randn(10000),
        "target": np.random.randint(0, 2, 10000)
    })

def test_data_shape(large_dataset):
    assert large_dataset.shape == (10000, 3)

def test_feature_types(large_dataset):
    assert large_dataset["target"].dtype == int
    assert large_dataset["feature1"].dtype == float

Output:

Loading large dataset...
test_session_fixture.py::test_data_shape PASSED
test_session_fixture.py::test_feature_types PASSED

The dataset is loaded only once, even if you have multiple tests using it.

Autouse Fixtures

Regular fixtures require explicit inclusion in each test function, which becomes repetitive for universal setup like random seeds. Autouse fixtures solve this by running automatically before every test, ensuring consistent setup across your entire test suite.

import random

import numpy as np
import pytest

@pytest.fixture(autouse=True)
def setup_random_seeds():
    print("Setting up random seeds...")
    np.random.seed(42)
    random.seed(42)

def test_model_prediction():
    # This test will have reproducible random results
    X = np.random.randn(100, 5)
    # Your model training and prediction code here
    assert len(X) == 100

def test_data_sampling():
    # This test also gets reproducible randomness
    sample = np.random.choice([1, 2, 3, 4, 5], size=10)
    assert len(sample) == 10

Output:

Setting up random seeds...
test_autouse_fixture.py::test_model_prediction PASSED
Setting up random seeds...
test_autouse_fixture.py::test_data_sampling PASSED

You can see the fixture runs twice automatically – once before each test – even though neither test function explicitly requests the fixture.
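
Seeding NumPy's global state is one option. Another, sketched here on the assumption that your code can accept a generator object, is a regular fixture that hands each test its own seeded np.random.Generator. The make_rng helper and the rng fixture name are hypothetical. Unlike an autouse fixture, this one must be requested explicitly by each test.

```python
import numpy as np
import pytest


def make_rng(seed: int = 42) -> np.random.Generator:
    """Hypothetical helper: build an independent, seeded random generator."""
    return np.random.default_rng(seed)


@pytest.fixture
def rng():
    # Every test that requests `rng` gets a fresh generator with the same seed
    return make_rng()


def test_sampling_is_reproducible(rng):
    first = rng.normal(size=5)

    # A second generator built from the same seed yields the same numbers
    second = make_rng().normal(size=5)

    np.testing.assert_array_equal(first, second)
```

Because the generator is local to the test, other code that touches the global random state cannot change the numbers the test sees.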

Test with Temporary Files Safely

Testing file operations with real files can corrupt your actual data or leave behind test artifacts. Safe file testing requires temporary files that don’t interfere with your actual data and are automatically cleaned up after tests.

Pytest provides the tmp_path fixture that creates a temporary directory for each test. This is perfect for testing data processing pipelines, model serialization, or any file I/O operations.

def save_model_predictions(predictions, filepath):
    """Save model predictions to a CSV file"""
    import pandas as pd

    pd.DataFrame({"predictions": predictions}).to_csv(filepath, index=False)


def load_model_predictions(filepath):
    """Load model predictions from a CSV file"""
    import pandas as pd

    return pd.read_csv(filepath)["predictions"].tolist()


def test_save_and_load_predictions(tmp_path):
    # tmp_path is automatically created and cleaned up
    predictions = [0.1, 0.9, 0.3, 0.7]

    # Create a temporary file path
    file_path = tmp_path / "predictions.csv"

    # Test saving
    save_model_predictions(predictions, file_path)
    assert file_path.exists()

    # Test loading
    loaded_predictions = load_model_predictions(file_path)
    assert loaded_predictions == predictions

You can also test entire data processing pipelines. In the example below, process_data stands for your own processing function:

import pandas as pd


def test_data_processing_pipeline(tmp_path):
    # Create temporary input file
    input_file = tmp_path / "input.csv"
    input_data = pd.DataFrame({"value": [1, 2, 3, 4, 5]})
    input_data.to_csv(input_file, index=False)

    # Create temporary output file path
    output_file = tmp_path / "processed.csv"

    # Test your processing function
    process_data(input_file, output_file)

    # Verify the output
    result = pd.read_csv(output_file)
    assert len(result) == 5
    # Add more specific assertions about your processing
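
The pipeline test above calls a process_data function that the article does not define. To make the pattern concrete, here is a minimal sketch with a hypothetical process_data that adds a doubled column. The function body and the value_doubled column name are assumptions made for illustration only.

```python
from pathlib import Path

import pandas as pd


def process_data(input_path: Path, output_path: Path) -> None:
    """Hypothetical processing step: add a column with each value doubled."""
    df = pd.read_csv(input_path)
    df["value_doubled"] = df["value"] * 2
    df.to_csv(output_path, index=False)


def test_process_data_doubles_values(tmp_path):
    # Write a small input file inside the temporary directory
    input_file = tmp_path / "input.csv"
    pd.DataFrame({"value": [1, 2, 3]}).to_csv(input_file, index=False)

    output_file = tmp_path / "processed.csv"
    process_data(input_file, output_file)

    # Assert on the actual transformation, not only on the row count
    result = pd.read_csv(output_file)
    assert result["value_doubled"].tolist() == [2, 4, 6]

    # The input file should be left untouched
    assert pd.read_csv(input_file)["value"].tolist() == [1, 2, 3]
```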

Test NumPy Arrays and DataFrames Properly

In data science, you frequently work with floating-point numbers, NumPy arrays, and pandas DataFrames. Regular equality assertions often fail due to floating-point precision issues. NumPy and pandas provide specialized testing utilities for numerical comparisons.
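
For plain Python floats, pytest itself offers pytest.approx. The short sketch below shows the precision problem and one way to handle it; the test name is illustrative.

```python
import pytest


def test_float_sum_with_approx():
    total = 0.1 + 0.2

    # Binary floating point cannot represent 0.1 or 0.2 exactly,
    # so the sum differs from 0.3 by a tiny rounding error
    assert total != 0.3

    # pytest.approx compares within a tolerance (relative 1e-6 by default)
    assert total == pytest.approx(0.3)

    # It also works element-wise on lists of numbers
    assert [0.1 + 0.2, 0.2 + 0.4] == pytest.approx([0.3, 0.6])
```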

Testing NumPy Arrays

Use NumPy’s testing utilities for comparing arrays with appropriate tolerance:

import numpy as np
from numpy.testing import assert_array_almost_equal


def normalize_features(data):
    """Normalize features to 0-1 range"""
    return (data - data.min()) / (data.max() - data.min())


def test_normalization():
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    normalized = normalize_features(data)

    expected = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

    # A bare `assert normalized == expected` raises ValueError here,
    # because the truth value of a multi-element array is ambiguous
    assert_array_almost_equal(normalized, expected, decimal=2)


def test_model_predictions():
    # Simulate model predictions with floating point results
    predictions = np.array([0.123456, 0.789012, 0.345678])
    expected = np.array([0.12, 0.79, 0.35])

    # Compare with 2 decimal places
    assert_array_almost_equal(predictions, expected, decimal=2)
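
NumPy's documentation recommends assert_allclose over assert_array_almost_equal for more consistent floating-point comparisons. The sketch below shows the same style of check with explicit tolerances. The scale_to_unit_sum helper is hypothetical, and the tolerance values are arbitrary choices rather than recommendations.

```python
import numpy as np
from numpy.testing import assert_allclose


def scale_to_unit_sum(weights):
    """Hypothetical helper: rescale weights so they sum to 1."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()


def test_scale_to_unit_sum():
    scaled = scale_to_unit_sum([1.0, 2.0, 7.0])
    expected = np.array([0.1, 0.2, 0.7])

    # rtol is the allowed relative error per element;
    # atol adds an absolute allowance, which matters for values near zero
    assert_allclose(scaled, expected, rtol=1e-9, atol=1e-12)

    # The same function works for scalars
    assert_allclose(scaled.sum(), 1.0, rtol=1e-9)
```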

Testing Pandas DataFrames

Use pandas testing utilities for DataFrame comparisons:

import numpy as np
import pandas as pd


def clean_dataframe(df):
    """Remove duplicates and fill missing values"""
    return df.drop_duplicates().fillna(0)


def test_dataframe_cleaning():
    # Create test data with duplicates and NaN
    dirty_data = pd.DataFrame({"A": [1, 2, 2, np.nan], "B": [4, 5, 5, 6]})

    cleaned = clean_dataframe(dirty_data)

    expected = pd.DataFrame({"A": [1.0, 2.0, 0.0], "B": [4, 5, 6]})

    # Use pandas testing utility
    pd.testing.assert_frame_equal(cleaned.reset_index(drop=True), expected)
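
assert_frame_equal is strict by default: it also compares dtypes and column order. When those details do not matter, it can be relaxed. The sketch below assumes pandas 1.1 or later (for the rtol argument) and uses a hypothetical add_ratio function.

```python
import pandas as pd


def add_ratio(df):
    """Hypothetical transformation: append a part/total ratio column."""
    return df.assign(ratio=df["part"] / df["total"])


def test_add_ratio():
    df = pd.DataFrame({"part": [1.0, 2.0], "total": [3.0, 3.0]})
    result = add_ratio(df)

    # Columns are listed in a different order, with rounded expected ratios
    expected = pd.DataFrame(
        {"ratio": [0.333333, 0.666667], "total": [3.0, 3.0], "part": [1.0, 2.0]}
    )

    # check_like ignores the order of index and columns;
    # rtol sets the relative tolerance for float comparisons
    pd.testing.assert_frame_equal(result, expected, check_like=True, rtol=1e-5)
```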

Mock External Dependencies

Data science projects often depend on external services like APIs, databases, or cloud storage. Testing these dependencies can be slow, expensive, or unreliable. Mocking allows you to replace real external calls with fake responses.

Mocking API Calls

The @patch decorator replaces a real function with a mock during the test. When your code tries to call the original function, it gets the mock instead.

Let’s start with a simple example:

from unittest.mock import Mock, patch

import requests


def fetch_stock_data(symbol):
    """Fetch stock price data from an API"""
    response = requests.get(f"https://api.example.com/stock/{symbol}")
    return response.json()["price"]


@patch("requests.get")
def test_fetch_stock_data_simple(mock_get):
    # Create a fake response object
    mock_response = Mock()
    mock_response.json.return_value = {"price": 150.0}

    # Make the mock return our fake response
    mock_get.return_value = mock_response

    # Use the mock instead of real requests.get
    price = fetch_stock_data("AAPL")

    # Verify we got the fake data
    assert price == 150.0

    # Verify the mock was called with the right URL
    mock_get.assert_called_once_with("https://api.example.com/stock/AAPL")

Breaking down the mock syntax:

  • @patch("requests.get") – Decorates the test function to replace requests.get with a mock object for this test only
  • mock_get – The mock object that replaces requests.get, automatically passed as a parameter to your test function
  • Mock() – Creates a fake object that can simulate any behavior you need
  • mock_response.json.return_value = {"price": 150.0} – Tells the fake response: “when someone calls .json() on you, return this dictionary”
  • mock_get.return_value = mock_response – Tells the fake requests.get: “when someone calls you, return this fake response object”
  • mock_get.assert_called_once_with(...) – Verifies that requests.get was called exactly once with the expected URL

Output:

test_mock_api.py::test_fetch_stock_data_simple PASSED

The test passes successfully without making any real network requests. That is pretty cool!
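
Mocks can also simulate failures, which are hard to trigger against a real API. The sketch below uses a variant of fetch_stock_data that calls raise_for_status() and passes a timeout. Both are assumptions added for this illustration and are not in the original function.

```python
from unittest.mock import Mock, patch

import pytest
import requests


def fetch_stock_data(symbol):
    """Variant of the function above that surfaces HTTP errors (assumption)."""
    response = requests.get(f"https://api.example.com/stock/{symbol}", timeout=10)
    response.raise_for_status()
    return response.json()["price"]


@patch("requests.get")
def test_fetch_stock_data_http_error(mock_get):
    # side_effect makes the mocked method raise instead of returning a value
    mock_response = Mock()
    mock_response.raise_for_status.side_effect = requests.HTTPError("404 Not Found")
    mock_get.return_value = mock_response

    # The error should propagate to the caller
    with pytest.raises(requests.HTTPError):
        fetch_stock_data("UNKNOWN")

    # The body must not be parsed after a failed request
    mock_response.json.assert_not_called()
```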

Mocking Database Queries

Database mocking applies the same @patch principles to pandas database operations. Instead of mocking requests.get, we mock pandas.read_sql to simulate database query results without needing an actual database connection.

from unittest.mock import patch

import pandas as pd

connection = None  # Simulated database connection


def get_sales_data(start_date, end_date):
    """Fetch sales data from database"""
    query = f"SELECT * FROM sales WHERE date BETWEEN '{start_date}' AND '{end_date}'"
    return pd.read_sql(query, connection)


def analyze_sales_trends(start_date, end_date):
    """Analyze sales trends over a period"""
    data = get_sales_data(start_date, end_date)
    return data.groupby("product")["amount"].sum().to_dict()


@patch("pandas.read_sql")
def test_sales_analysis(mock_read_sql):
    # Mock the database query result
    mock_data = pd.DataFrame(
        {
            "product": ["A", "B", "A", "B"],
            "amount": [100, 150, 200, 250],
            "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        }
    )
    mock_read_sql.return_value = mock_data

    result = analyze_sales_trends("2023-01-01", "2023-01-04")

    expected = {"A": 300, "B": 400}
    assert result == expected

Output:

test_database_mock.py::test_sales_analysis PASSED

The mock DataFrame lets you test complex pandas operations without database setup.
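
A mock also records how it was called, so a test can check the query that would have been sent to the database. The following sketch reuses get_sales_data from above; the test name and the empty DataFrame are illustrative.

```python
from unittest.mock import patch

import pandas as pd

connection = None  # Simulated database connection, as in the example above


def get_sales_data(start_date, end_date):
    """Fetch sales data from database"""
    query = f"SELECT * FROM sales WHERE date BETWEEN '{start_date}' AND '{end_date}'"
    return pd.read_sql(query, connection)


@patch("pandas.read_sql")
def test_query_contains_date_range(mock_read_sql):
    mock_read_sql.return_value = pd.DataFrame({"product": [], "amount": []})

    get_sales_data("2023-01-01", "2023-01-04")

    # call_args holds the positional arguments of the most recent call
    mock_read_sql.assert_called_once()
    query, conn = mock_read_sql.call_args.args
    assert "2023-01-01" in query and "2023-01-04" in query
    assert conn is None
```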

Organize Tests with Custom Markers

As your data science project grows, you’ll have different types of tests: fast unit tests, slow integration tests, tests that require special hardware (like GPUs), and tests for different stages of your ML pipeline. Custom markers help you organize and run specific test categories.

First, configure your markers in a pytest.ini file:

[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    fast: marks tests as fast unit tests
    gpu: marks tests that require GPU acceleration
    integration: marks tests as integration tests
    model_training: marks tests that train ML models
    data_processing: marks tests for data processing functions

Then use these markers in your test files:

import pytest

@pytest.mark.fast
def test_data_validation():
    """Quick validation test"""
    data = [1, 2, 3, 4, 5]
    assert all(x > 0 for x in data)

@pytest.mark.slow
@pytest.mark.model_training
def test_train_complex_model():
    """This test takes several minutes"""
    # Simulate training a complex model
    import time
    time.sleep(1)  # Simulate long training
    assert True

@pytest.mark.gpu
def test_gpu_acceleration():
    """Test that requires CUDA/GPU"""
    # Test GPU-accelerated computations
    pytest.importorskip("cupy")  # Skip if GPU library not available
    import cupy as cp
    data = cp.array([1, 2, 3, 4, 5])
    assert len(data) == 5

@pytest.mark.integration
@pytest.mark.data_processing
def test_full_data_pipeline():
    """Test the complete data processing pipeline"""
    # Test end-to-end data processing
    pass

Now you can run specific test categories:

# Run only fast tests
pytest -m fast

# Run everything except slow tests
pytest -m "not slow"

# Run only GPU tests
pytest -m gpu

# Run model training and data processing tests
pytest -m "model_training or data_processing"

# Run integration tests that are not slow
pytest -m "integration and not slow"
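
Markers only label tests; they do not skip anything on their own. One way to combine a custom marker with an automatic skip is sketched below. It assumes that the presence of the cupy package is a reasonable stand-in for GPU availability, which may not hold on every machine. HAS_CUPY and requires_gpu are hypothetical names.

```python
import importlib.util

import pytest

# True when the cupy package can be found, without importing it
HAS_CUPY = importlib.util.find_spec("cupy") is not None

# Reusable skip condition that can be stacked with the custom `gpu` marker
requires_gpu = pytest.mark.skipif(not HAS_CUPY, reason="cupy is not installed")


@pytest.mark.gpu
@requires_gpu
def test_gpu_sum():
    import cupy as cp

    assert int(cp.arange(5).sum()) == 10
```

Running pytest -m gpu still selects this test, but on a machine without cupy it is reported as skipped instead of failing.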

Configure Pytest for Your Project

Large data science projects benefit from centralized test configuration and shared fixtures. Two files help organize this: pytest.ini for configuration and conftest.py for shared test utilities.

Project Configuration with pytest.ini

Create a pytest.ini file in your project root to configure pytest behavior:

[pytest]
# Configure test discovery
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*

# Configure markers
markers =
    slow: marks tests as slow running
    fast: marks tests as fast unit tests
    gpu: marks tests requiring GPU
    integration: marks integration tests
    unit: marks unit tests

# Configure output
addopts = -v --tb=short --strict-markers

# Configure warnings
filterwarnings =
    ignore::UserWarning
    ignore::DeprecationWarning:sklearn.*

Key configuration sections:

  • Test Discovery: Where pytest finds tests (testpaths = tests) and naming patterns
  • Custom Markers: Categories for your tests (slow, fast, gpu, integration, unit)
  • Output Options: Verbose output (-v), concise error traces (--tb=short), and an error for any marker that is not registered (--strict-markers)
  • Warning Filters: Hide library warnings that clutter output

Shared Fixtures with conftest.py

Create a conftest.py file to define fixtures available to all your tests:

# conftest.py
import numpy as np
import pandas as pd
import pytest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


@pytest.fixture(scope="session")
def sample_dataset():
    """Create a sample dataset for testing"""
    np.random.seed(42)
    X = np.random.randn(1000, 5)
    y = np.random.randint(0, 2, 1000)

    return pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)]).assign(target=y)


@pytest.fixture(scope="session")
def trained_model(sample_dataset):
    """Provide a pre-trained model for testing"""
    X = sample_dataset.drop("target", axis=1)
    y = sample_dataset["target"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression(random_state=42)
    model.fit(X_train, y_train)

    return {
        "model": model,
        "X_train": X_train,
        "X_test": X_test,
        "y_train": y_train,
        "y_test": y_test,
    }

Now any test file can use these fixtures without importing them:

# test_models.py
def test_model_accuracy(trained_model):
    model_info = trained_model
    model = model_info["model"]
    X_test = model_info["X_test"]
    y_test = model_info["y_test"]

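    accuracy = model.score(X_test, y_test)
    # The sample labels are random, so accuracy hovers around chance (0.5);
    # only check that the score is a valid proportion
    assert 0.0 <= accuracy <= 1.0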


def test_dataset_shape(sample_dataset):
    assert sample_dataset.shape == (1000, 6)  # 5 features + 1 target
    assert "target" in sample_dataset.columns

Structure Your Projects

Last but not least, as the code base grows, we should keep functions and their tests in two separate folders. Conventionally, source code lives in the “src” folder, while tests live in the “tests” folder.

To let pytest discover tests automatically, name your test files either “test_<something>.py” or “<something>_test.py”. Pytest will then find and run the test functions in all files whose names begin with “test_” or end with “_test”.

This is what the two files look like:

# src/process.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''

    text = TextBlob(text)

    return text.sentiment.polarity

# tests/test_process.py
import pytest
from src.process import extract_sentiment

def test_extract_sentiment():

    text = "Today I found a duck and I am happy"

    sentiment = extract_sentiment(text)

    assert sentiment > 0

To run all tests, type pytest tests in the root directory:

========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 1 item
tests/test_process.py .                                            [100%]
=========================== 1 passed in 0.69s ============================

Conclusion

Congratulations! You have just learned about pytest. I hope this article gives you a good overview of why testing is important and how to incorporate testing in your data science projects with pytest. With testing, you are not only able to know whether your function works as expected but also have the confidence to transition to new tools or code structures.
