Table of Contents
- Motivation
- Get Started with Pytest
- Multiple Tests for the Same Function
- Parametrization: Combining Tests
- Parametrize with a List of Samples
- Parametrize with a List of Examples and Expected Outputs
- Test One Function at a Time
- Fixtures: Use the Same Data to Test Different Functions
- Advanced Fixtures: Optimize Your Test Setup
- Test with Temporary Files Safely
- Test NumPy Arrays and DataFrames Properly
- Mock External Dependencies
- Organize Tests with Custom Markers
- Configure Pytest for Your Project
- Structure your Projects
- Conclusion
Motivation
As a data scientist, one way to test your Python code is by using an interactive notebook to verify the accuracy of the outputs.
However, this approach does not guarantee that your code works as intended in all cases.
A better approach is to identify the expected behavior of the code in various scenarios, and then verify if the code executes accordingly.
For example, testing a function used to extract the sentiment of a text might include checking whether:
- The function returns a value that is greater than 0 if the text is positive.
- The function returns a value that is less than 0 if the text is negative.
#sentiment.py
def test_extract_sentiment_positive():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0

def test_extract_sentiment_negative():
    text = "I do not think this will turn out well"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
Besides ensuring that your code works as intended, incorporating testing in a data science project also provides the following benefits:
- Identifies edge cases.
- Enables safe replacement of existing code with enhanced versions, without risking disruption of the entire process.
- Makes it easier for your teammates to understand the behaviors of your functions.
While Python offers various testing tools, pytest is one of the most user-friendly options.
Key Takeaways
Here’s what you’ll learn:
- Write comprehensive test suites with minimal code using pytest’s intuitive framework
- Use parametrization to test multiple scenarios without duplicating test code
- Implement fixtures for consistent test data across your entire test suite
- Mock external dependencies to eliminate network calls and database dependencies
- Apply advanced testing strategies for NumPy arrays and pandas DataFrames
Get Started with Pytest
Pytest is a framework that makes it easy to write small tests in Python. I like pytest because it helps me write tests with minimal code. If you are not familiar with testing, pytest is a great tool to get started.
To install pytest, run
pip install -U pytest
To test the extract_sentiment function, create a function whose name starts with test_ followed by the name of the tested function.
#sentiment.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

def test_extract_sentiment():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0
That’s it! Now we are ready to run the test.
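To test the sentiment.py file, run:
pytest sentiment.py
Pytest will run every function in sentiment.py whose name starts with test. Running pytest without arguments instead collects tests from files named test_*.py or *_test.py in the current directory and its subdirectories. The output of the test above will look like this: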
========================================= test session starts ==========================================
sentiment.py . [100%]
========================================= 1 passed in 0.68s ===========================================
If the test fails, pytest will produce the following outputs:
#sentiment.py
def test_extract_sentiment():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
$ pytest sentiment.py
========================================= test session starts ==========================================
sentiment.py F [100%]
=============================================== FAILURES ===============================================
________________________________________ test_extract_sentiment ________________________________________
    def test_extract_sentiment():
        text = "I think today will be a great day"
        sentiment = extract_sentiment(text)
>       assert sentiment < 0
E       assert 0.8 < 0

sentiment.py:17: AssertionError
======================================= short test summary info ========================================
FAILED sentiment.py::test_extract_sentiment - assert 0.8 < 0
========================================== 1 failed in 0.84s ===========================================
The test failed because the sentiment returned by the function is 0.8, which is not less than 0. Knowing exactly which assertion failed, and with what value, tells us where to start fixing the code.
Multiple Tests for the Same Function
With pytest, we can also create multiple tests for the same function.
#sentiment.py
def test_extract_sentiment_positive():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0

def test_extract_sentiment_negative():
    text = "I do not think this will turn out well"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
$ pytest sentiment.py
========================================= test session starts ==========================================
sentiment.py .F [100%]
=============================================== FAILURES ===============================================
___________________________________ test_extract_sentiment_negative ____________________________________
    def test_extract_sentiment_negative():
        text = "I do not think this will turn out well"
        sentiment = extract_sentiment(text)
>       assert sentiment < 0
E       assert 0.0 < 0

sentiment.py:25: AssertionError
======================================= short test summary info ========================================
FAILED sentiment.py::test_extract_sentiment_negative - assert 0.0 < 0
===================================== 1 failed, 1 passed in 0.80s ======================================
Parametrization: Combining Tests
Since the two test functions above exercise the same function, we can combine them into one test function with parametrization.
Parametrize with a List of Samples
pytest.mark.parametrize() allows us to execute a test with different examples by providing a list of examples in the argument.
# sentiment.py
import pytest
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize("sample", testdata)
def test_extract_sentiment(sample):
    sentiment = extract_sentiment(sample)
    assert sentiment > 0
========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 2 items
sentiment.py .F [100%]
================================ FAILURES ================================
_____ test_extract_sentiment[I do not think this will turn out well] _____
sample = "I do not think this will turn out well"

    @pytest.mark.parametrize("sample", testdata)
    def test_extract_sentiment(sample):
        sentiment = extract_sentiment(sample)
>       assert sentiment > 0
E       assert 0.0 > 0

sentiment.py:19: AssertionError
======================== short test summary info =========================
FAILED sentiment.py::test_extract_sentiment[I do not think this will turn out well]
====================== 1 failed, 1 passed in 0.80s ===================
Parametrize with a List of Examples and Expected Outputs
What if we expect different examples to have different outputs?
For example, we might want to check if the function text_contain_word:
- Returns True if word="duck" and text="There is a duck in this text"
- Returns False if word="duck" and text="There is nothing here"
def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    return word in text
To create a test for multiple examples with different expected outputs, we can use parametrize("sample, expected_output", testdata) with testdata=[(<sample1>, <output1>), (<sample2>, <output2>)].
# process.py
import pytest

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    return word in text

testdata = [
    ("There is a duck in this text", True),
    ("There is nothing here", False)
]

@pytest.mark.parametrize("sample, expected_output", testdata)
def test_text_contain_word(sample, expected_output):
    word = "duck"
    assert text_contain_word(word, sample) == expected_output
$ pytest process.py
========================================= test session starts ==========================================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
plugins: hydra-core-1.0.0, Faker-4.1.1
collected 2 items
process.py .. [100%]
========================================== 2 passed in 0.04s ===========================================
Awesome! Both tests passed!
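By default, pytest builds each test ID from the argument values, which can get long for full sentences. The sketch below shows one way to give each case a short, readable name with pytest.param. It assumes the same text_contain_word function as above; the id strings are made up for illustration.

```python
import pytest


def text_contain_word(word: str, text: str) -> bool:
    """Find whether the text contains a particular word."""
    return word in text


# Each pytest.param bundles the arguments for one case with a readable id.
# The id strings are illustrative; pick whatever names describe your cases.
testdata = [
    pytest.param("There is a duck in this text", True, id="contains-duck"),
    pytest.param("There is nothing here", False, id="no-duck"),
]


@pytest.mark.parametrize("sample, expected_output", testdata)
def test_text_contain_word(sample, expected_output):
    assert text_contain_word("duck", sample) == expected_output
```

With these ids, the two cases are reported as test_text_contain_word[contains-duck] and test_text_contain_word[no-duck], which makes a failing case easier to spot in the summary.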
Test One Function at a Time
To test a specific function, run pytest file.py::function_name
testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize("sample", testdata)
def test_extract_sentiment(sample):
    sentiment = extract_sentiment(sample)
    assert sentiment > 0

testdata = [
    ("There is a duck in this text", True),
    ("There is nothing here", False)
]

@pytest.mark.parametrize("sample, expected_output", testdata)
def test_text_contain_word(sample, expected_output):
    word = "duck"
    assert text_contain_word(word, sample) == expected_output
For example, to run only test_text_contain_word, type:
pytest process.py::test_text_contain_word
Fixtures: Use the Same Data to Test Different Functions
We can also use the same data to test different functions with a pytest fixture.
In the code below, we turn the sentence “Today I found a duck and I am happy” into a reusable fixture and use it in multiple tests. A test receives the fixture's return value by listing the fixture name as a parameter.
@pytest.fixture
def example_data():
    return "Today I found a duck and I am happy"

def test_extract_sentiment(example_data):
    sentiment = extract_sentiment(example_data)
    assert sentiment > 0

def test_text_contain_word(example_data):
    word = "duck"
    assert text_contain_word(word, example_data)
Advanced Fixtures: Optimize Your Test Setup
When working with data science projects, you often need to load expensive datasets or set up consistent environments across multiple tests. Basic fixtures work well for small examples, but advanced fixtures can optimize your test performance and ensure reproducibility.
Session-Scoped Fixtures
Basic pytest fixtures reload data for every test function, which becomes inefficient with large datasets. Session-scoped fixtures solve this by loading the data once and reusing it across all tests.
import numpy as np
import pandas as pd
import pytest

# This fixture runs once per test session
@pytest.fixture(scope="session")
def large_dataset():
    # Simulate loading an expensive dataset
    print("Loading large dataset...")
    return pd.DataFrame({
        "feature1": np.random.randn(10000),
        "feature2": np.random.randn(10000),
        "target": np.random.randint(0, 2, 10000)
    })

def test_data_shape(large_dataset):
    assert large_dataset.shape == (10000, 3)

def test_feature_types(large_dataset):
    # Check the dtype kind rather than an exact width, which can vary by platform
    assert pd.api.types.is_integer_dtype(large_dataset["target"])
    assert pd.api.types.is_float_dtype(large_dataset["feature1"])
Output:
Loading large dataset...
test_session_fixture.py::test_data_shape PASSED
test_session_fixture.py::test_feature_types PASSED
The dataset is loaded only once, even if you have multiple tests using it.
Autouse Fixtures
Regular fixtures require explicit inclusion in each test function, which becomes repetitive for universal setup like random seeds. Autouse fixtures solve this by running automatically before every test, ensuring consistent setup across your entire test suite.
import random
import numpy as np
import pytest

@pytest.fixture(autouse=True)
def setup_random_seeds():
    print("Setting up random seeds...")
    np.random.seed(42)
    random.seed(42)

def test_model_prediction():
    # This test will have reproducible random results
    X = np.random.randn(100, 5)
    # Your model training and prediction code here
    assert len(X) == 100

def test_data_sampling():
    # This test also gets reproducible randomness
    sample = np.random.choice([1, 2, 3, 4, 5], size=10)
    assert len(sample) == 10
Output:
Setting up random seeds...
test_autouse_fixture.py::test_model_prediction PASSED
Setting up random seeds...
test_autouse_fixture.py::test_data_sampling PASSED
You can see the fixture runs twice automatically – once before each test – even though neither test function explicitly requests the fixture.
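To see why resetting the seed before every test matters, here is a small sketch using only the standard library. The draw_sample helper is hypothetical. The first test shows that the same seed always produces the same draws; the second shows that, without reseeding, later draws continue the sequence, so a test's random values would depend on which tests ran before it.

```python
import random


def draw_sample(seed: int, size: int = 10) -> list:
    """Hypothetical helper: seed the global generator, then draw integers from 1 to 5."""
    random.seed(seed)
    return [random.randint(1, 5) for _ in range(size)]


def test_same_seed_gives_same_sample():
    assert draw_sample(42) == draw_sample(42)


def test_draws_continue_the_sequence_without_reseeding():
    random.seed(42)
    first = [random.random() for _ in range(3)]
    second = [random.random() for _ in range(3)]  # no reseed in between
    assert first != second
```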
Test with Temporary Files Safely
Testing file operations with real files can corrupt your actual data or leave behind test artifacts. Safe file testing requires temporary files that don’t interfere with your actual data and are automatically cleaned up after tests.
Pytest provides the tmp_path fixture, which gives each test its own temporary directory as a pathlib.Path object. This is perfect for testing data processing pipelines, model serialization, or any file I/O operations.
import pandas as pd

def save_model_predictions(predictions, filepath):
    """Save model predictions to a CSV file"""
    pd.DataFrame({"predictions": predictions}).to_csv(filepath, index=False)

def load_model_predictions(filepath):
    """Load model predictions from a CSV file"""
    return pd.read_csv(filepath)["predictions"].tolist()

def test_save_and_load_predictions(tmp_path):
    # tmp_path is created by pytest and is unique to this test
    predictions = [0.1, 0.9, 0.3, 0.7]
    # Create a temporary file path
    file_path = tmp_path / "predictions.csv"
    # Test saving
    save_model_predictions(predictions, file_path)
    assert file_path.exists()
    # Test loading
    loaded_predictions = load_model_predictions(file_path)
    assert loaded_predictions == predictions
You can also test entire data processing pipelines:
import pandas as pd

def process_data(input_file, output_file):
    """Hypothetical stand-in for your own processing step: copies rows unchanged"""
    pd.read_csv(input_file).to_csv(output_file, index=False)

def test_data_processing_pipeline(tmp_path):
    # Create temporary input file
    input_file = tmp_path / "input.csv"
    input_data = pd.DataFrame({"value": [1, 2, 3, 4, 5]})
    input_data.to_csv(input_file, index=False)
    # Create temporary output file path
    output_file = tmp_path / "processed.csv"
    # Test your processing function
    process_data(input_file, output_file)
    # Verify the output
    result = pd.read_csv(output_file)
    assert len(result) == 5
    # Add more specific assertions about your processing
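The same pattern works for any file format, not just CSV. As a rough sketch, here is a round-trip test for a JSON file of evaluation metrics. The save_metrics and load_metrics helpers and the metric values are hypothetical; only tmp_path comes from pytest.

```python
import json
from pathlib import Path


def save_metrics(metrics: dict, filepath: Path) -> None:
    """Hypothetical helper: write metrics to a JSON file."""
    filepath.write_text(json.dumps(metrics))


def load_metrics(filepath: Path) -> dict:
    """Hypothetical helper: read metrics back from a JSON file."""
    return json.loads(filepath.read_text())


def test_metrics_round_trip(tmp_path):
    # tmp_path is a pathlib.Path, so "/" builds a path inside the temp directory
    file_path = tmp_path / "metrics.json"
    metrics = {"accuracy": 0.91, "f1": 0.88}
    save_metrics(metrics, file_path)
    assert file_path.exists()
    assert load_metrics(file_path) == metrics
```

Because the helpers take the path as an argument instead of hard-coding it, the test never touches your real output folder.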
Test NumPy Arrays and DataFrames Properly
In data science, you frequently work with floating-point numbers, NumPy arrays, and pandas DataFrames. Regular equality assertions often fail due to floating-point precision issues. NumPy and pandas provide specialized testing utilities for numerical comparisons.
Testing NumPy Arrays
Use NumPy’s testing utilities for comparing arrays with appropriate tolerance:
import numpy as np
from numpy.testing import assert_array_almost_equal

def normalize_features(data):
    """Normalize features to 0-1 range"""
    return (data - data.min()) / (data.max() - data.min())

def test_normalization():
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    normalized = normalize_features(data)
    expected = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    # "assert normalized == expected" raises ValueError: comparing arrays
    # yields an array of booleans, which has no single truth value
    assert_array_almost_equal(normalized, expected, decimal=2)

def test_model_predictions():
    # Simulate model predictions with floating point results
    predictions = np.array([0.123456, 0.789012, 0.345678])
    expected = np.array([0.12, 0.79, 0.35])
    # Compare with 2 decimal places
    assert_array_almost_equal(predictions, expected, decimal=2)
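Two related tools are worth knowing, shown here as a sketch rather than a recommendation for every case: pytest.approx for scalars, and numpy.testing.assert_allclose, which takes explicit relative (rtol) and absolute (atol) tolerances instead of a number of decimals. The tolerance values below are arbitrary choices for illustration.

```python
import numpy as np
import pytest
from numpy.testing import assert_allclose


def test_float_sum_scalar():
    total = 0.1 + 0.2
    assert total != 0.3  # exact comparison is off by rounding error
    assert total == pytest.approx(0.3)  # passes within approx's default tolerance


def test_normalization_allclose():
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    normalized = (data - data.min()) / (data.max() - data.min())
    expected = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    # rtol and atol are illustrative; choose them to match your precision needs
    assert_allclose(normalized, expected, rtol=1e-7, atol=1e-12)
```

Explicit tolerances make the intended precision visible in the test itself, which helps when a teammate later wonders how close is close enough.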
Testing Pandas DataFrames
Use pandas testing utilities for DataFrame comparisons:
import numpy as np
import pandas as pd

def clean_dataframe(df):
    """Remove duplicates and fill missing values"""
    return df.drop_duplicates().fillna(0)

def test_dataframe_cleaning():
    # Create test data with duplicates and NaN
    dirty_data = pd.DataFrame({"A": [1, 2, 2, np.nan], "B": [4, 5, 5, 6]})
    cleaned = clean_dataframe(dirty_data)
    expected = pd.DataFrame({"A": [1.0, 2.0, 0.0], "B": [4, 5, 6]})
    # Reset the index because drop_duplicates keeps the original row labels
    pd.testing.assert_frame_equal(cleaned.reset_index(drop=True), expected)
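By default, assert_frame_equal is strict about column dtypes, so an integer column and a float column holding the same numbers are reported as different. When only the values matter, the check_dtype=False option relaxes that. A minimal sketch with made-up data:

```python
import pandas as pd
import pytest


def test_frame_equal_ignoring_dtype():
    result = pd.DataFrame({"A": [1, 2, 3]})  # int64 column
    expected = pd.DataFrame({"A": [1.0, 2.0, 3.0]})  # float64 column

    # The default comparison fails because the dtypes differ
    with pytest.raises(AssertionError):
        pd.testing.assert_frame_equal(result, expected)

    # Ignore dtypes and compare values only
    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```

Keeping the strict default is usually the safer choice, since an unexpected dtype change is often a real bug in a pipeline.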
Mock External Dependencies
Data science projects often depend on external services like APIs, databases, or cloud storage. Testing these dependencies can be slow, expensive, or unreliable. Mocking allows you to replace real external calls with fake responses.
Mocking API Calls
The @patch decorator replaces a real function with a mock during the test. When your code tries to call the original function, it gets the mock instead.
Let’s start with a simple example:
from unittest.mock import Mock, patch
import requests

def fetch_stock_data(symbol):
    """Fetch stock price data from an API"""
    response = requests.get(f"https://api.example.com/stock/{symbol}")
    return response.json()["price"]

@patch("requests.get")
def test_fetch_stock_data_simple(mock_get):
    # Create a fake response object
    mock_response = Mock()
    mock_response.json.return_value = {"price": 150.0}
    # Make the mock return our fake response
    mock_get.return_value = mock_response
    # Use the mock instead of real requests.get
    price = fetch_stock_data("AAPL")
    # Verify we got the fake data
    assert price == 150.0
    # Verify the mock was called with the right URL
    mock_get.assert_called_once_with("https://api.example.com/stock/AAPL")
Breaking down the mock syntax:
- @patch("requests.get") – Decorates the test function to replace requests.get with a mock object for this test only
- mock_get – The mock object that replaces requests.get, automatically passed as a parameter to your test function
- Mock() – Creates a fake object that can simulate any behavior you need
- mock_response.json.return_value = {"price": 150.0} – Tells the fake response: “when someone calls .json() on you, return this dictionary”
- mock_get.return_value = mock_response – Tells the fake requests.get: “when someone calls you, return this fake response object”
- mock_get.assert_called_once_with(...) – Verifies that requests.get was called exactly once with the expected URL
Output:
test_mock_api.py::test_fetch_stock_data_simple PASSED
The test passes successfully without making any real network requests. That is pretty cool!
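Mocks can simulate failures as well as successes. The sketch below uses a mock's side_effect to raise an exception when it is called. To stay dependency-free, it assumes a hypothetical fetch_price_with_fallback function that receives the HTTP getter as an argument, and it raises Python's built-in ConnectionError rather than the exception classes from requests.

```python
from unittest.mock import Mock


def fetch_price_with_fallback(get, symbol, default=None):
    """Hypothetical helper: `get` is any callable shaped like requests.get."""
    try:
        response = get(f"https://api.example.com/stock/{symbol}")
        return response.json()["price"]
    except ConnectionError:
        # Built-in exception used for illustration only
        return default


def test_returns_default_when_network_is_down():
    # side_effect makes the mock raise instead of returning a value
    mock_get = Mock(side_effect=ConnectionError("network down"))
    assert fetch_price_with_fallback(mock_get, "AAPL", default=0.0) == 0.0
    # The call is still recorded, so we can check the URL
    mock_get.assert_called_once_with("https://api.example.com/stock/AAPL")


def test_returns_price_when_call_succeeds():
    mock_get = Mock()
    mock_get.return_value.json.return_value = {"price": 150.0}
    assert fetch_price_with_fallback(mock_get, "AAPL") == 150.0
```

Testing the failure path this way would be hard to do reliably against a real API, which is one of the main reasons to mock.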
Mocking Database Queries
Database mocking applies the same @patch principles to pandas database operations. Instead of mocking requests.get, we mock pandas.read_sql to simulate database query results without needing an actual database connection.
from unittest.mock import patch
import pandas as pd

connection = None  # Simulated database connection

def get_sales_data(start_date, end_date):
    """Fetch sales data from database"""
    query = f"SELECT * FROM sales WHERE date BETWEEN '{start_date}' AND '{end_date}'"
    return pd.read_sql(query, connection)

def analyze_sales_trends(start_date, end_date):
    """Analyze sales trends over a period"""
    data = get_sales_data(start_date, end_date)
    return data.groupby("product")["amount"].sum().to_dict()

@patch("pandas.read_sql")
def test_sales_analysis(mock_read_sql):
    # Mock the database query result
    mock_data = pd.DataFrame(
        {
            "product": ["A", "B", "A", "B"],
            "amount": [100, 150, 200, 250],
            "date": ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"],
        }
    )
    mock_read_sql.return_value = mock_data
    result = analyze_sales_trends("2023-01-01", "2023-01-04")
    expected = {"A": 300, "B": 400}
    assert result == expected
Output:
test_database_mock.py::test_sales_analysis PASSED
The mock DataFrame lets you test complex pandas operations without database setup.
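A mock also records how it was called, so you can check the query your code built, not just the result. A minimal sketch, reusing the get_sales_data function from above and inspecting the mock's call_args (the empty DataFrame is a placeholder return value):

```python
from unittest.mock import patch

import pandas as pd

connection = None  # Simulated database connection


def get_sales_data(start_date, end_date):
    """Fetch sales data from database"""
    query = f"SELECT * FROM sales WHERE date BETWEEN '{start_date}' AND '{end_date}'"
    return pd.read_sql(query, connection)


@patch("pandas.read_sql")
def test_query_contains_dates(mock_read_sql):
    # The return value is irrelevant here; we only inspect the call
    mock_read_sql.return_value = pd.DataFrame({"product": [], "amount": []})
    get_sales_data("2023-01-01", "2023-01-04")

    # call_args.args holds the positional arguments of the last call
    query, conn = mock_read_sql.call_args.args
    assert "2023-01-01" in query and "2023-01-04" in query
    assert conn is connection
```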
Organize Tests with Custom Markers
As your data science project grows, you’ll have different types of tests: fast unit tests, slow integration tests, tests that require special hardware (like GPUs), and tests for different stages of your ML pipeline. Custom markers help you organize and run specific test categories.
First, register your markers in a pytest.ini file. Note that the section header in pytest.ini is [pytest]; the [tool:pytest] header is only used in setup.cfg.
[pytest]
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    fast: marks tests as fast unit tests
    gpu: marks tests that require GPU acceleration
    integration: marks tests as integration tests
    model_training: marks tests that train ML models
    data_processing: marks tests for data processing functions
Then use these markers in your test files:
import pytest

@pytest.mark.fast
def test_data_validation():
    """Quick validation test"""
    data = [1, 2, 3, 4, 5]
    assert all(x > 0 for x in data)

@pytest.mark.slow
@pytest.mark.model_training
def test_train_complex_model():
    """This test takes several minutes"""
    # Simulate training a complex model
    import time
    time.sleep(1)  # Simulate long training
    assert True

@pytest.mark.gpu
def test_gpu_acceleration():
    """Test that requires CUDA/GPU"""
    # Test GPU-accelerated computations
    pytest.importorskip("cupy")  # Skip if GPU library not available
    import cupy as cp
    data = cp.array([1, 2, 3, 4, 5])
    assert len(data) == 5

@pytest.mark.integration
@pytest.mark.data_processing
def test_full_data_pipeline():
    """Test the complete data processing pipeline"""
    # Test end-to-end data processing
    pass
Now you can run specific test categories:
# Run only fast tests
pytest -m fast
# Run everything except slow tests
pytest -m "not slow"
# Run only GPU tests
pytest -m gpu
# Run model training and data processing tests
pytest -m "model_training or data_processing"
# Run integration tests that are not slow
pytest -m "integration and not slow"
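If you would rather skip slow tests by default and opt in when needed, one possible approach is a pair of hooks in conftest.py. This is a sketch, not the only way to do it: pytest_addoption and pytest_collection_modifyitems are real pytest hooks, while the --runslow option name is an assumption chosen for this example.

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    # Register a command-line flag; the name --runslow is our own choice
    parser.addoption(
        "--runslow", action="store_true", default=False, help="run tests marked slow"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # flag given: run everything, including slow tests
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        # item.keywords contains the names of the markers applied to the test
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, a plain pytest run reports slow tests as skipped, and pytest --runslow runs them.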
Configure Pytest for Your Project
Large data science projects benefit from centralized test configuration and shared fixtures. Two files help organize this: pytest.ini for configuration and conftest.py for shared test utilities.
Project Configuration with pytest.ini
Create a pytest.ini file in your project root to configure pytest behavior:
[pytest]
# Configure test discovery
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*
# Configure markers
markers =
    slow: marks tests as slow running
    fast: marks tests as fast unit tests
    gpu: marks tests requiring GPU
    integration: marks integration tests
    unit: marks unit tests
# Configure output
addopts = -v --tb=short --strict-markers
# Configure warnings
filterwarnings =
    ignore::UserWarning
    ignore::DeprecationWarning:sklearn.*
Key configuration sections:
- Test Discovery: Where pytest finds tests (testpaths = tests) and naming patterns
- Custom Markers: Categories for your tests (slow, fast, gpu, integration, unit)
- Output Options: Verbose output (-v), concise error traces (--tb=short), and an error for any marker that is not registered (--strict-markers)
- Warning Filters: Hide library warnings that clutter output
Shared Fixtures with conftest.py
Create a conftest.py file to define fixtures available to all your tests:
# conftest.py
import numpy as np
import pandas as pd
import pytest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

@pytest.fixture(scope="session")
def sample_dataset():
    """Create a sample dataset for testing"""
    np.random.seed(42)
    X = np.random.randn(1000, 5)
    y = np.random.randint(0, 2, 1000)
    return pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)]).assign(target=y)

@pytest.fixture(scope="session")
def trained_model(sample_dataset):
    """Provide a pre-trained model for testing"""
    X = sample_dataset.drop("target", axis=1)
    y = sample_dataset["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = LogisticRegression(random_state=42)
    model.fit(X_train, y_train)
    return {
        "model": model,
        "X_train": X_train,
        "X_test": X_test,
        "y_train": y_train,
        "y_test": y_test,
    }
Now any test file can use these fixtures without importing them:
# test_models.py
def test_model_accuracy(trained_model):
    model_info = trained_model
    model = model_info["model"]
    X_test = model_info["X_test"]
    y_test = model_info["y_test"]
    accuracy = model.score(X_test, y_test)
    assert accuracy > 0.5

def test_dataset_shape(sample_dataset):
    assert sample_dataset.shape == (1000, 6)  # 5 features + 1 target
    assert "target" in sample_dataset.columns
Structure your Projects
Last but not least, as the codebase grows, we should store functions and their tests in two different folders. Conventionally, source code is kept in the “src” folder, while tests are stored in the “tests” folder.
So that pytest can discover tests automatically, name your test files either test_<name>.py or <name>_test.py.
This is what these two files look like:
# src/process.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

# tests/test_process.py
import pytest
from src.process import extract_sentiment

def test_extract_sentiment():
    text = "Today I found a duck and I am happy"
    sentiment = extract_sentiment(text)
    assert sentiment > 0
To run all tests, type pytest tests in the root directory:
========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 1 item
tests/test_process.py . [100%]
=========================== 1 passed in 0.69s ============================
Conclusion
Congratulations! You have just learned about pytest. I hope this article gives you a good overview of why testing is important and how to incorporate testing in your data science projects with pytest. With testing, you are not only able to know whether your function works as expected but also have the confidence to transition to new tools or code structures.