Motivation
As a data scientist, one way to test your Python code is by using an interactive notebook to verify the accuracy of the outputs.
However, this approach does not guarantee that your code works as intended in all cases.
A better approach is to identify the expected behavior of the code in various scenarios, and then verify if the code executes accordingly.
For example, testing a function used to extract the sentiment of a text might include checking whether:
- The function returns a value that is greater than 0 if the text is positive.
- The function returns a value that is less than 0 if the text is negative.
#sentiment.py
def test_extract_sentiment_positive():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0

def test_extract_sentiment_negative():
    text = "I do not think this will turn out well"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
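The tests above are ordinary Python functions: they return silently when every check holds and raise `AssertionError` when one fails. To try them before TextBlob enters the picture, here is a minimal sketch with a hypothetical keyword-based `extract_sentiment` stand-in (its scores will not match TextBlob's) and a tiny `run_tests` helper, also hypothetical, that calls each test and records the outcome. This is roughly the bookkeeping a test runner like pytest automates.

```python
# Hypothetical stand-in for extract_sentiment: counts positive and
# negative keywords instead of using TextBlob.
POSITIVE_WORDS = {"great", "good", "happy"}
NEGATIVE_WORDS = {"bad", "sad", "not"}


def extract_sentiment(text: str) -> float:
    """Return a score in [-1, 1] based on keyword counts."""
    words = text.lower().split()
    if not words:
        return 0.0
    score = sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)
    return score / len(words)


def test_extract_sentiment_positive():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0


def test_extract_sentiment_negative():
    text = "I do not think this will turn out well"
    sentiment = extract_sentiment(text)
    assert sentiment < 0


def run_tests(tests):
    """Call each test and record 'passed' or 'failed', like a tiny runner."""
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = "passed"
        except AssertionError:
            results[test.__name__] = "failed"
    return results


print(run_tests([test_extract_sentiment_positive, test_extract_sentiment_negative]))
# {'test_extract_sentiment_positive': 'passed', 'test_extract_sentiment_negative': 'passed'}
```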
Besides ensuring that your code works as intended, incorporating testing in a data science project also provides the following benefits:
- Identifies edge cases.
- Enables safe replacement of existing code with enhanced versions, without risking disruption of the entire process.
- Makes it easier for your teammates to understand the behaviors of your functions.
Python offers several testing tools, such as the built-in unittest module, but pytest is one of the most user-friendly options.
The source code for this article can be found here.
Get Started with Pytest
Pytest is a framework that makes it easy to write small tests in Python. I like pytest because it lets me write tests with minimal code. If you are new to testing, pytest is a great tool to get started with.
To install pytest, run
pip install -U pytest
To test the extract_sentiment function, create a function whose name starts with test_ followed by the name of the tested function.
#sentiment.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

def test_extract_sentiment():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0
That’s it! Now we are ready to run the test.
To test the sentiment.py file, run:
pytest sentiment.py
Pytest will run all functions in sentiment.py whose names start with test. The output of the test above will look like this:
========================================= test session starts ==========================================
process.py . [100%]
========================================= 1 passed in 0.68s ===========================================
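Besides the command line, pytest can be started from Python with `pytest.main()`, which takes a list of command-line arguments and returns an exit code (0 when all tests pass). The sketch below writes a test module into a temporary directory and runs it. Two assumptions: the keyword-based `extract_sentiment` inside the module is a hypothetical stand-in for the TextBlob version, so the example needs no extra dependencies, and `pytest.ExitCode` requires pytest 5 or newer. The helper name `run_pytest_on_source` is also made up for this example.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

# Contents of a test module to run with pytest
TEST_MODULE = textwrap.dedent('''
    def extract_sentiment(text: str) -> float:
        # Hypothetical stand-in for the TextBlob version
        words = text.lower().split()
        return sum(w in {"great", "happy"} for w in words) / max(len(words), 1)

    def test_extract_sentiment():
        text = "I think today will be a great day"
        sentiment = extract_sentiment(text)
        assert sentiment > 0
''')


def run_pytest_on_source(source: str, filename: str = "test_sentiment.py") -> int:
    """Write source to a temporary file, run pytest on it, return the exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / filename
        path.write_text(source)
        # -q: quieter output; -p no:cacheprovider: skip the .pytest_cache folder
        return pytest.main(["-q", "-p", "no:cacheprovider", str(path)])


exit_code = run_pytest_on_source(TEST_MODULE)
print(exit_code == pytest.ExitCode.OK)  # True when every test passed
```

Each call should use a different file name, because pytest imports the test file as a module and a second module with the same name but a different path would clash within one Python process.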
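If a test fails, pytest reports which assertion failed and the values involved. For example, changing the assertion to sentiment < 0 produces the following output: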
#sentiment.py
def test_extract_sentiment():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
$ pytest sentiment.py
========================================= test session starts ==========================================
process.py F [100%]
=============================================== FAILURES ===============================================
________________________________________ test_extract_sentiment ________________________________________
def test_extract_sentiment():
text = "I think today will be a great day"
sentiment = extract_sentiment(text)
> assert sentiment < 0
E assert 0.8 < 0
process.py:17: AssertionError
======================================= short test summary info ========================================
FAILED process.py::test_extract_sentiment - assert 0.8 < 0
========================================== 1 failed in 0.84s ===========================================
The test failed because the function returned a sentiment of 0.8, which is not less than 0. Knowing why a test fails tells us where to look when fixing the code.
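Pytest can show `assert 0.8 < 0` because it rewrites `assert` statements in test modules so that failures report the values involved. Outside pytest, a bare `assert` raises an `AssertionError` with no message, so it can help to attach an explicit message describing what was expected; pytest displays that message alongside its own report. A small plain-Python sketch, where `check_negative` is a hypothetical helper and 0.8 is simply the value from the failing run above:

```python
def check_negative(sentiment: float) -> None:
    # The expression after the comma becomes the AssertionError's message
    assert sentiment < 0, f"expected a negative polarity, got {sentiment}"


try:
    check_negative(0.8)  # the value reported in the failing run
except AssertionError as err:
    print(err)  # expected a negative polarity, got 0.8
```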
Multiple Tests for the Same Function
With pytest, we can also create multiple tests for the same function.
#sentiment.py
def test_extract_sentiment_positive():
    text = "I think today will be a great day"
    sentiment = extract_sentiment(text)
    assert sentiment > 0

def test_extract_sentiment_negative():
    text = "I do not think this will turn out well"
    sentiment = extract_sentiment(text)
    assert sentiment < 0
$ pytest sentiment.py
========================================= test session starts ==========================================
process.py .F [100%]
=============================================== FAILURES ===============================================
___________________________________ test_extract_sentiment_negative ____________________________________
def test_extract_sentiment_negative():
text = "I do not think this will turn out well"
sentiment = extract_sentiment(text)
> assert sentiment < 0
E assert 0.0 < 0
process.py:25: AssertionError
======================================= short test summary info ========================================
FAILED process.py::test_extract_sentiment_negative - assert 0.0 < 0
===================================== 1 failed, 1 passed in 0.80s ======================================
Parametrization: Combining Tests
Since the two test functions above test the same function, we can combine them into a single test function with parametrization.
Parametrize with a List of Samples
pytest.mark.parametrize() allows us to run a test with different examples by passing a list of examples as an argument.
# sentiment.py
from textblob import TextBlob
import pytest

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize('sample', testdata)
def test_extract_sentiment(sample):
    sentiment = extract_sentiment(sample)
    assert sentiment > 0
========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 2 items
sentiment.py .F [100%]
================================ FAILURES ================================
_____ test_extract_sentiment[I do not think this will turn out well] _____
sample = 'I do not think this will turn out well'
@pytest.mark.parametrize('sample', testdata)
def test_extract_sentiment(sample):
sentiment = extract_sentiment(sample)
> assert sentiment > 0
E assert 0.0 > 0
sentiment.py:19: AssertionError
======================== short test summary info =========================
FAILED sentiment.py::test_extract_sentiment[I do not think this will turn out well]
====================== 1 failed, 1 passed in 0.80s ===================
Parametrize with a List of Examples and Expected Outputs
What if we expect different examples to have different outputs?
For example, we might want to check whether the function text_contain_word:
- Returns True if word="duck" and text="There is a duck in this text"
- Returns False if word="duck" and text="There is nothing here"
def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    return word in text
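One edge case worth noting: `word in text` checks for a substring, so "duck" is also found in "ducklings". If whole-word matching is what you want, a hypothetical variant (`text_contain_whole_word`, not part of the original code) could use a regular expression with word boundaries. Writing a test case for such an example is what makes the intended behavior explicit.

```python
import re


def text_contain_word(word: str, text: str) -> bool:
    '''Find whether the text contains a particular word (substring match)'''
    return word in text


def text_contain_whole_word(word: str, text: str) -> bool:
    """Hypothetical variant: match the word only at word boundaries."""
    # re.escape treats special characters in word literally; \b marks a word boundary
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


print(text_contain_word("duck", "I saw some ducklings"))                # True
print(text_contain_whole_word("duck", "I saw some ducklings"))          # False
print(text_contain_whole_word("duck", "There is a duck in this text"))  # True
```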
To test multiple examples with different expected outputs, we can use parametrize('sample, expected_output', testdata) with testdata=[(<sample1>, <output1>), (<sample2>, <output2>)].
# process.py
import pytest

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    return word in text

testdata = [
    ('There is a duck in this text', True),
    ('There is nothing here', False)
]

@pytest.mark.parametrize('sample, expected_output', testdata)
def test_text_contain_word(sample, expected_output):
    word = 'duck'
    assert text_contain_word(word, sample) == expected_output
$ pytest process.py
========================================= test session starts ==========================================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
plugins: hydra-core-1.0.0, Faker-4.1.1
collected 2 items
process.py .. [100%]
========================================== 2 passed in 0.04s ===========================================
Awesome! Both tests passed!
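The test above fixes word = 'duck'. Since `parametrize` accepts any number of comma-separated argument names, the word can be a parameter too. The sketch below writes such a test module to a temporary directory and runs it with `pytest.main()`; the two extra rows are illustrative examples, and the "ducklings" row documents that `word in text` matches substrings. The module file name is made up for this example.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

TEST_MODULE = textwrap.dedent('''
    import pytest

    def text_contain_word(word: str, text: str):
        """Find whether the text contains a particular word"""
        return word in text

    # Each tuple supplies (word, sample, expected_output)
    testdata = [
        ("duck", "There is a duck in this text", True),
        ("duck", "There is nothing here", False),
        ("goose", "There is a duck in this text", False),
        ("duck", "I saw some ducklings", True),  # substring match
    ]

    @pytest.mark.parametrize("word, sample, expected_output", testdata)
    def test_text_contain_word(word, sample, expected_output):
        assert text_contain_word(word, sample) == expected_output
''')

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_process_words.py"
    path.write_text(TEST_MODULE)
    # -p no:cacheprovider keeps pytest from writing a .pytest_cache folder
    exit_code = pytest.main(["-q", "-p", "no:cacheprovider", str(path)])

print(exit_code == pytest.ExitCode.OK)  # True: all four cases pass
```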
Test One Function at a Time
To run a single test function, use pytest file.py::function_name. Suppose process.py contains the following tests:
testdata = ["I think today will be a great day", "I do not think this will turn out well"]

@pytest.mark.parametrize('sample', testdata)
def test_extract_sentiment(sample):
    sentiment = extract_sentiment(sample)
    assert sentiment > 0

testdata = [
    ('There is a duck in this text', True),
    ('There is nothing here', False)
]

@pytest.mark.parametrize('sample, expected_output', testdata)
def test_text_contain_word(sample, expected_output):
    word = 'duck'
    assert text_contain_word(word, sample) == expected_output
For example, to run only test_text_contain_word, type:
pytest process.py::test_text_contain_word
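The same node ID syntax works when pytest is started from Python with `pytest.main()`. In the sketch below, a small plugin object (`RecordRuns`, a hypothetical helper) implements the standard `pytest_runtest_logreport` hook to record the node ID of every test that runs, which shows that only the cases of test_text_contain_word are executed. The test module, including its one-line `extract_sentiment`, is a stand-in written for this example.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

TEST_MODULE = textwrap.dedent('''
    import pytest

    def extract_sentiment(text):
        # Hypothetical stand-in for the TextBlob version
        return 1.0 if "great" in text else -1.0

    def text_contain_word(word, text):
        return word in text

    @pytest.mark.parametrize("sample", ["I think today will be a great day"])
    def test_extract_sentiment(sample):
        assert extract_sentiment(sample) > 0

    @pytest.mark.parametrize("sample, expected_output", [
        ("There is a duck in this text", True),
        ("There is nothing here", False),
    ])
    def test_text_contain_word(sample, expected_output):
        assert text_contain_word("duck", sample) == expected_output
''')


class RecordRuns:
    """Plugin object: pytest calls its hook methods during the run."""

    def __init__(self):
        self.node_ids = []

    def pytest_runtest_logreport(self, report):
        # pytest reports setup, call and teardown separately; keep the call phase
        if report.when == "call":
            self.node_ids.append(report.nodeid)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_selection.py"
    path.write_text(TEST_MODULE)
    recorder = RecordRuns()
    exit_code = pytest.main(
        ["-q", "-p", "no:cacheprovider", f"{path}::test_text_contain_word"],
        plugins=[recorder],
    )

print(exit_code == pytest.ExitCode.OK)  # True
print(recorder.node_ids)  # two node IDs, both for test_text_contain_word
```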
Fixtures: Use the Same Data to Test Different Functions
We can also use the same data to test different functions with a pytest fixture.
In the code below, we use a pytest fixture to turn the sentence “Today I found a duck and I am happy” into reusable test data and use it in multiple tests.
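import pytest

# extract_sentiment and text_contain_word are defined as in the previous examples

@pytest.fixture
def example_data():
    return 'Today I found a duck and I am happy'

# pytest passes the fixture's return value to every test that names it as a parameter
def test_extract_sentiment(example_data):
    sentiment = extract_sentiment(example_data)
    assert sentiment > 0

def test_text_contain_word(example_data):
    word = 'duck'
    assert text_contain_word(word, example_data)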
Structure your Projects
Last but not least, as our code grows, we should organize it by keeping functions and their tests in separate folders. Conventionally, source code is kept in the “src” folder, while tests are stored in the “tests” folder.
To have pytest find your tests automatically, name your test files either “test_<name>.py” or “<name>_test.py”. Pytest will then collect and run the test functions in every file whose name matches one of these patterns.
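Out of the box, pytest collects test files matching the glob patterns `test_*.py` and `*_test.py` (configurable through the `python_files` option); files passed explicitly on the command line, such as sentiment.py earlier, are collected even if they don't match. A small sketch using `fnmatch` to check which file names match those default patterns; `is_test_file` and the file names are illustrative.

```python
from fnmatch import fnmatch

# pytest's default python_files patterns
DEFAULT_TEST_FILE_PATTERNS = ["test_*.py", "*_test.py"]


def is_test_file(filename: str) -> bool:
    """Return True if filename matches one of the default patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_TEST_FILE_PATTERNS)


for name in ["test_process.py", "process_test.py", "process.py", "testing_utils.py"]:
    print(name, is_test_file(name))
# test_process.py True
# process_test.py True
# process.py False
# testing_utils.py False
```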
This is what the two files, src/process.py and tests/test_process.py, look like:
# src/process.py
from textblob import TextBlob

def extract_sentiment(text: str):
    '''Extract sentiment using textblob.
    Polarity is within range [-1, 1]'''
    text = TextBlob(text)
    return text.sentiment.polarity

# tests/test_process.py
from src.process import extract_sentiment
import pytest

def test_extract_sentiment():
    text = 'Today I found a duck and I am happy'
    sentiment = extract_sentiment(text)
    assert sentiment > 0
To run all tests, type pytest tests in the root directory:
========================== test session starts ===========================
platform linux -- Python 3.8.3, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
collected 1 item
tests/test_process.py . [100%]
=========================== 1 passed in 0.69s ============================
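To try the src/tests layout end to end, the sketch below recreates src/process.py and tests/test_process.py in a temporary directory and runs `python -m pytest tests` there through `subprocess`. Assumptions to flag: the keyword-based `extract_sentiment` is a hypothetical stand-in so TextBlob isn't needed; the empty src/__init__.py is added for this example (the article does not show one); and the `python -m pytest` form is used because it also puts the current directory on `sys.path`, which lets `from src.process import ...` resolve without extra configuration.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

FILES = {
    "src/__init__.py": "",
    # Hypothetical keyword-based stand-in for the TextBlob implementation
    "src/process.py": '''
        def extract_sentiment(text: str) -> float:
            words = text.lower().split()
            return sum(w in {"happy", "great"} for w in words) / max(len(words), 1)
    ''',
    "tests/test_process.py": '''
        from src.process import extract_sentiment

        def test_extract_sentiment():
            text = 'Today I found a duck and I am happy'
            sentiment = extract_sentiment(text)
            assert sentiment > 0
    ''',
}


def run_project_tests() -> subprocess.CompletedProcess:
    """Create the src/tests layout in a temp dir and run pytest from its root."""
    with tempfile.TemporaryDirectory() as root:
        for relative_path, source in FILES.items():
            path = Path(root) / relative_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(textwrap.dedent(source))
        # cwd=root makes the temporary directory the project root
        return subprocess.run(
            [sys.executable, "-m", "pytest", "-q", "-p", "no:cacheprovider", "tests"],
            cwd=root,
            capture_output=True,
            text=True,
        )


result = run_project_tests()
print(result.returncode)  # 0 when all tests pass
print(result.stdout)
```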
Conclusion
Congratulations! You have just learned about pytest. I hope this article gives you a good overview of why testing is important and how to incorporate testing in your data science projects with pytest. With testing, you are not only able to know whether your function works as expected but also have the confidence to transition to new tools or code structures.