Docstring examples clarify function purpose and use, but may become outdated as code changes.
Python’s doctest module solves this by automatically testing docstring examples, ensuring they remain accurate as the code evolves.
import pandas as pd
def check_numerical_columns(df: pd.DataFrame, columns: list[str]) -> list[str]:
"""
Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1, 2], "B": [4, 5], "C": ["a", "b"]})
>>> check_numerical_columns(df, ["A", "B"])
['A', 'B']
>>> check_numerical_columns(df, ["A", "C"])
Traceback (most recent call last):
...
TypeError: Some of the variables are not numerical. Please cast them as numerical before using this transformer.
"""
if len(df[columns].select_dtypes(exclude="number").columns) > 0:
raise TypeError(
"Some of the variables are not numerical. Please cast them as "
"numerical before using this transformer."
)
return columns
To test the docstrings, add these lines at the end of your module:
if __name__ == "__main__":
import doctest
doctest.testmod()
Now, when you run the script, doctest will verify all interactive examples:
$ python src/process_data.py
If there’s no output, all examples worked as expected.
For a detailed log of the tests, use the -v
flag:
$ python src/process_data.py -v
Trying:
import pandas as pd
Expecting nothing
ok
Trying:
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": ["a", "b", "c"]})
Expecting nothing
ok
Trying:
check_numerical_columns(df, ["A", "B"])
Expecting:
['A', 'B']
ok
Trying:
check_numerical_columns(df, ["A", "C"])
Expecting:
Traceback (most recent call last):
...
TypeError: Some of the variables are not numerical. Please cast them as numerical before using this transformer.
ok
This will show you exactly which parts of your examples are being tested and their outcomes.