As a data scientist, you make your code reproducible by organizing it into functions and classes, but any one of those functions can break the code. Even if nothing breaks, how do you know that each function works as you expect?
In general, you should test your data science projects because testing allows you to:
- Make sure the code works as expected
- Detect edge cases
- Swap existing code for improved code with confidence, without fear of breaking the entire pipeline
There are many Python tools available for testing, but one of the easiest to use is pytest. I like pytest because it lets me write tests with minimal code. If you are not familiar with testing, pytest is a great tool to get started with.
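To give a rough sense of what "minimal code" means here, the sketch below tests a small data-cleaning helper. The function `fill_missing` and both test names are hypothetical, made up only for illustration. In pytest, a test is a plain function whose name starts with `test_` and that uses a bare `assert`.

```python
# Hypothetical helper, invented for this illustration only.
def fill_missing(values, default=0):
    """Return a copy of `values` with every None replaced by `default`."""
    return [default if v is None else v for v in values]


# pytest collects functions whose names start with "test_".
def test_fill_missing_replaces_none():
    # Expected behavior: None becomes the default value.
    assert fill_missing([1, None, 3]) == [1, 0, 3]


def test_fill_missing_handles_empty_list():
    # Edge case: an empty input should give an empty output.
    assert fill_missing([]) == []
```

If this were saved in a file whose name starts with `test_` (for example `test_cleaning.py`, an assumed name), running `pytest` in that directory would discover and run both tests.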
In this article, I provide some simple examples and short explanations to get you started.