Testing

Simulate External Services in Testing with Mock Objects

Testing code that relies on external services, such as a database, can be difficult because the behavior of these services can change.

A mock object can control the behavior of a real object in a testing environment by simulating responses from external services.

The post's example uses a mock object to test how the get_data function behaves when an API call either succeeds or fails.
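Here is a minimal sketch of the idea. It assumes a hypothetical fetch_data function that takes its HTTP client as an argument instead of importing requests directly. Passing in a unittest.mock.Mock lets the test script both a successful response and a failed connection:

```python
from unittest.mock import Mock


def fetch_data(client):
    """Hypothetical variant of get_data that receives its HTTP client as an argument."""
    try:
        response = client.get("http://localhost:5432")
        return response.json()
    except ConnectionError:  # built-in ConnectionError, used here for simplicity
        return None


# Success: client.get() returns a response whose .json() yields fixed data
ok_client = Mock()
ok_client.get.return_value.json.return_value = {"data": "test"}
assert fetch_data(ok_client) == {"data": "test"}
ok_client.get.assert_called_once_with("http://localhost:5432")

# Failure: client.get() raises ConnectionError instead of returning a response
failing_client = Mock()
failing_client.get.side_effect = ConnectionError
assert fetch_data(failing_client) is None
```

Injecting the client is only one design choice. Patching the module-level requests.get, as the mocking examples further down this page do, gets the same result without changing the function's signature.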


Pandera: Data Validation Made Simple for Python DataFrames

Poor data quality can lead to incorrect conclusions and bad model performance. Thus, it is important to check data for consistency and reliability before using it.

pandera makes it easy to validate dataframe-like objects. If a dataframe fails a validation check, pandera raises an error that explains which check failed.


Exploring Test Case Strategies: Individual Functions and Pytest Parameterize

To test the same function with multiple test cases, you can do either of the following:

Separate test functions:

This approach involves creating individual test functions for each test case.

def add(num1, num2):
    return num1 + num2

def test_add_positive():
    assert add(2, 3) == 5

def test_add_negative():
    assert add(-2, -3) == -5

def test_add_mixed():
    assert add(-2, 3) == 1

def test_add_zero():
    assert add(0, 5) == 4

Output:

pytest_parametrize_example.py ...F [100%]

=================================== FAILURES ===================================
________________________________ test_add_zero _________________________________

    def test_add_zero():
>       assert add(0, 5) == 4
E       assert 5 == 4
E        +  where 5 = add(0, 5)

pytest_parametrize_example.py:14: AssertionError
=========================== short test summary info ============================
FAILED pytest_parametrize_example.py::test_add_zero - assert 5 == 4
========================= 1 failed, 3 passed in 0.05s ==========================

Pros:

Each test case is clearly isolated and easy to understand at a glance.

Cons:

Code duplication: the test structure is repeated for each case.

Adding new test cases requires writing a new function each time.

Changes to the test structure must be made in multiple places.

Use pytest parametrize:

This approach uses pytest’s parametrize decorator to run the same test function with different inputs.

import pytest


def add(num1, num2):
    return num1 + num2


@pytest.mark.parametrize(
    "a, b, expected",
    [(2, 3, 5), (-2, -3, -5), (-2, 3, 1), (0, 5, 4)],
    ids=["positive numbers", "negative numbers", "mixed signs", "zero and positive"],
)
def test_add(a, b, expected):
    assert add(a, b) == expected

Output:

pytest_parametrize_example.py ...F [100%]

=================================== FAILURES ===================================
_________________________ test_add[zero and positive] __________________________

a = 0, b = 5, expected = 4

    @pytest.mark.parametrize(
        "a, b, expected",
        [(2, 3, 5), (-2, -3, -5), (-2, 3, 1), (0, 5, 4)],
        ids=["positive numbers", "negative numbers", "mixed signs", "zero and positive"],
    )
    def test_add(a, b, expected):
>       assert add(a, b) == expected
E       assert 5 == 4
E        +  where 5 = add(0, 5)

pytest_parametrize_example.py:14: AssertionError
=========================== short test summary info ============================
FAILED pytest_parametrize_example.py::test_add[zero and positive] - assert 5 == 4
========================= 1 failed, 3 passed in 0.06s ==========================

Pros:

Easy to add new test cases by adding to the parameter list.

Changes to test structure only need to be made in one place.

Cons:

The purpose of each test case might be less immediately clear, especially for complex tests.

Choosing between these methods depends on your project’s needs. Use individual functions when clarity is crucial. Use parametrize when dealing with numerous similar cases.
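One way to soften the readability drawback of parametrize is to attach each id directly to its case with pytest.param, so the label sits next to the data it describes. The sketch below reuses the cases above, with the zero case's expected value set to 5 so every case passes:

```python
import pytest


def add(num1, num2):
    return num1 + num2


# pytest.param keeps each case's id next to its values
@pytest.mark.parametrize(
    "a, b, expected",
    [
        pytest.param(2, 3, 5, id="positive numbers"),
        pytest.param(-2, -3, -5, id="negative numbers"),
        pytest.param(-2, 3, 1, id="mixed signs"),
        pytest.param(0, 5, 5, id="zero and positive"),
    ],
)
def test_add(a, b, expected):
    assert add(a, b) == expected
```

pytest.param also accepts marks=..., which lets you flag a single case (for example as an expected failure) without splitting it into its own function.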

pytest-mock vs unittest.mock: Simplifying Mocking in Python Tests

Traditional mocking with unittest.mock often requires repetitive setup and teardown code, which can make test code harder to read and maintain.

pytest-mock addresses this issue by leveraging pytest’s fixture system, simplifying the mocking process and reducing boilerplate code.

Consider the following example that demonstrates the difference between unittest.mock and pytest-mock.

Using unittest.mock:

%%writefile test_rm_file.py
from unittest.mock import patch
import os

def rm_file(filename):
    os.remove(filename)

def test_with_unittest_mock():
    with patch("os.remove") as mock_remove:
        rm_file("file")
        mock_remove.assert_called_once_with("file")

Using pytest-mock:

%%writefile test_rm_file.py
import os

def rm_file(filename):
    os.remove(filename)

def test_unix_fs(mocker):
    mocker.patch("os.remove")
    rm_file("file")
    os.remove.assert_called_once_with("file")

Key differences:

Setup: pytest-mock uses the mocker fixture, which the pytest-mock plugin provides automatically, so you don't need to import any patching utilities.

Patching: With pytest-mock, you simply call mocker.patch('os.remove'), whereas unittest.mock requires a context manager or decorator.

Cleanup: pytest-mock automatically undoes mocking after the test, while unittest.mock relies on the context manager (or decorator) for cleanup.

Accessing mocks: With pytest-mock, the patched name itself is the mock for the duration of the test, so you can call os.remove.assert_called_once_with() directly. The unittest.mock example instead accesses the mock through the variable bound by the context manager (mock_remove.assert_called_once_with()).
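To make the cleanup difference concrete, here is a sketch of unittest.mock's patch used without a context manager or decorator. In this form the patcher has to be started and stopped by hand, which is the bookkeeping the mocker fixture handles for you:

```python
import os
from unittest.mock import patch

original_remove = os.remove

# Start the patch manually: os.remove is replaced by a MagicMock
patcher = patch("os.remove")
mock_remove = patcher.start()
try:
    os.remove("file")  # calls the mock; no file is touched
    mock_remove.assert_called_once_with("file")
finally:
    # Without this stop() call, the mock would leak into later code
    patcher.stop()

print(os.remove is original_remove)  # True: the real function is restored
```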

Link to pytest-mock.

Mocking External Dependencies: Achieving Reliable Test Results

Testing code that relies on external services, such as a database, can be difficult because the behavior of these services can change.

A mock object can control the behavior of a real object in a testing environment by simulating responses from external services.

Here are two common use cases with examples:

Mocking Time-Dependent Functions

When testing functions that depend on the current time or date, you can mock the time to ensure consistent results.

Example: Testing a function that returns data for the last week

# main.py
from datetime import datetime, timedelta

def get_data_for_last_week():
    end_date = datetime.now().date()
    start_date = end_date - timedelta(days=7)
    return {
        "start_date": start_date.strftime("%Y-%m-%d"),
        "end_date": end_date.strftime("%Y-%m-%d"),
    }

Now, let’s create a test for this function using mock:

from datetime import datetime
from unittest.mock import patch
from main import get_data_for_last_week

@patch("main.datetime")
def test_get_data_for_last_week(mock_datetime):
    # Set a fixed date for the test
    mock_datetime.now.return_value = datetime(2024, 8, 5)

    # Call the function
    result = get_data_for_last_week()

    # Assert the results
    assert result["start_date"] == "2024-07-29"
    assert result["end_date"] == "2024-08-05"

    # Verify that datetime.now() was called
    mock_datetime.now.assert_called_once()

This test mocks the datetime.now() method to return a fixed date, allowing for predictable and consistent test results.
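The test patches main.datetime rather than datetime.datetime because main.py imports the datetime class by name, so the function looks it up in main's namespace. If a module instead reaches the class through the datetime module on every call, patching the module attribute works. Here is a self-contained sketch of that variant. The function is a hypothetical rewrite, not the one above:

```python
import datetime as dt
from unittest.mock import patch

# Create a real datetime before patching, while dt.datetime is still the real class
FIXED_NOW = dt.datetime(2024, 8, 5)


def get_data_for_last_week():
    # Looks up dt.datetime at call time, so patching the module attribute affects it
    end_date = dt.datetime.now().date()
    start_date = end_date - dt.timedelta(days=7)
    return {"start_date": start_date.isoformat(), "end_date": end_date.isoformat()}


with patch("datetime.datetime") as mock_datetime:
    mock_datetime.now.return_value = FIXED_NOW
    result = get_data_for_last_week()

print(result)  # {'start_date': '2024-07-29', 'end_date': '2024-08-05'}
```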

Mocking API calls

When testing code that makes external API calls, mocking helps avoid actual network requests during testing.

Example: Testing a function that makes an API call

# main.py
import requests
from requests.exceptions import ConnectionError

def get_data():
    """Make an API call to a local service"""
    try:
        response = requests.get("http://localhost:5432")
        return response.json()
    except ConnectionError:
        return None

# Tests (e.g., test_main.py)
from unittest.mock import patch
from requests.exceptions import ConnectionError
from main import get_data

@patch("main.requests.get")
def test_get_data_fails(mock_get):
    """Test the get_data function when the API call fails"""
    # Define what happens when the function is called
    mock_get.side_effect = ConnectionError
    assert get_data() is None

@patch("main.requests.get")
def test_get_data_succeeds(mock_get):
    """Test the get_data function when the API call succeeds"""
    # Define the return value of the function
    mock_get.return_value.json.return_value = {"data": "test"}
    assert get_data() == {"data": "test"}

These tests mock the requests.get() function to simulate both successful and failed API calls, allowing us to test our function’s behavior in different scenarios without making actual network requests.

By using mocks in these ways, we can create more reliable and controlled unit tests for our data projects, ensuring that our code behaves correctly under various conditions.

Backtesting: Assess Trading Strategy Performance Effortlessly in Python

Evaluating trading strategies’ effectiveness is crucial for financial decision-making, but it’s challenging due to the complexities of historical data analysis and strategy testing.

Backtesting allows users to simulate trades based on historical data and visualize the outcomes through interactive plots in three lines of code.

To see how Backtesting works, let's create our first strategy to backtest on this Google stock data: a simple moving average (MA) crossover strategy.

from backtesting.test import GOOG

GOOG.tail()

Open High Low Close Volume
2013-02-25 802.3 808.41 790.49 790.77 2303900
2013-02-26 795.0 795.95 784.40 790.13 2202500
2013-02-27 794.8 804.75 791.11 799.78 2026100
2013-02-28 801.1 806.99 801.03 801.20 2265800
2013-03-01 797.8 807.14 796.15 806.19 2175400

import pandas as pd

def SMA(values, n):
    """
    Return simple moving average of `values`, at
    each step taking into account `n` previous values.
    """
    return pd.Series(values).rolling(n).mean()
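To see what the helper returns, here is SMA applied to a short list with a 3-value window. The function is repeated so the snippet runs on its own. The first n - 1 entries are NaN because the rolling window is not yet full:

```python
import pandas as pd


def SMA(values, n):
    """Simple moving average over the last n values (same helper as above)."""
    return pd.Series(values).rolling(n).mean()


print(SMA([1, 2, 3, 4, 5], 3).tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```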

from backtesting import Strategy
from backtesting.lib import crossover

class SmaCross(Strategy):
    # Define the two MA lags as *class variables*
    # for later optimization
    n1 = 10
    n2 = 20

    def init(self):
        # Precompute the two moving averages
        self.sma1 = self.I(SMA, self.data.Close, self.n1)
        self.sma2 = self.I(SMA, self.data.Close, self.n2)

    def next(self):
        # If sma1 crosses above sma2, close any existing
        # short trades, and buy the asset
        if crossover(self.sma1, self.sma2):
            self.position.close()
            self.buy()

        # Else, if sma1 crosses below sma2, close any existing
        # long trades, and sell the asset
        elif crossover(self.sma2, self.sma1):
            self.position.close()
            self.sell()
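crossover comes from backtesting.lib. The sketch below is a simplified, hypothetical stand-in, written only to illustrate the condition the strategy reacts to. The library's own version handles more input types. The condition is True only on the bar where the first series moves from below the second to above it:

```python
def crosses_above(series1, series2):
    """Hypothetical, simplified stand-in for backtesting.lib.crossover.

    True when series1 was below series2 on the previous bar
    and is above it on the current (last) bar.
    """
    return (
        len(series1) > 1
        and len(series2) > 1
        and series1[-2] < series2[-2]
        and series1[-1] > series2[-1]
    )


fast = [9.0, 10.5]   # e.g. the short moving average
slow = [10.0, 10.0]  # e.g. the long moving average
print(crosses_above(fast, slow))  # True: fast moved from below to above slow
print(crosses_above(slow, fast))  # False
```

This is why the strategy calls crossover twice with the arguments swapped. The first call detects an upward cross (buy) and the second detects a downward cross (sell).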

To assess the performance of our investment strategy, we will instantiate a Backtest object, using Google stock data as our asset of interest and incorporating the SmaCross strategy class. We’ll start with an initial cash balance of 10,000 units and set the broker’s commission to a realistic rate of 0.2%.

from backtesting import Backtest

bt = Backtest(GOOG, SmaCross, cash=10_000, commission=.002)
stats = bt.run()
stats

Start 2004-08-19 00:00:00
End 2013-03-01 00:00:00
Duration 3116 days 00:00:00
Exposure Time [%] 97.067039
Equity Final [$] 68221.96986
Equity Peak [$] 68991.21986
Return [%] 582.219699
Buy & Hold Return [%] 703.458242
Return (Ann.) [%] 25.266427
Volatility (Ann.) [%] 38.383008
Sharpe Ratio 0.658271
Sortino Ratio 1.288779
Calmar Ratio 0.763748
Max. Drawdown [%] -33.082172
Avg. Drawdown [%] -5.581506
Max. Drawdown Duration 688 days 00:00:00
Avg. Drawdown Duration 41 days 00:00:00
# Trades 94
Win Rate [%] 54.255319
Best Trade [%] 57.11931
Worst Trade [%] -16.629898
Avg. Trade [%] 2.074326
Max. Trade Duration 121 days 00:00:00
Avg. Trade Duration 33 days 00:00:00
Profit Factor 2.190805
Expectancy [%] 2.606294
SQN 1.990216
_strategy SmaCross
_equity_curve …
_trades Size EntryB…
dtype: object

Plot the outcomes:

bt.plot()

Link to Backtesting.

Run in Google Colab.

Simplify Unit Testing of SQL Queries with PySpark

Testing your SQL queries helps to ensure that they are correct and functioning as intended.

PySpark lets you parameterize SQL queries, which simplifies unit testing them. In the post's example, the df and amount variables are passed to the query as parameters, and the test checks whether actual_df matches expected_df.

Learn more about parameterized queries in PySpark.

    Work with Khuyen Tran

    Work with Khuyen Tran