In complex datasets, forecasts at detailed levels (e.g., regions, products) should align with higher-level forecasts (e.g., countries, categories). Inconsistent forecasts can lead to poor decisions.
Hierarchical forecasting reconciles forecasts across all levels so that lower-level forecasts add up to the higher-level ones.
HierarchicalForecast from Nixtla is an open-source library that provides tools and methods for creating and reconciling hierarchical forecasts.
For illustrative purposes, consider a sales dataset with the following columns:
Country: The country where the sales occurred.
State: The state within the country.
Region: The region within the state.
Purpose: The purpose of the sale (e.g., Business, Leisure).
ds: The date of the sale.
y: The sales amount.
import numpy as np
import pandas as pd
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'State', 'Region', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()
| Country | State | Region | Purpose | ds | y |
|---|---|---|---|---|---|
| Australia | South Australia | Adelaide | Business | 1998-01-01 | 135.077690 |
| Australia | South Australia | Adelaide | Business | 1998-04-01 | 109.987316 |
| Australia | South Australia | Adelaide | Business | 1998-07-01 | 166.034687 |
| Australia | South Australia | Adelaide | Business | 1998-10-01 | 127.160464 |
| Australia | South Australia | Adelaide | Business | 1999-01-01 | 137.448533 |
The dataset can be grouped in the following non-strictly hierarchical structure:
Country
Country, State
Country, Purpose
Country, State, Region
Country, State, Purpose
Country, State, Region, Purpose
spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'Purpose'],
    ['Country', 'State', 'Region'],
    ['Country', 'State', 'Purpose'],
    ['Country', 'State', 'Region', 'Purpose']
]
Using the aggregate function from HierarchicalForecast, we can build the full set of time series: one series for every group at every level listed in spec.
from hierarchicalforecast.utils import aggregate
Y_df, S_df, tags = aggregate(Y_df, spec)
Y_df = Y_df.reset_index()
Y_df.sample(10)
| | unique_id | ds | y |
|---|---|---|---|
| 12251 | Australia/New South Wales/Outback NSW/Business | 2000-10-01 | |
| 33131 | Australia/Western Australia/Australia’s North | 2000-10-01 | |
| 22034 | Australia/South Australia/Fleurieu Peninsula/Other | 2006-07-01 | |
| 31119 | Australia/Victoria/Phillip Island/Visiting | 2017-10-01 | |
| 7671 | Australia/New South Wales/Other | 2015-10-01 | |
| 18339 | Australia/Queensland/Mackay/Business | 2002-10-01 | |
| 23043 | Australia/South Australia/Limestone Coast/Visiting | 1998-10-01 | |
| 22129 | Australia/South Australia/Fleurieu Peninsula/Visiting | 2010-04-01 | |
| 11349 | Australia/New South Wales/Hunter/Business | 2015-04-01 | |
| 16599 | Australia/Queensland/Brisbane/Other | 2007-10-01 | |
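To make the aggregation step concrete, here is a minimal pandas-only sketch of the idea. It is not the library's implementation: the toy data, the shortened `toy_spec`, and the helper name `aggregate_sketch` are all hypothetical. For each level in the spec, it groups by those columns plus `ds`, sums `y`, and labels each resulting series by joining the level values with `/`.

```python
import pandas as pd

# Hypothetical toy data: two states, two purposes, two quarters.
toy = pd.DataFrame({
    "Country": ["Australia"] * 8,
    "State": ["ACT"] * 4 + ["Victoria"] * 4,
    "Purpose": ["Business", "Business", "Holiday", "Holiday"] * 2,
    "ds": pd.to_datetime(["2016-01-01", "2016-04-01"] * 4),
    "y": [10.0, 12.0, 20.0, 22.0, 30.0, 32.0, 40.0, 42.0],
})

# A shortened, hypothetical spec with three levels.
toy_spec = [["Country"], ["Country", "State"], ["Country", "State", "Purpose"]]


def aggregate_sketch(df, spec):
    """Sum y at every level in spec and label each series with a '/'-joined id."""
    frames = []
    for level in spec:
        agg = df.groupby(level + ["ds"], as_index=False)["y"].sum()
        agg["unique_id"] = agg[level].apply(lambda row: "/".join(row), axis=1)
        frames.append(agg[["unique_id", "ds", "y"]])
    return pd.concat(frames, ignore_index=True)


toy_agg = aggregate_sketch(toy, toy_spec)
print(toy_agg)
```

With this toy spec the result holds 1 + 2 + 4 = 7 series, which mirrors how the real call produces one series per group at each level.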
Get all the distinct 'Country/Purpose' combinations present in the dataset:
tags['Country/Purpose']
array(['Australia/Business', 'Australia/Holiday', 'Australia/Other',
'Australia/Visiting'], dtype=object)
We use the final two years (8 quarters) as the test set.
Y_test_df = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')
Y_train_df.groupby('unique_id').size()
| unique_id | count |
|---|---|
| Australia | 72 |
| Australia/ACT | 72 |
| Australia/ACT/Business | 72 |
| Australia/ACT/Canberra | 72 |
| Australia/ACT/Canberra/Business | 72 |
| … | … |
| Australia/Western Australia/Experience Perth/Other | 72 |
| Australia/Western Australia/Experience Perth/Visiting | 72 |
| Australia/Western Australia/Holiday | 72 |
| Australia/Western Australia/Other | 72 |
| Australia/Western Australia/Visiting | 72 |
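The split logic can be checked on a small synthetic frame. This is a sketch under one assumption worth stating: `tail(8)` picks the last 8 rows of each group in row order, so it only yields the final 8 quarters when each series is already sorted by `ds`. The frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame: 2 series with 12 quarterly observations each, sorted by ds.
dates = pd.date_range("2015-01-01", periods=12, freq="QS")
demo = pd.DataFrame({
    "unique_id": ["A"] * 12 + ["B"] * 12,
    "ds": list(dates) * 2,
    "y": range(24),
})

horizon = 8
demo_test = demo.groupby("unique_id").tail(horizon)  # last 8 rows per series
demo_train = demo.drop(demo_test.index)              # everything before them

# Within each series, the latest training date should precede the earliest test date.
print(demo_train.groupby("unique_id")["ds"].max())
print(demo_test.groupby("unique_id")["ds"].min())
```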
The following code generates base forecasts for each time series in Y_train_df using the ETS model. The forecasts and fitted values are stored in Y_hat_df and Y_fitted_df, respectively.
%%capture
from statsforecast.models import ETS
from statsforecast.core import StatsForecast
fcst = StatsForecast(
    df=Y_train_df,
    models=[ETS(season_length=4, model='ZZA')],
    freq='QS',
    n_jobs=-1,
)
Y_hat_df = fcst.forecast(h=8, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()
The forecasts in Y_hat_df are not coherent: forecasts at detailed levels (e.g., by State, Region, Purpose) may not sum to those at higher levels (e.g., by Country, State, Purpose). To enforce coherence, we use the HierarchicalReconciliation class with the BottomUp approach.
from hierarchicalforecast.methods import BottomUp
from hierarchicalforecast.core import HierarchicalReconciliation
reconcilers = [BottomUp()]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df, S=S_df, tags=tags)
The dataframe Y_rec_df contains the reconciled forecasts.
Y_rec_df.head()
| unique_id | ds | ETS | ETS/BottomUp |
|---|---|---|---|
| Australia | 2016-01-01 | 25990.068359 | 24380.257812 |
| Australia | 2016-04-01 | 24458.490234 | 22902.765625 |
| Australia | 2016-07-01 | 23974.056641 | 22412.982422 |
| Australia | 2016-10-01 | 24563.455078 | 23127.439453 |
| Australia | 2017-01-01 | 25990.068359 | 24516.759766 |
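In the output above, the base ETS forecast for Australia on 2016-01-01 (25990.068359) differs from the reconciled value (24380.257812), which is what incoherence looks like at the top level. A minimal way to measure this is to subtract the sum of the bottom-level forecasts from the top-level forecast. The sketch below uses a hypothetical two-series hierarchy with made-up numbers; the helper `coherence_gap` is not part of any library.

```python
import pandas as pd

# Hypothetical forecasts for one date: a total and its two bottom-level series.
rec = pd.DataFrame({
    "unique_id": ["Total", "Total/A", "Total/B"],
    "ETS": [105.0, 40.0, 60.0],           # base: 40 + 60 != 105
    "ETS/BottomUp": [100.0, 40.0, 60.0],  # reconciled: 40 + 60 == 100
}).set_index("unique_id")

bottom_ids = ["Total/A", "Total/B"]


def coherence_gap(frame, column):
    """Top-level forecast minus the sum of the bottom-level forecasts."""
    return frame.loc["Total", column] - frame.loc[bottom_ids, column].sum()


print(coherence_gap(rec, "ETS"))           # 5.0
print(coherence_gap(rec, "ETS/BottomUp"))  # 0.0
```

A gap of zero at every aggregate level and date is what coherence means in practice.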
What is the Bottom-Up Approach?
The bottom-up approach is a method where forecasts are initially created at the most granular level of a hierarchy and then aggregated up to higher levels. Because every higher-level forecast is a sum of bottom-level forecasts, the result is coherent by construction, and patterns that are only visible at the detailed level carry through to the aggregates. It contrasts with top-down methods, which start with aggregate forecasts and distribute them downwards.
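The same idea can be written with a summing matrix, which is the role S_df plays in the reconciliation call earlier. The sketch below assumes a hypothetical hierarchy with one total and two bottom-level series, A and B, and made-up forecast values. Each row of `S` says which bottom-level series add up to that row's series, so multiplying `S` by the bottom-level forecasts yields forecasts for every series at once.

```python
import numpy as np

# Rows: Total, A, B (all series). Columns: A, B (bottom-level series).
S = np.array([
    [1, 1],  # Total = A + B
    [1, 0],  # A
    [0, 1],  # B
])

b_hat = np.array([40.0, 60.0])  # hypothetical bottom-level base forecasts
y_tilde = S @ b_hat             # bottom-up forecasts for Total, A, B

print(y_tilde)
```

The bottom-level forecasts pass through unchanged, and only the aggregate rows are recomputed.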
Steps in the Bottom-Up Approach
Forecast at the Lowest Level
First, forecasts are created at the most detailed level: Country, State, Region, Purpose. For example, the forecast for the next date might look like this:
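| Country | State | Region | Purpose | ds | y_forecast |
|---|---|---|---|---|---|
| USA | NY | East | Business | 2023-01-02 | 105 |
| USA | NY | East | Leisure | 2023-01-02 | 85 |
| USA | NJ | East | Business | 2023-01-02 | 95 |
| USA | NJ | East | Leisure | 2023-01-02 | 75 |
| USA | CA | West | Business | 2023-01-02 | 125 |
| USA | CA | West | Leisure | 2023-01-02 | 115 |
| USA | NV | West | Business | 2023-01-02 | 65 |
| USA | NV | West | Leisure | 2023-01-02 | 55 |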
Country, State, Purpose
Sum the forecasts for each Country, State, Purpose combination.
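| Country | State | Purpose | ds | y_forecast |
|---|---|---|---|---|
| USA | NY | Business | 2023-01-02 | 105 |
| USA | NY | Leisure | 2023-01-02 | 85 |
| USA | NJ | Business | 2023-01-02 | 95 |
| USA | NJ | Leisure | 2023-01-02 | 75 |
| USA | CA | Business | 2023-01-02 | 125 |
| USA | CA | Leisure | 2023-01-02 | 115 |
| USA | NV | Business | 2023-01-02 | 65 |
| USA | NV | Leisure | 2023-01-02 | 55 |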
Country, State, Region
Sum the forecasts for each Country, State, Region combination.
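| Country | State | Region | ds | y_forecast |
|---|---|---|---|---|
| USA | NY | East | 2023-01-02 | 190 |
| USA | NJ | East | 2023-01-02 | 170 |
| USA | CA | West | 2023-01-02 | 240 |
| USA | NV | West | 2023-01-02 | 120 |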
Country, Purpose
Sum the forecasts for each Country, Purpose combination.
| Country | Purpose | ds | y_forecast |
|---|---|---|---|
| USA | Business | 2023-01-02 | 390 |
| USA | Leisure | 2023-01-02 | 330 |
Country
Sum the forecasts for the entire Country.
| Country | ds | y_forecast |
|---|---|---|
| USA | 2023-01-02 | 720 |
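The aggregation steps above can be reproduced with plain pandas. This is a sketch of the arithmetic only, not of how HierarchicalForecast performs reconciliation internally; the variable names `bottom`, `levels`, and `totals` are chosen for illustration. It starts from the bottom-level forecasts in the first table and sums them for each higher level.

```python
import pandas as pd

# Bottom-level forecasts from the example: Country, State, Region, Purpose.
bottom = pd.DataFrame({
    "Country": ["USA"] * 8,
    "State": ["NY", "NY", "NJ", "NJ", "CA", "CA", "NV", "NV"],
    "Region": ["East"] * 4 + ["West"] * 4,
    "Purpose": ["Business", "Leisure"] * 4,
    "ds": pd.Timestamp("2023-01-02"),
    "y_forecast": [105, 85, 95, 75, 125, 115, 65, 55],
})

# Higher levels, in the order they are walked through above.
levels = [
    ["Country", "State", "Purpose"],
    ["Country", "State", "Region"],
    ["Country", "Purpose"],
    ["Country"],
]

totals = {}
for level in levels:
    key = "/".join(level)
    # Sum the bottom-level forecasts within each group of this level.
    totals[key] = bottom.groupby(level + ["ds"], as_index=False)["y_forecast"].sum()
    print(key)
    print(totals[key], end="\n\n")
```

Note that `groupby` sorts the group keys, so rows may appear in a different order than in the tables above (for example, CA before NY) while the sums are the same.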