Over the last few years I have been working (with my co-authors!) on the idea of Multiple Temporal Aggregation (MTA) for time series forecasting. A number of papers have been published that introduce and develop the idea further, or test its effectiveness for forecasting.
In this series of blog posts I will try to summarise the progress so far, and highlight ways that you can use it. This first post will summarise the papers so far and give an overview of the main findings. Later posts will focus on explaining how MTA works.
The key points behind MTA are the following:
- It is a radically different approach to time series modelling, recognising that the data sampling frequency may not be the best for a given modelling purpose.
- A time series is modelled simultaneously at multiple temporal aggregation levels that can be easily generated from the original data. At each level an appropriate model is fit, focusing on the components of the series that are strengthened by temporal aggregation.
- If forecasting is the objective, then the produced forecast reconciles the information from all these models. This makes the forecast robust to modelling uncertainty and lessens the importance of model selection.
- The resulting forecasts have been shown to be reliable and typically outperform the conventional modelling approach.
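To make the key points above concrete, here is a minimal Python sketch of the general idea. It is not an implementation of MAPA or Temporal Hierarchies. It assumes non-overlapping aggregation by summation, simple exponential smoothing with a fixed, arbitrarily chosen smoothing parameter at every level, and an unweighted average of the resulting forecasts. The function names (`aggregate`, `ses_forecast`, `mta_forecast`), the aggregation levels and the example data are all illustrative and are not taken from the papers or the R packages.

```python
from statistics import mean


def aggregate(series, k):
    """Non-overlapping temporal aggregation: sum consecutive blocks of k points.

    If len(series) is not a multiple of k, the oldest observations are dropped
    so that the most recent data is always kept.
    """
    start = len(series) % k
    trimmed = series[start:]
    return [sum(trimmed[i:i + k]) for i in range(0, len(trimmed), k)]


def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing; returns the flat one-step-ahead forecast.

    alpha is fixed here for brevity (an assumption); in practice it would be
    estimated separately at each aggregation level.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def mta_forecast(series, levels=(1, 2, 3, 4, 6, 12), alpha=0.3):
    """Combine forecasts produced at several temporal aggregation levels.

    Each level's forecast is divided by k to bring it back to the scale of the
    original sampling frequency, then the forecasts are averaged.
    """
    per_period = []
    for k in levels:
        agg = aggregate(series, k)
        if len(agg) < 2:  # too few points to model at this level
            continue
        per_period.append(ses_forecast(agg, alpha) / k)
    return mean(per_period)


# Hypothetical monthly demand data (four years), for illustration only.
monthly = [12, 15, 11, 14, 18, 16, 13, 17, 19, 15, 14, 20] * 4
print(round(mta_forecast(monthly), 2))
```

Even in this toy version the forecast draws on several views of the same data, so it depends less on the choices made at any single aggregation level.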
Table 1 summarises our contributions on MTA so far (follow the links to access the papers). We have also released two R packages that implement MTA: MAPA and thief. The former implements, as the name suggests, MAPA, while the latter provides code to use Temporal Hierarchies.
Paper | Summary |
---|---|
Kourentzes et al. 2014. Improving forecasting by estimating time series structural components across multiple frequencies. | The initial paper on MTA modelling. It introduces the Multiple Aggregation Prediction Algorithm (MAPA) and demonstrates its superior performance on the well-known M3 competition. |
Petropoulos and Kourentzes 2014. Forecast combinations for intermittent demand. | Expands MAPA for the case of intermittent demand. |
Kourentzes and Petropoulos 2016. Forecasting with multivariate temporal aggregation: The case of promotional modelling. | Expands MAPA for promotional modelling purposes at Stock Keeping Unit level. |
Barrow and Kourentzes 2016. Distributions of forecasting errors of forecast combinations: implications for inventory management. | Provides evidence of very strong performance of MAPA over established benchmarks for demand forecasting and inventory management purposes. |
Athanasopoulos et al. 2017. Forecasting with temporal hierarchies. | Introduces Temporal Hierarchies, a general framework for MTA that allows any model or method to be used to produce the forecasts at each level. |
Kourentzes et al. 2017. Demand forecasting by temporal aggregation: using optimal or multiple aggregation levels? | Demonstrates that MTA modelling is more robust to uncertainty than modelling either using the original data or using a single (optimal) temporal aggregation level. |
To give you an idea of the reported improvements, I have collated some of the results from the papers above. The best forecast in each column, in all tables, is highlighted in boldface. Table 2 provides a summary for the quarterly and monthly M3 datasets. The benchmarks are the Exponential Smoothing (ETS) family of models, with automatic model selection (via AICc), and Theta, the best performing method in the original M3 competition – a position it held for almost 15 years! In this case both MAPA and Temporal Hierarchies make use of the ETS family of models, so their results are directly comparable with the ETS row and give you a feeling for the improvement provided by MTA over conventional time series forecasting.
Tables 3 and 4 provide results for a number of real datasets. Table 4 also provides results on a variety of simulated ARIMA series. The detailed results can be found in the respective papers. In all cases MAPA is at least as accurate as the various benchmarks. Table 5 provides results on real series that include promotional periods. There are two comparisons: forecasts without and with promotional information. In both cases the MTA-based forecasts (MAPA) are on average the most accurate.
Forecast | Quarterly set | Monthly set |
---|---|---|
Exponential Smoothing (ETS) | 9.94% | 14.45% |
Theta (M3 competition)² | **8.96%** | 13.85% |
MAPA (Kourentzes et al. 2014) | 9.58% | 13.69% |
Temporal Hierarchies (Athanasopoulos et al. 2017) | 9.70% | **13.61%** |
¹ Papers provide results on more robust metrics!
² Best performance in the original M3 competition.
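Since MAPA and Temporal Hierarchies both build on ETS here, one way to read the table above is as a relative reduction in error against the ETS row. The short Python sketch below does that arithmetic with the values copied from the table. The helper name `relative_improvement` is illustrative, and the calculation is not taken from the papers.

```python
# Error values copied from the table above (lower is better).
errors = {
    "ETS": {"Quarterly": 9.94, "Monthly": 14.45},
    "MAPA": {"Quarterly": 9.58, "Monthly": 13.69},
    "Temporal Hierarchies": {"Quarterly": 9.70, "Monthly": 13.61},
}


def relative_improvement(benchmark, candidate):
    """Percentage reduction in error of the candidate relative to the benchmark."""
    return 100 * (benchmark - candidate) / benchmark


for method in ("MAPA", "Temporal Hierarchies"):
    for dataset in ("Quarterly", "Monthly"):
        gain = relative_improvement(errors["ETS"][dataset], errors[method][dataset])
        print(f"{method}, {dataset}: {gain:.1f}% lower error than ETS")
```

On these figures the reductions range from roughly 2.4% to 5.8%, with the larger gains on the monthly set.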
Forecast | 1-step ahead | 3-steps ahead | 5-steps ahead |
---|---|---|---|
Naive | 0.882 | 0.900 | 0.919 |
ETS | 0.677 | 0.688 | 0.711 |
AR | 0.707 | 0.719 | 0.737 |
ARIMA | 1.446 | 0.701 | 0.721 |
Theta | 0.674 | 0.685 | 0.705 |
MAPA | **0.668** | **0.670** | **0.687** |
Forecast | Simulated ARIMA | Manufacturing | Call centre |
---|---|---|---|
Single Exponential Smoothing (SES) | 1.000 | 1.000 | 1.000 |
Exponential Smoothing (ETS) | 0.985 | 1.011 | 1.005 |
Optimal Temporal Aggregation & SES | 0.974 | 0.999 | 1.080 |
MAPA | **0.971** | **0.994** | **0.979** |
Forecast | 4-steps ahead | 8-steps ahead | 12-steps ahead |
---|---|---|---|
Naive | 0.743 | 0.818 | 0.704 |
ETS | 0.704 | 0.774 | 0.701 |
MAPA | 0.679 | 0.754 | 0.736 |
Regression + Promotional | 0.611 | 0.659 | 0.714 |
ETS + Promotional | 0.642 | 0.627 | 0.543 |
MAPA + Promotional | **0.525** | **0.521** | **0.515** |
The main argument in all the papers is that MTA improves forecast accuracy by mitigating modelling uncertainty. As we will see, this comes at no additional data cost and with relatively limited additional computation. An added benefit, which is not very evident from the summary tables provided here, is that MTA forecasts are reliable for both short and long term forecasting, providing a way to reconcile operational, tactical and strategic planning.
Unpublished results on different applications provide a similar picture in terms of accuracy. There is also evidence that MTA can strengthen statistical tests, as the initial results of this experiment show. However, all this is ongoing research, so until a full analysis is conducted and the results are peer reviewed, I would take these with a pinch of salt!
In the following blog posts I will explain how MTA works and elaborate more on the results from the various papers.
Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.