Multiple Aggregation Prediction Algorithm (MAPA)
In this third post about modelling with Multiple Temporal Aggregation (MTA), I will explain how the Multiple Aggregation Prediction Algorithm (MAPA) works, which was the first incarnation of MTA for forecasting.
MAPA is quite simple in its logic:
- a time series is temporally aggregated to multiple levels, each of which strengthens some components of the time series and weakens others, as discussed before;
- at each level an independent exponential smoothing (ETS) model is fit and its components are extracted;
- the ETS components are combined, using a few tricks (see the paper for details), to produce the final forecast, which borrows information from all levels.
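The first step, building the multiple temporal aggregation views, can be sketched in a few lines of Python. This is a minimal illustration and not the code of the MAPA package. It assumes non-overlapping buckets aligned to the end of the series, so the oldest observations that cannot fill a complete bucket are dropped, and it represents each bucket by its mean. The function names `temporal_aggregate` and `aggregate_all_levels` are hypothetical.

```python
from typing import Dict, List, Sequence


def temporal_aggregate(y: Sequence[float], k: int) -> List[float]:
    """Non-overlapping temporal aggregation of y into buckets of k periods.

    Assumptions of this sketch: buckets are aligned to the END of the series,
    so the oldest len(y) % k observations are dropped, and each bucket is
    represented by its mean (using sums would only rescale the series).
    """
    if k < 1:
        raise ValueError("aggregation level k must be >= 1")
    start = len(y) % k  # skip the incomplete bucket at the start
    trimmed = list(y[start:])
    return [sum(trimmed[i:i + k]) / k for i in range(0, len(trimmed), k)]


def aggregate_all_levels(y: Sequence[float], max_level: int) -> Dict[int, List[float]]:
    """Build the temporally aggregated views of y for levels 1..max_level."""
    return {k: temporal_aggregate(y, k) for k in range(1, max_level + 1)}
```

As a usage example, take a made-up quarterly series with a linear trend and a seasonal pattern that sums to zero, `y = [10 + t + [3, -1, -3, 1][t % 4] for t in range(12)]`. At level 4 the seasonality cancels out completely and only the trend survives: `temporal_aggregate(y, 4)` gives `[11.5, 15.5, 19.5]`. This is the strengthening and weakening of components that each level contributes before an ETS model is fitted to it.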
In the original MAPA paper, alternative combination approaches were trialled, in which all temporal aggregation levels were given equal importance and combined through the mean or median. For the seasonal component (the high-frequency one), this causes an important issue: as seasonality is filtered out at the aggregate levels, it is effectively shrunk towards zero there, and therefore the combined seasonal component is shrunk as well. Originally this was addressed with a simple heuristic: combining the ETS forecast of the original series with the MAPA forecast (the hybrid approach). This effectively amounts to a weighted combination in which the first temporal aggregation level is given more weight than all other temporal aggregation levels together. Empirical evidence suggests that this re-weighting is beneficial.
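Under simplifying assumptions, the claim that the hybrid gives the first level more weight than all the other levels together can be checked with a short derivation. This sketch assumes additive components, a component combined with the plain mean over $K$ levels, an ETS forecast equal to the sum of the level-1 components, and a hybrid that averages the ETS and MAPA forecasts with equal weights (the last two are assumptions made here for illustration). Writing $c^{(k)}_h$ for a component's forecast at horizon $h$ from aggregation level $k$, expressed at the original frequency:

```latex
\hat{c}^{\mathrm{MAPA}}_h = \frac{1}{K}\sum_{k=1}^{K} c^{(k)}_h ,
\qquad
\hat{c}^{\mathrm{Hyb}}_h = \tfrac{1}{2}\bigl(c^{(1)}_h + \hat{c}^{\mathrm{MAPA}}_h\bigr)
  = \frac{K+1}{2K}\, c^{(1)}_h + \sum_{k=2}^{K} \frac{1}{2K}\, c^{(k)}_h ,
\qquad
\frac{K+1}{2K} > \frac{K-1}{2K} = \sum_{k=2}^{K} \frac{1}{2K} .
```

For example, with $K = 12$ the first level receives a weight of $13/24$, while the remaining eleven levels share $11/24$. For the seasonal component, $K$ would be the number of levels that can be seasonal.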
The later-developed w.mean and w.median weighting schemes attempt to achieve the same effect for the seasonal component, using weights that vary across temporal aggregation levels. In fact, when dealing with high-frequency time series, it is always recommended to use these.
The interactive plot below illustrates how MAPA works. You can choose between various time series, the combination scheme for the components across temporal aggregation levels, and whether or not the hybrid forecast (effectively a re-weighting of the components) is used. The top plot shows the ETS model identified at each temporal aggregation level. Greyed cells indicate levels at which no seasonality is estimated: these levels would require fractional seasonality, which conventional ETS does not permit. The second row of plots shows the forecasted ETS components at each temporal aggregation level, as well as the combined one (thick black line). The components of the first aggregation level are those of the ETS model fitted to the original time series, and are plotted with a thicker line. For the seasonal component, only levels that can be seasonal are used. Note the difference between the trajectories of the ETS and MAPA components, as well as between the components at different temporal aggregation levels. The bottom plot shows the ETS and MAPA forecasts. The MAPA forecast is simply the sum of the three MAPA components. If Hybrid is used, the resulting MAPA forecast is the combination of the MAPA components and the ETS forecast, which is equivalent to a specific weighting scheme of the components.
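The combination step can be sketched as follows, assuming additive components whose per-level forecasts have already been translated back to the original frequency and have equal length. It shows the mean/median combination across levels, the MAPA forecast as the sum of the three combined components, and a hybrid taken as an equal-weight average of the ETS and MAPA forecasts. These are simplifications for illustration, not the MAPA package's implementation; the function names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Component = List[float]


def combine_levels(per_level: Sequence[Sequence[float]], how: str = "mean") -> Component:
    """Combine one component's forecasts across aggregation levels, horizon by horizon.

    per_level holds one forecast path per aggregation level, all of equal
    length and already at the original frequency (assumption of this sketch).
    """
    agg: Callable[[Sequence[float]], float] = {"mean": mean, "median": median}[how]
    return [float(agg(values)) for values in zip(*per_level)]


def mapa_forecast(level: Sequence[Sequence[float]],
                  trend: Sequence[Sequence[float]],
                  seasonal: Sequence[Sequence[float]],
                  how: str = "mean") -> Component:
    """Additive MAPA forecast: the sum of the three combined components.

    `seasonal` should only contain the levels that can be seasonal.
    """
    lv, tr, se = (combine_levels(c, how) for c in (level, trend, seasonal))
    return [a + b + s for a, b, s in zip(lv, tr, se)]


def hybrid_forecast(ets: Sequence[float], mapa: Sequence[float]) -> Component:
    """Hybrid: equal-weight average of the ETS and MAPA forecasts (assumed weights)."""
    return [(e + m) / 2 for e, m in zip(ets, mapa)]
```

A usage example with made-up numbers for three aggregation levels, of which only the first two can be seasonal:

```python
level = [[100, 100], [102, 102], [101, 101]]
trend = [[1, 2], [1, 2], [1, 2]]
seasonal = [[4, -4], [2, -2]]  # weaker seasonality at the aggregate level
ets = [100 + 1 + 4, 100 + 2 - 4]  # level-1 components only: [105, 98]
mapa = mapa_forecast(level, trend, seasonal)  # [105.0, 100.0]
hybrid = hybrid_forecast(ets, mapa)  # [105.0, 99.0]
```

The combined seasonal component is ±3 against ±4 for ETS alone, which is the shrinkage discussed above; the hybrid pulls the forecast back towards the level-1 components.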
In most cases the level and trend components differ from the ETS ones, and the seasonality is always shrunk to some degree, depending on the combination weighting scheme used.
Why does MAPA work? Thanks to temporal aggregation, it captures low-frequency components (trend) better. Furthermore, it does not rely on a single ETS model that may or may not be well identified, thereby mitigating model uncertainty. I argue that MTA, as implemented in MAPA, is a neat trick to extract more information from a time series… for free!
What are the limitations of MAPA? There are quite a few, but two are major: (i) the combination weighting schemes are ad hoc, although there is strong evidence that equal weights are not the best solution; (ii) at levels where any seasonality would be fractional, the identification of the ETS model is weakened, because that seasonality is not modelled and is left to contaminate the level and slope components. Arguably, the use of ETS at its core is another potential limitation, although one that can be lifted.
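Which levels are affected by the fractional-seasonality problem follows from simple arithmetic: at aggregation level $k$ a series with seasonal period $m$ has period $m/k$, and conventional ETS can only model it if that is an integer greater than 1. The sketch below classifies the levels; it assumes, for simplicity, that the maximum aggregation level does not exceed $m$, and the function names are hypothetical.

```python
from typing import List


def _check(m: int, max_level: int) -> None:
    # This sketch only covers levels up to the seasonal period m.
    if not 1 <= max_level <= m:
        raise ValueError("this sketch assumes 1 <= max_level <= m")


def seasonal_levels(m: int, max_level: int) -> List[int]:
    """Levels k where the aggregated period m / k is an integer greater than 1,
    so an ETS model can include a seasonal component."""
    _check(m, max_level)
    return [k for k in range(1, max_level + 1) if m % k == 0 and m // k > 1]


def fractional_levels(m: int, max_level: int) -> List[int]:
    """Levels k where m / k is not an integer: seasonality is still present
    but cannot be modelled, so it can leak into the level and trend."""
    _check(m, max_level)
    return [k for k in range(1, max_level + 1) if m % k != 0]
```

For monthly data ($m = 12$) aggregated up to level 12, `seasonal_levels(12, 12)` returns `[1, 2, 3, 4, 6]` and `fractional_levels(12, 12)` returns `[5, 7, 8, 9, 10, 11]`. Level 12 appears in neither list, because there the period becomes 1 and the series is simply non-seasonal.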
In practice, MAPA was the first method in almost 15 years to improve upon the M3 competition results, and since then there has been mounting empirical evidence supporting its good forecasting performance, as well as extensions to incorporate explanatory variables and to forecast intermittent demand time series. An interesting finding is that MAPA is very robust to misspecification, compared with more conventional approaches that attempt to capitalise on a single (optimal) level of temporal aggregation. If you want to try it out, there is an R package available (also on GitHub).
Ultimately, the true contribution of MAPA was to demonstrate that the ideas behind MTA were sound and useful for forecasting! For this, the paper that introduced MAPA recently received the International Journal of Forecasting 2014-2015 best paper award; I am very humbled and happy about this!
Although these limitations remain unresolved within MAPA itself, MAPA motivated research into Temporal Hierarchies, which provide a more thorough foundation for using MTA in forecasting, overcome many of MAPA’s issues, and open multiple avenues for future research. This will be the topic of a future post in the series. Till then, I will conclude by mentioning that in many applications MAPA still provides more accurate forecasts than Temporal Hierarchies, which shows that it remains an interesting research topic.
Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.