The talk has three parts:

- Argue (as others have before me!) that model-based uncertainty (the sigma we get from our models) is not the full story, and that estimation and model uncertainty should be accounted for in prediction intervals and decision making. Key point: most model outputs assume that the model itself is true, which is… not true!
- Provide initial results from an approach that directly accounts for model selection uncertainty, which leads to improvements in model selection but also leads naturally to model combination, answering what and when to combine.
- Demonstrate how work on Multiple Temporal Aggregation is an effective way of addressing modelling uncertainty, summarising the research up to this point.

You can download the talk here.

Abstract:

Forecasts are central to decision making. Over the last decades there have been substantial innovations in business forecasting, resulting in increased accuracy of forecasts. Models and modelling principles have matured to address company problems in a realistic sense, i.e. they are aware of the requirements and limitations of practice, and they have been tested empirically to demonstrate their effectiveness. Furthermore, there has been a shift in recognising the importance of having models instead of methods, to facilitate parameterisation, model selection and the generation of prediction intervals. The latter has been instrumental in refocusing from point forecasts to prediction intervals, which reflect the relevant risk for the decisions supported by the forecasts. At the same time the quality and quantity of potential model inputs have increased exponentially, permitting models to use more information sources and support a higher frequency of decision making, such as daily and weekly planning cycles. All these have facilitated, and made necessary, an increase in the automation of the forecasting process, bringing to the forefront a new dimension of uncertainty: model selection and specification uncertainty. The uncertainty captured in the prediction intervals assumes that the selected model is 'true'. This is hardly the case in practice, and we should account for that additional uncertainty. First, we discuss the uncertainties implied in model selection and specification. Then we proceed to develop a way to measure this uncertainty and derive a new way to perform model selection. We demonstrate that this not only leads to superior selection, but also provides a natural link to model combination and to specifying the relevant pool of models. 
Last, we demonstrate that once we recognise the uncertainty in model specification, we can extract more information from our data by using the multiple temporal aggregation frameworks, and empirically show the achieved increase in forecast accuracy and reliability.

Sales data often represent only part of the demand for a service product, owing to constraints such as capacity or booking limits. Unconstraining methods are concerned with estimating the true demand from such constrained sales data. This paper addresses the frequently encountered situation of observing only a few sales events at the individual product level and proposes variants of small demand forecasting methods to be used for unconstraining. The usual procedure is to aggregate data; however, in that case we lose information on when restrictions were imposed or lifted within a given booking profile. Our proposed methods exploit this information and are able to approximate convex, concave or homogeneous booking curves. Furthermore, they are numerically robust due to our proposed group-based parameter optimization. Empirical results on accuracy and revenue performance, based on data from a major car rental company, indicate statistically significant revenue improvements of 0.5% to 1.4% over a best-practice benchmark in typical scenarios.

Download paper.

I should start by saying that the development team of Prophet suggests that its strengths are:

- high-frequency data (hourly, daily, or weekly) with multiple seasonalities, such as day of week and time of year;
- special events and bank holidays that are not fixed in the year;
- robustness to missing values or large outliers;
- changes in the historical trends, which themselves are non-linear growth curves.

The M3 dataset has multiple series of micro/business interest, and as a recent presentation by E. Spiliotis et al. at ISF2017 (slides 11-12) indicated, the characteristics of its time series overlap with those of typical business time series, albeit not high-frequency ones. However, much business forecasting is still not done at hourly or daily frequency, so for many business forecasters the absence of high-frequency examples is not necessarily an issue when benchmarking Prophet.

The setup of the experiment is:

- Use Mean Absolute Scaled Error (MASE). I chose this measure as it has good statistical properties and has become quite common in forecasting research.
- Use rolling origin evaluation, so as to ensure that the reported figures are robust against particularly lucky (or unlucky) forecast origins and test sets.
- Use the forecast horizons and test sets indicated in Table 1, for each M3 subset.
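A minimal Python sketch of this evaluation setup is given below (the original experiment was run in R). It assumes the standard MASE definition, i.e. the out-of-sample MAE scaled by the in-sample MAE of the naive method, with a lag `m` so that a seasonal naive scale can be used if preferred. It also assumes that the forecast origin moves forward one period at a time until the last full horizon fits in the test set, so a test set of 8 with horizon 4 gives 5 origins. `naive_forecast` is a hypothetical stand-in for any forecasting method.

```python
import numpy as np

def mase(insample, actual, forecast, m=1):
    """Out-of-sample MAE scaled by the in-sample MAE of the (seasonal) naive method with lag m."""
    insample = np.asarray(insample, dtype=float)
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return np.mean(np.abs(errors)) / scale

def naive_forecast(history, h):
    """Hypothetical stand-in forecaster: repeat the last observation h times."""
    return np.repeat(history[-1], h)

def rolling_origin_mase(y, h, test_size, forecaster, m=1):
    """MASE at every forecast origin in the test set, moving the origin one period at a time."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    scores = []
    # First origin hides test_size points; the last origin still leaves a full horizon of h points.
    for origin in range(n - test_size, n - h + 1):
        train = y[:origin]
        actual = y[origin:origin + h]
        scores.append(mase(train, actual, forecaster(train, h), m))
    return np.array(scores)

y = np.arange(1.0, 21.0)  # hypothetical trending series 1, 2, ..., 20
scores = rolling_origin_mase(y, h=4, test_size=8, forecaster=naive_forecast)
print(len(scores), scores.mean())  # 5 origins, each with MASE 2.5
```

Averaging (or taking the median of) such scores across origins and series gives summary figures of the kind reported in the tables.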

Table 1: Forecast horizons and test sets for each M3 subset.

| Set | No. of series | Horizon | Test set |
|---|---|---|---|
| Yearly | 645 | 4 | 8 |
| Quarterly | 756 | 4 | 8 |
| Monthly | 1428 | 12 | 18 |
| Other | 174 | 12 | 18 |

I used a number of benchmarks from some existing packages in R, namely:

- forecast package, from which I used the exponential smoothing (ets) and ARIMA (auto.arima) functions. Anybody doing forecasting in R is familiar with this package! ETS and ARIMA over the years have been shown to be very strong benchmarks for business forecasting tasks and specifically for the M3 dataset.
- smooth package. This is a lesser-known package that offers alternative implementations of exponential smoothing (es) and ARIMA (auto.ssarima), which follow a different modelling philosophy from their forecast package equivalents. If you are interested, head over to Ivan’s blog to read more about these (and other nice blog posts). Used together, the forecast and smooth packages offer tremendous flexibility in ETS and ARIMA modelling.
- MAPA and thief packages, which both implement Multiple Temporal Aggregation (MTA) for forecasting, following two alternative approaches that I detail here (for MAPA) and here (for THieF). I included these as they have been shown to perform quite well on such tasks.

The idea here is to give Prophet a hard time, but also avoid using too exotic forecasting methods.

I provide the mean and median MASE across all forecast origins and series for each subset in Tables 2 and 3 respectively. In brackets I provide the percentage difference from the accuracy of ETS (positive values indicate a lower, i.e. better, MASE than ETS). In boldface I have highlighted the best forecast for each M3 subset. Prophet results are in the last column. I provide two MAPA results: the first uses the default options, whereas the second uses comb="w.mean", which is more mindful of seasonality. For THieF I only provide the default result (using ETS), as in principle it could be applied to any forecast in the table.

Table 2: Mean MASE across all forecast origins and series (percentage difference from ETS in brackets).

| Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
|---|---|---|---|---|---|---|---|---|
| Yearly | **0.732 (0.00%)** | 0.746 (-1.91%) | 0.777 (-6.15%) | 0.783 (-6.97%) | **0.732 (0.00%)** | **0.732 (0.00%)** | **0.732 (0.00%)** | 0.954 (-30.33%) |
| Quarterly | **0.383 (0.00%)** | 0.389 (-1.57%) | 0.385 (-0.52%) | 0.412 (-7.57%) | 0.386 (-0.78%) | 0.384 (-0.26%) | 0.400 (-4.44%) | 0.553 (-44.39%) |
| Monthly | 0.464 (0.00%) | 0.472 (-1.72%) | 0.465 (-0.22%) | 0.490 (-5.60%) | 0.459 (+1.08%) | **0.458 (+1.29%)** | 0.462 (+0.43%) | 0.586 (-26.29%) |
| Other | 0.447 (0.00%) | 0.460 (-2.91%) | 0.446 (+0.22%) | 0.457 (-2.24%) | **0.444 (+0.67%)** | **0.444 (+0.67%)** | 0.447 (0.00%) | 0.554 (-23.94%) |

Table 3: Median MASE across all forecast origins and series (percentage difference from ETS in brackets).

| Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
|---|---|---|---|---|---|---|---|---|
| Yearly | 0.514 (0.00%) | 0.519 (-0.97%) | **0.511 (+0.58%)** | 0.524 (-1.95%) | 0.520 (-1.17%) | 0.520 (-1.17%) | 0.514 (0.00%) | 0.710 (-38.13%) |
| Quarterly | 0.269 (0.00%) | 0.266 (+1.12%) | 0.256 (+4.83%) | 0.278 (-3.35%) | **0.254 (+5.58%)** | **0.254 (+5.58%)** | 0.262 (+2.60%) | 0.388 (-44.24%) |
| Monthly | 0.353 (0.00%) | **0.348 (+1.42%)** | 0.351 (+0.57%) | 0.373 (-5.67%) | 0.352 (+0.28%) | 0.351 (+0.57%) | 0.351 (+0.57%) | 0.473 (-33.99%) |
| Other | 0.275 (0.00%) | 0.269 (+2.18%) | 0.270 (+1.82%) | **0.268 (+2.55%)** | 0.283 (-2.91%) | 0.283 (-2.91%) | 0.275 (0.00%) | 0.320 (-16.36%) |
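The bracketed figures in the tables are consistent with a simple relative difference from ETS, where a positive value means a lower (better) MASE. The sketch below assumes that formula; since the MASE values shown are rounded to three decimals, recomputing from them may occasionally differ in the last digit from figures computed on unrounded errors.

```python
def pct_diff_vs_ets(ets_mase, other_mase):
    """Relative difference from ETS in percent; positive means lower (better) MASE than ETS."""
    return 100.0 * (ets_mase - other_mase) / ets_mase

# Yearly subset, mean MASE (Table 2)
print(round(pct_diff_vs_ets(0.732, 0.954), 2))  # Prophet: -30.33
print(round(pct_diff_vs_ets(0.732, 0.746), 2))  # ARIMA: -1.91
```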

Some comments about the results:

- Prophet performs very poorly. The dataset does not contain multiple seasonalities, but it does contain human-activity-based seasonal patterns (quarterly and monthly series), changing trends, and outliers or other abrupt changes (especially the 'other' subset), which Prophet should handle well. My concern is not that it does not rank first, but that at best it is more than 16% worse than exponential smoothing (and at worst more than 44%!);
- ETS and ARIMA perform reasonably similarly across the two packages, indicating that although there are implementation differences, both packages follow sound modelling philosophies;
- MAPA and THieF are meant to work on the quarterly and monthly subsets, where, in line with the research, they improve upon their base model (ETS).

In all fairness, more testing is needed on high-frequency data with multiple seasonalities before drawing conclusions about the performance of Prophet. Nonetheless, for the vast majority of business forecasting needs (such as supply chain forecasting), Prophet does not seem to perform that well. As a final note, this is an open source project, so I expect to see interesting improvements over time.

Finally, I want to thank Oliver Schaer for providing me with Prophet R code examples! You can also find some examples here.

In the previous post we saw how the Multiple Aggregation Prediction Algorithm (MAPA) implements the ideas of MTA. We also saw that it has some limitations, particularly that it requires splitting forecasts into subcomponents (level, trend and seasonality). Although some forecasting methods, for example Exponential Smoothing and Theta, provide such outputs naturally, others do not. More crucially, neither do manually adjusted forecasts, and even though it is possible to use MAPAx for that, a simpler approach would be welcome. This is where Temporal Hierarchies, an alternative way to implement MTA, become quite useful.

Temporal Hierarchies borrow many ideas from cross-sectional hierarchies and organise the different temporal aggregation levels as a hierarchy. Consider for example four quarterly observations. The first two quarters constitute the first half-year, and the last two quarters constitute the second half-year. The two half-years add up to make a complete year. These connections imply a hierarchy, much like sales of different packet sizes of a product in a supermarket can be organised in a product hierarchy. However, temporal hierarchies have one key advantage over cross-sectional ones: they are uniquely specified by the problem at hand. Suppose I am given monthly data to forecast. There is a single hierarchy across temporal aggregation levels, much like in the quarterly example before, that I need to deal with, irrespective of the item I need to forecast, the way I got the forecast or the properties of the time series. Once this unique hierarchy is defined (and all the data come from temporally aggregated views of the original time series), all that is left to do is to forecast across the hierarchy, i.e. at all temporal aggregation levels, and reconcile the forecasts. The act of reconciliation brings together information from all modelling levels, with the MTA benefits discussed in the previous posts.
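To make the quarterly example concrete, here is a small Python sketch with made-up sales figures. It aggregates one year of quarters into half-years and a year, and then writes the same relations as a summing matrix `S`, the usual way hierarchies are expressed in the hierarchical forecasting literature. The names `aggregate` and `S` are illustrative only.

```python
import numpy as np

def aggregate(y, k):
    """Sum non-overlapping blocks of k consecutive periods (len(y) must be a multiple of k)."""
    y = np.asarray(y, dtype=float)
    return y.reshape(-1, k).sum(axis=1)

quarters = np.array([120.0, 135.0, 128.0, 150.0])  # hypothetical quarterly sales for one year
half_years = aggregate(quarters, 2)  # 255 and 278
year = aggregate(quarters, 4)        # 533

# Summing matrix: rows are [year, H1, H2, Q1, Q2, Q3, Q4], columns are the four quarters.
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

all_levels = S @ quarters  # every node of the hierarchy from the bottom-level data
print(all_levels)          # 533, 255, 278, 120, 135, 128, 150
```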

Some hierarchies are more complex than others. The quarterly hierarchy from the example above is a very simple three-level hierarchy (quarters, half-years, years). A monthly hierarchy is more complex, because there is more than one way to reach yearly data from monthly data. For example, one could aggregate by 2 months, then aggregate these by 2 (the 4-monthly level), and then aggregate that by 3 (the yearly level). Alternatively, one could aggregate to quarterly data, then half-yearly and then yearly. The two aggregation paths exist in parallel, and the temporal hierarchy is made up of all possible paths. Note that, in contrast to MAPA, levels that do not fully add up to a yearly time series are excluded (intuitively, they do not belong to any path from the bottom disaggregate level to the top yearly level). This has the advantage that no forecasting model/method needs to deal with series that may have fractional seasonality. Nonetheless, this is an interesting future research avenue.
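The levels that survive in a temporal hierarchy are the aggregation levels k that divide the number of periods in a year m exactly; for monthly data that excludes, for example, the 5-monthly level. The hypothetical helpers below list these levels and build the corresponding summing matrix. This is a sketch of the structure under that assumption, not the thief package's code.

```python
import numpy as np

def temporal_levels(m):
    """Aggregation levels kept in the hierarchy: the divisors of m, from the top (k = m) down."""
    return [k for k in range(m, 0, -1) if m % k == 0]

def summing_matrix(m):
    """One row per aggregate period at each kept level; columns are the m bottom-level periods."""
    rows = []
    for k in temporal_levels(m):
        for j in range(m // k):
            row = np.zeros(m)
            row[j * k:(j + 1) * k] = 1.0  # this node sums k consecutive bottom periods
            rows.append(row)
    return np.vstack(rows)

print(temporal_levels(12))  # [12, 6, 4, 3, 2, 1]
S = summing_matrix(12)
print(S.shape)  # (28, 12): 1 + 2 + 3 + 4 + 6 + 12 nodes over 12 months
```

Both paths described above (2, 4, 12 and 3, 6, 12 months) are contained in these levels, which is why the monthly hierarchy has multiple pathways to the top.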

The following interactive plot provides the temporal hierarchies for common types of time series. Observe that many have multiple pathways to the top yearly level (for example, monthly time series), and some are very simple hierarchies (for example, days in week). Use the highlight option to easily visualise the various pathways. Once visualised, the analogies with cross-sectional hierarchies are apparent.

To forecast we need to populate every level of the hierarchy with a forecast. So, for example, for the quarterly hierarchy we need to provide 3 sets of forecasts: one for the quarterly time series, one for the semi-yearly and one for the yearly. Imagine that each hierarchy depicts one year’s worth of forecasts, but obviously we can produce the same hierarchy for the next year and so on. Mathematically this is just another column of forecasts to be handled by the hierarchy, so in fact it is trivial to do. But an implication is that forecasts are produced for horizons that are multiples of full years (and any shorter horizons are then used accordingly). People are more familiar with two specific cases of temporal hierarchies. One is when we need to produce a total figure over a period, for example for tactical/strategic forecasts. This is simply the bottom-up interpretation of temporal hierarchies: forecasts from the lowest level are summed to a higher level. The other is to produce a forecast and then use a 'profile' to split it further. This is very common in supply chain forecasting and call centres, for example when breaking weekly forecasts into daily profiles, or daily forecasts into intra-daily profiles. This is merely the top-down interpretation of temporal hierarchies.
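The two familiar special cases take only a few lines to sketch. Both the daily forecasts and the day-of-week profile below are hypothetical numbers, chosen only to illustrate bottom-up summation and top-down splitting.

```python
import numpy as np

# Bottom-up: hypothetical daily forecasts for one week, summed to a weekly total.
daily_fc = np.array([100.0, 105.0, 98.0, 110.0, 120.0, 90.0, 80.0])
weekly_total = daily_fc.sum()  # 703.0

# Top-down: a hypothetical weekly forecast split with a day-of-week profile (shares sum to 1).
weekly_fc = 700.0
profile = np.array([0.18, 0.17, 0.16, 0.16, 0.15, 0.10, 0.08])
daily_split = weekly_fc * profile  # daily forecasts that add back up to the weekly forecast
```

In both cases only one level is actually forecast, and the others are derived from it; a temporal hierarchy instead forecasts every level and reconciles them.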

**Forecasting with Temporal Hierarchies**

You may have already noticed that there is nothing restricting the source of forecasts. They can be based on some statistical model, judgement, a mix of both, differ amongst levels, or come from whatever other exotic source. This is a substantial advantage over MAPA, and temporal hierarchies provide a flexible MTA foundation. In reconciling the forecasts there are a couple of complications that we deal with in this paper (the scale and variance of the forecasts differ across levels, which needs to be taken into account during reconciliation). I mentioned earlier that temporal hierarchies are unique. This simplifies the solution substantially, but I will not go into the mathematical details here.
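For readers who want a flavour of the reconciliation step, here is a sketch of weighted least squares reconciliation, $\tilde{y} = S(S^\top\Lambda^{-1}S)^{-1}S^\top\Lambda^{-1}\hat{y}$, for the quarterly hierarchy. The diagonal weights assume structural scaling (each node weighted by the number of quarters it aggregates), which is one of the options discussed for temporal hierarchies; the base forecasts are made up, and this is not the paper's full treatment.

```python
import numpy as np

# Quarterly hierarchy: rows are [year, H1, H2, Q1, Q2, Q3, Q4].
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

def reconcile_wls(base, S, weights):
    """WLS reconciliation: S (S' W^-1 S)^-1 S' W^-1 base, with W = diag(weights)."""
    W_inv = np.diag(1.0 / weights)
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)  # maps all base forecasts to bottom level
    return S @ (G @ base)

# Structural scaling (assumed here): weight = number of quarters each node aggregates.
struct_weights = S.sum(axis=1)  # [4, 2, 2, 1, 1, 1, 1]

# Hypothetical incoherent base forecasts: quarters sum to 397, not 410.
base = np.array([410.0, 190.0, 200.0, 95.0, 100.0, 98.0, 104.0])
rec = reconcile_wls(base, S, struct_weights)
print(np.round(rec, 2))  # coherent: quarters add up to half-years and the year
```

Every level contributes to the reconciled quarterly figures, which is how information from the aggregate views flows back to the disaggregate forecasts.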

In the following interactive plot you can choose from the usual time series I have been using as examples in this series of posts to produce base forecasts (conventionally built from a single level, in red) and Temporal Hierarchy Forecasts (THieF, in blue). I provide the forecasts across the various temporal aggregation levels permitted by the hierarchy. Observe how the information across the temporal aggregation levels is shared in the THieF forecasts to achieve better modelling of the series. You can also choose between three different forecasts: exponential smoothing, ARIMA and naive. The naive forecasts are quite illuminating in showing how the multiple views offered by THieF achieve superior results. The other two types of forecasts are quite illustrative as well.

I also provide the Mean Absolute Error (MAE) of the base and THieF forecasts for the disaggregate series. You will observe that on average THieF forecasts are more accurate, and the gains are larger at more aggregate levels. In the paper we demonstrate with simulations that in various scenarios of uncertainty (parameter, model) THieF performs better than, or at least as well as, base forecasts.

To sum up, forecasting with temporal hierarchies:

- offers a very flexible framework to implement MTA, with all its advantages;
- is independent of the source of forecasts, allowing different additional information to be provided at different levels, if available;
- has been shown to offer substantial gains in terms of accuracy over base forecasts, by blending the information available across temporal aggregation levels;
- provides reconciled short-term (disaggregate) and long-term (aggregate) forecasts, leading to aligned operational, tactical and strategic planning.

If you want to try it out we have released the thief package for R.

A final note on THieF. THieF and MAPA both perform very well, and neither is a clear winner in terms of forecast accuracy alone. The two MTA alternatives handle information in different ways: MAPA also takes advantage of the 'in-between' levels that THieF excludes. The good performance of both, even though they have some key differences, is exciting: it gives further merit to MTA and offers some clear directions for future work!

Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.

**Abstract**

The four major Scandinavian economies (Denmark, Finland, Sweden and Norway) have high workforce mobility, and depending on market dynamics, unemployment in one country can be influenced by conditions in the neighbouring ones. We provide evidence that Vector Autoregressive modelling of unemployment between the four countries produces more accurate predictions than constructing independent forecasting models. However, given the dimensionality of the VAR model, its specification and estimation can become challenging, particularly when modelling unemployment across multiple factors. To overcome this we consider the hierarchical structure of unemployment in Scandinavia, looking at three dimensions: age, country and gender. This allows us to construct multiple complementary hierarchies, aggregating across each dimension. The resulting grouped hierarchy enforces a well-defined structure on the forecasting problem. By producing forecasts across the hierarchy, under the restriction that they are reconciled across the hierarchical structure, we provide an alternative way to establish connections between the time series that describe the four countries. We demonstrate that this approach is not only competitive with VAR modelling, but, as each series is modelled independently, we can easily employ advanced forecasting models, in which case independent and VAR forecasts are substantially outperformed. Our results illustrate that there are three useful alternatives for modelling connections between series: directly through multivariate vector models, through the covariance of the prediction errors across a hierarchy of series, and through the implicit restrictions enforced by the hierarchical structure. We provide evidence of the performance of each, as well as of their combination.

**Abstract**

In this paper we explore how judgment can be used to improve model selection for forecasting. We benchmark the performance of judgmental model selection against statistical model selection based on information criteria. Apart from the simple model choice approach, we also examine the efficacy of a judgmental model build approach, where experts are asked to decide on the existence of the structural components (trend and seasonality) of the time series. The sample consists of almost 700 participants who took part in a custom-designed laboratory experiment. The results suggest that humans perform model selection differently from statistics. When forecasting performance is assessed, individual judgmental model selection performs as well as, if not better than, statistical model selection. Simple combination of the statistical and judgmental selections and judgmental aggregation significantly outperform both statistical and judgmental selection.

**Abstract**

With thousands of call centres worldwide employing millions and serving billions of customers as a first point of contact, accurate scheduling and capacity planning of resources is important. Forecasts are required as inputs for such scheduling and planning in the short, medium and long term. Current approaches involve forecasting weekly demand and subsequently disaggregating it into half-hourly, hourly and daily time buckets, as forecasts are required to support multiple decisions and plans. Once the weekly call volume forecasts are prepared, accounting for any seasonal variations, they are broken down into higher frequencies using appropriate proportions that mainly capture the intra-week and intra-day seasonality. Although this ensures reconciled forecasts across all levels, and therefore aligned decision making, it is potentially not optimal in terms of forecasting. On the other hand, producing forecasts at the highest available frequency and aggregating to lower frequencies may also not be ideal, as very long lead-time forecasts may be required. A third option, which is more appropriate from a forecasting standpoint, is to produce forecasts at different levels using appropriate models for each. Although this has the potential to generate good forecasts, in terms of decision making the forecasts are not aligned, which may cause organisational problems. Recently, Kourentzes et al. (2014) proposed the Multiple Aggregation Prediction Algorithm (MAPA), where forecasting with multiple temporal aggregation (MTA) levels allows both accurate and reconciled forecasts. The main idea of MTA is to model a series at multiple aggregation levels separately, taking advantage of the information that is highlighted at each level, and subsequently combine the forecasts by using the implied temporal hierarchical structure. Athanasopoulos et al. (2017) proposed a more general MTA framework than MAPA, defining appropriate temporal hierarchies and reconciliation mechanisms, and thus providing an MTA forecasting framework that is very flexible and model independent, while retaining all the benefits of MAPA. Given the high-frequency, multi-temporal nature of the forecast requirements and the subsequent planning associated with call centre arrival forecasting, MTA is a natural but as yet unexplored candidate for call centre forecasting. This work evaluates whether there are any benefits from temporal aggregation, both at the level of decision making and at the level of aggregation, in terms of forecast accuracy and operational efficiency. In doing so, various methods of disaggregation are considered when the decision level and the forecasting level differ, including methods which result in reconciled and unreconciled forecasts. The findings of this study will contribute to call centre management practice by proposing the best approaches for forecasting call centre data at the various decision levels, taking into account accuracy and operational efficiency, and will also contribute to research on the use of temporal hierarchies in the area of high-frequency time series data.

In this third post about modelling with Multiple Temporal Aggregation (MTA), I will explain how the Multiple Aggregation Prediction Algorithm (MAPA) works, which was the first incarnation of MTA for forecasting.

MAPA is quite simple in its logic:

- a time series is temporally aggregated into multiple levels, at each level strengthening and weakening various components of the time series, as discussed before;
- at each level an independent exponential smoothing (ETS) model is fit and its components are extracted;
- the ETS components are combined, using a few tricks (see the paper for details), to produce the final forecast, which borrows information from all levels.
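The three steps can be illustrated with a deliberately simplified Python sketch, which is not MAPA itself. As an assumption for brevity, it estimates only a level with simple exponential smoothing at each aggregation level (no trend or seasonality, and none of the full ETS machinery), rescales each estimate back to the original frequency by dividing by k, and combines across levels with a mean (or median). It also assumes that aggregation drops the oldest observations so that the last block ends at the most recent one. The function names and the smoothing parameter `alpha` are hypothetical.

```python
import numpy as np

def temporal_aggregate(y, k):
    """Non-overlapping sums of k periods; drops the oldest points so the last block ends at the latest one."""
    y = np.asarray(y, dtype=float)
    n_used = (len(y) // k) * k
    return y[len(y) - n_used:].reshape(-1, k).sum(axis=1)

def ses_level(x, alpha=0.3):
    """Final level of simple exponential smoothing (a stand-in for the ETS components)."""
    level = x[0]
    for obs in x[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def mapa_like_level(y, levels=range(1, 7), alpha=0.3, comb=np.mean):
    """Estimate the level at each aggregation level, rescale to the original frequency, combine."""
    per_period = []
    for k in levels:
        agg = temporal_aggregate(y, k)
        per_period.append(ses_level(agg, alpha) / k)  # level of k-period sums -> per-period level
    return comb(per_period)

y = 50 + np.random.default_rng(1).normal(0, 5, size=48)  # hypothetical noisy monthly series
print(mapa_like_level(y), mapa_like_level(y, comb=np.median))
```

The aggregate levels see a smoother series, so their level estimates are less affected by noise; combining them is what lets the final estimate borrow information from all levels.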

In the original MAPA paper alternative combination approaches were trialed, where all temporal aggregation levels were given equal importance and combined through mean or median. For the seasonal component (which is the high frequency one), this causes an important issue: as the seasonality is filtered at the aggregate levels, it is effectively shrunk towards zero. Therefore, the combined seasonal component will be shrunk as well. Originally this was addressed by using a simple heuristic, combining the ETS forecast of the original series, with the MAPA forecast (hybrid approach). This effectively means that we use a weighted combination, where the first temporal aggregation level is given more weight than all other temporal aggregation levels together. Empirical evidence suggests that this re-weighting is beneficial.
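A short derivation makes the last claim concrete. Assume, for illustration, that a component is combined with equal weights (mean) over $K$ temporal aggregation levels, that components are additive so that the ETS forecast of the original series equals the sum of the level-1 components $\hat{y}_1$, and that the hybrid forecast is an equal-weight average of the ETS and MAPA forecasts. These are simplifying assumptions; the exact details are in the paper. Writing $\hat{y}_k$ for the contribution of level $k$:

$$
\hat{y}^{\text{hybrid}} = \frac{1}{2}\,\hat{y}_1 + \frac{1}{2}\cdot\frac{1}{K}\sum_{k=1}^{K}\hat{y}_k
\quad\Longrightarrow\quad
w_1 = \frac{1}{2} + \frac{1}{2K}, \qquad
\sum_{k=2}^{K} w_k = \frac{K-1}{2K} < \frac{1}{2} < w_1 .
$$

For example, with $K = 12$ levels the first level receives $13/24 \approx 0.54$ of the weight, while all other levels together receive $11/24 \approx 0.46$.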

The later-developed w.mean and w.median weighting schemes attempt to do the same with variable weights across temporal aggregation levels for the seasonal component. In fact, when dealing with high-frequency time series, it is always recommended to use them.

The interactive plot below illustrates how MAPA works. You can choose between various time series, the combination scheme for the components across temporal aggregation levels, and whether the hybrid forecast (effectively a re-weighting of the components) is used or not. The top plot shows the identified ETS models at each temporal aggregation level. Greyed cells indicate levels at which no seasonality is estimated. These are levels that would require fractional seasonality, which is not permitted by conventional ETS. The second row of plots provides the forecasted ETS components across temporal aggregation levels, as well as the combined one (thick black line). The components of the first aggregation level are those of the ETS fitted to the original time series, and are plotted with a thicker line. For the seasonal component only levels that can be seasonal are used. Note the difference between the trajectories of the ETS and MAPA components, as well as between the various components at different temporal aggregation levels. The bottom plot provides the forecasts of ETS and MAPA. The MAPA forecast is simply the sum of the three MAPA components. If Hybrid is used, the resulting MAPA forecast is the combination of the MAPA components and the ETS forecast (which is equivalent to a specific weighting scheme of the components).
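Which levels can carry seasonality follows from simple arithmetic: with $m$ periods per season, aggregating by $k$ leaves a seasonal period of $m/k$, and conventional seasonal ETS needs that to be an integer greater than 1. The small sketch below (the helper name is hypothetical) lists this for monthly data aggregated up to the yearly level.

```python
from fractions import Fraction

def seasonal_period_by_level(m, max_level):
    """Seasonal period m/k at each aggregation level k; None where seasonal ETS is not possible."""
    periods = {}
    for k in range(1, max_level + 1):
        period = Fraction(m, k)
        # Seasonal ETS needs an integer seasonal period greater than 1.
        periods[k] = int(period) if period.denominator == 1 and period > 1 else None
    return periods

print(seasonal_period_by_level(12, 12))
# {1: 12, 2: 6, 3: 4, 4: 3, 5: None, 6: 2, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None}
```

Levels 5 and 7 to 11 would need a fractional period, and at level 12 the yearly series has no within-year seasonality left to model.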

In most cases the level and trend components are different from the ETS ones, and the seasonality is always somewhat shrunk, depending on the combination weights scheme used.

Why does MAPA work? It captures low frequency components (trend) better, because of the temporal aggregation. Furthermore, it does not rely on a single ETS model that may or may not be well identified, therefore mitigating model uncertainty. I argue that MTA, as implemented in MAPA is a neat trick to extract more information from a time series… for free!

What are the limitations of MAPA? Quite a few, but there are two major ones: (i) the combination weight schemes are ad hoc, and there is strong evidence that equal weights are not the best solution; (ii) at levels where any seasonality would be fractional, the seasonality is not modelled and is left to contaminate the level and slope components, which weakens the identification of the ETS model at those levels. Arguably, the use of ETS at its core is another potential limitation, although that is possible to lift.

In practice, MAPA was the first method after almost 15 years to improve upon the M3 competition results and since then there is mounting empirical evidence supporting its good forecasting performance, as well as extensions to incorporate explanatory variables and forecast intermittent demand time series. An interesting finding is that MAPA is very robust against misspecification, when compared to more conventional approaches that attempt to capitalise on a single (optimal) level of temporal aggregation. If you want to try it out, there is a package for R available (or on GitHub).

Ultimately the true contribution of MAPA was to demonstrate that the ideas behind MTA were sound and useful for forecasting! For this, the paper that introduced MAPA recently received the International Journal of Forecasting 2014-2015 best paper award; I am very humbled and happy for this!

Although these limitations remain in MAPA itself, it motivated research into Temporal Hierarchies, which provide a more thorough foundation for using MTA in forecasting, overcoming many of MAPA’s issues and enabling multiple avenues of future research. This will be the topic of a future post in the series. Until then, I will conclude by mentioning that in many applications MAPA still provides more accurate forecasts than Temporal Hierarchies, demonstrating that it remains an interesting research topic.

Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.

This paper proposes the use of the Multiple Temporal Aggregation approach that I have been posting about, and introduces the MAPA forecasting method, for which there is an R package. The other shortlisted papers were of very good quality and I am humbled by the choice of the editorial board. My thanks to my co-authors and the reviewers, who made this paper possible.

- Time series exploration
- Univariate (extrapolative) forecasting
- Intermittent demand series forecasting
- Forecasting with regression
- Special topics: (i) Hierarchical forecasting; (ii) ABC-XYZ analysis; and (iii) LASSO regression

Material:

- Workshop notes: these provide code examples with comments. You will also find some references for the various methods used in the workshop.
- Workshop slides: these provide an **extremely brief** overview of some of the methods used and their implementation.
- Workshop R solution scripts: these replicate the examples in the notes.
- Workshop data: these are needed to replicate the examples in the notes and scripts.

The notes are aimed at researchers and experienced practitioners, who are comfortable with the theory behind the various models and methods. I hope you find this material useful!

A couple more packages to explore:

- thief: A package that implements forecasting with temporal hierarchies
- smooth: A package that provides alternative implementations of exponential smoothing, ARIMA and other exciting forecasting models.