Author Archives: Nikos

Benchmarking Facebook’s Prophet

Last February Facebook open-sourced its Prophet forecasting tool. Since then, it has appeared in quite a few discussions online. A good thing about Prophet is that one can use it very easily through R (and Python). This gave me the opportunity to benchmark it against some more standard (and some not so standard!) forecasting models and methods. To do this I tried it on the M3 competition dataset (available through the Mcomp package for R).

I should start by saying that the development team of Prophet suggests that its strengths are:

  • high-frequency data (hourly, daily, or weekly) with multiple seasonalities, such as day of week and time of year;
  • special events and bank holidays that are not fixed in the year;
  • robustness to missing values or large outliers;
  • changes in the historical trends, which themselves are non-linear growth curves.

The M3 dataset contains many series of micro/business interest and, as a recent presentation by E. Spiliotis et al. at ISF2017 (slides 11-12) indicated, the characteristics of its time series overlap with those of typical business time series, albeit not high-frequency ones. However, a lot of business forecasting is still not done at hourly or daily frequency, so for many business forecasters the absence of high-frequency examples is not necessarily an issue when benchmarking Prophet.

The setup of the experiment is:

  • Use Mean Absolute Scaled Error (MASE). I chose this measure as it has good statistical properties and has become quite common in forecasting research.
  • Use rolling origin evaluation, so as to ensure that the reported figures are robust against particularly lucky (or unlucky) forecast origins and test sets.
  • Use the forecast horizons and test sets indicated in Table 1, for each M3 subset.
Table 1. M3 dataset
| Set | No. of series | Horizon (periods) | Test set (observations) |
|---|---|---|---|
| Yearly | 645 | 4 | 8 |
| Quarterly | 756 | 4 | 8 |
| Monthly | 1428 | 12 | 18 |
| Other | 174 | 12 | 18 |
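Here is a minimal Python sketch of MASE under rolling-origin evaluation, to make the setup concrete. It makes three assumptions. First, it uses the common MASE definition, which scales forecast errors by the in-sample mean absolute error of the (seasonal) naive method with period `m`. Second, it places a forecast origin at every point of the test set that still leaves a full horizon ahead, so a test set of length T and horizon h gives T - h + 1 origins. Third, the `naive` forecaster is only a hypothetical stand-in; the experiment itself used the R packages described in this post.

```python
def mase(actual, forecast, insample, m=1):
    """Mean Absolute Scaled Error: out-of-sample MAE divided by the
    in-sample MAE of the (seasonal) naive method with period m."""
    n = len(insample)
    scale = sum(abs(insample[t] - insample[t - m]) for t in range(m, n)) / (n - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def rolling_origin_mase(series, test_size, horizon, forecaster, m=1):
    """Score a forecaster from every origin in the test set that still
    has `horizon` observations ahead of it."""
    scores = []
    for origin in range(len(series) - test_size, len(series) - horizon + 1):
        train = series[:origin]                     # data available at the origin
        actual = series[origin:origin + horizon]    # the next `horizon` values
        scores.append(mase(actual, forecaster(train, horizon), train, m))
    return scores


def naive(train, horizon):
    """Random-walk forecast: repeat the last observation."""
    return [train[-1]] * horizon


series = list(range(1, 13))
print(rolling_origin_mase(series, test_size=4, horizon=2, forecaster=naive))
# [1.5, 1.5, 1.5]
```

Under this placement of origins, a yearly M3 series (test set 8, horizon 4) would be evaluated from 5 origins and a monthly one (test set 18, horizon 12) from 7. Averaging over origins is what protects the figures against a single lucky or unlucky origin.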

I used a number of benchmarks from some existing packages in R, namely:

  • forecast package, from which I used the exponential smoothing (ets) and ARIMA (auto.arima) functions. Anybody doing forecasting in R is familiar with this package! Over the years, ETS and ARIMA have been shown to be very strong benchmarks for business forecasting tasks, and specifically for the M3 dataset.
  • smooth package. This is a lesser-known package that offers alternative implementations of exponential smoothing (es) and ARIMA (auto.ssarima), which follow a different modelling philosophy from their forecast package equivalents. If you are interested, head over to Ivan’s blog to read more about these (and other nice blog posts). Used together, the forecast and smooth packages offer tremendous flexibility in ETS and ARIMA modelling.
  • MAPA and thief packages, which both implement Multiple Temporal Aggregation (MTA) for forecasting, following two alternative approaches that I detail here (for MAPA) and here (for THieF). I included these as they have been shown to perform quite well on such tasks.

The idea here is to give Prophet a hard time, while avoiding overly exotic forecasting methods.

I provide the mean and median MASE across all forecast origins and series for each subset in Tables 2 and 3, respectively. In brackets I provide the percentage difference from the accuracy of ETS; positive values indicate a lower (better) MASE than ETS, negative values a higher (worse) one. The best forecast for each M3 subset is the one with the lowest MASE in its row, and Prophet results are in the last column. I provide two MAPA results: the first uses the default options, whereas the second uses comb="w.mean", which is more mindful of seasonality. For THieF I only provide the default result (using ETS), as in principle it could be applied to any forecast in the table.

Table 2. Mean MASE results
| Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
|---|---|---|---|---|---|---|---|---|
| Yearly | 0.732 (0.00%) | 0.746 (-1.91%) | 0.777 (-6.15%) | 0.783 (-6.97%) | 0.732 (0.00%) | 0.732 (0.00%) | 0.732 (0.00%) | 0.954 (-30.33%) |
| Quarterly | 0.383 (0.00%) | 0.389 (-1.57%) | 0.385 (-0.52%) | 0.412 (-7.57%) | 0.386 (-0.78%) | 0.384 (-0.26%) | 0.400 (-4.44%) | 0.553 (-44.39%) |
| Monthly | 0.464 (0.00%) | 0.472 (-1.72%) | 0.465 (-0.22%) | 0.490 (-5.60%) | 0.459 (+1.08%) | 0.458 (+1.29%) | 0.462 (+0.43%) | 0.586 (-26.29%) |
| Other | 0.447 (0.00%) | 0.460 (-2.91%) | 0.446 (+0.22%) | 0.457 (-2.24%) | 0.444 (+0.67%) | 0.444 (+0.67%) | 0.447 (0.00%) | 0.554 (-23.94%) |

Table 3. Median MASE results
| Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
|---|---|---|---|---|---|---|---|---|
| Yearly | 0.514 (0.00%) | 0.519 (-0.97%) | 0.511 (+0.58%) | 0.524 (-1.95%) | 0.520 (-1.17%) | 0.520 (-1.17%) | 0.514 (0.00%) | 0.710 (-38.13%) |
| Quarterly | 0.269 (0.00%) | 0.266 (+1.12%) | 0.256 (+4.83%) | 0.278 (-3.35%) | 0.254 (+5.58%) | 0.254 (+5.58%) | 0.262 (+2.60%) | 0.388 (-44.24%) |
| Monthly | 0.353 (0.00%) | 0.348 (+1.42%) | 0.351 (+0.57%) | 0.373 (-5.67%) | 0.352 (+0.28%) | 0.351 (+0.57%) | 0.351 (+0.57%) | 0.473 (-33.99%) |
| Other | 0.275 (0.00%) | 0.269 (+2.18%) | 0.270 (+1.82%) | 0.268 (+2.55%) | 0.283 (-2.91%) | 0.283 (-2.91%) | 0.275 (0.00%) | 0.320 (-16.36%) |
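The bracketed percentages in Tables 2 and 3 are a relative difference with ETS as the reference, so a positive sign means lower (better) MASE than ETS. The short Python check below recomputes part of the yearly row of Table 2 from the rounded values shown in the table. It only reproduces the arithmetic; it does not recompute any MASE values.

```python
def pct_diff_from_ets(ets, other):
    """Percentage difference in MASE relative to ETS; positive = better than ETS."""
    return (ets - other) / ets * 100


# Mean MASE, yearly subset (values copied from Table 2)
yearly = {"ETS": 0.732, "ARIMA": 0.746, "ES (smooth)": 0.777,
          "SSARIMA (smooth)": 0.783, "Prophet": 0.954}
for name, value in yearly.items():
    print(f"{name}: {pct_diff_from_ets(yearly['ETS'], value):+.2f}%")
# ETS: +0.00%
# ARIMA: -1.91%
# ES (smooth): -6.15%
# SSARIMA (smooth): -6.97%
# Prophet: -30.33%
```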

Some comments about the results:

  • Prophet performs very poorly. The dataset does not contain multiple seasonalities, but it does contain human-activity-based seasonal patterns (quarterly and monthly series), changing trends, and outliers or other abrupt changes (especially the ‘other’ subset), all of which Prophet should handle reasonably well. My concern is not that it does not rank first, but that at best it is about 16% worse than exponential smoothing (median MASE, ‘other’ subset) and at worst about 44% worse (quarterly subset)!;
  • The ETS and ARIMA implementations perform similarly across the two packages, indicating that, although there are implementation differences, both packages follow sound modelling philosophies;
  • MAPA and THieF are mainly relevant for the quarterly and monthly subsets, where, in line with the research, they generally improve upon their base model (ETS). The exception is mean MASE on the quarterly subset, where all three are slightly worse than ETS.

In all fairness, more testing is needed on high-frequency data with multiple seasonalities before drawing conclusions about the performance of Prophet. Nonetheless, for the vast majority of business forecasting needs (such as supply chain forecasting), Prophet does not seem to perform that well. As a final note, this is an open-source project, so I expect to see interesting improvements over time.

Finally, I want to thank Oliver Schaer for providing me with Prophet R code examples! You can also find some examples here.

Multiple temporal aggregation: the story so far. Part IV: Temporal Hierarchies

Temporal Hierarchies

In the previous post we saw how the Multiple Aggregation Prediction Algorithm (MAPA) implements the ideas of MTA. We also saw that it has some limitations, particularly that it requires splitting forecasts into subcomponents (level, trend and seasonality). Although some forecasting methods provide such outputs naturally, for example Exponential Smoothing and Theta, others do not. More crucially, manually adjusted forecasts do not either, and even though it is possible to use MAPAx for that, a simpler approach would be welcome. This is where Temporal Hierarchies, an alternative way to implement MTA, become quite useful.

Temporal Hierarchies borrow many ideas from cross-sectional hierarchies and organise the different temporal aggregation levels as a hierarchy. Consider for example four quarterly observations. The first two quarters constitute the first half-year, and the last two quarters constitute the second half-year. The two half-years add up to make a complete year. These connections imply a hierarchy, much like sales of different packet sizes of a product in a supermarket can be organised in a product hierarchy. However, temporal hierarchies have one key advantage over cross-sectional ones: they are uniquely specified by the problem at hand. Suppose I am given monthly data to forecast. Much like in the quarterly example before, there is a single hierarchy across temporal aggregation levels that I need to deal with, irrespective of the item I need to forecast, the way I got the forecast or the properties of the time series. Once this unique hierarchy is defined (and all the data come from temporally aggregated views of the original time series), all that is left to do is to forecast across the hierarchy, i.e., at all temporal aggregation levels, and reconcile the forecasts. The act of reconciliation brings together information from all modelling levels, with the MTA benefits discussed in the previous posts.
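The quarterly hierarchy can be written down as a 0/1 matrix, often called a summing matrix. Each row is a node of the hierarchy (year, first and second half-year, and each quarter), and each column is one bottom-level quarter. The Python sketch below is only an illustration of that bookkeeping. Multiplying the matrix by the four quarterly values produces every level of the hierarchy at once.

```python
def quarterly_summing_matrix():
    """Rows: year, H1, H2, Q1..Q4. Columns: Q1..Q4."""
    return [
        [1, 1, 1, 1],  # year = Q1 + Q2 + Q3 + Q4
        [1, 1, 0, 0],  # first half-year = Q1 + Q2
        [0, 0, 1, 1],  # second half-year = Q3 + Q4
        [1, 0, 0, 0],  # Q1
        [0, 1, 0, 0],  # Q2
        [0, 0, 1, 0],  # Q3
        [0, 0, 0, 1],  # Q4
    ]


def aggregate_all_levels(S, bottom):
    """Matrix-vector product: every node is the sum of its bottom-level periods."""
    return [sum(s * b for s, b in zip(row, bottom)) for row in S]


S = quarterly_summing_matrix()
print(aggregate_all_levels(S, [10, 20, 30, 40]))
# [100, 30, 70, 10, 20, 30, 40]
```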

Some hierarchies are more complex than others. The quarterly hierarchy from the example above is a very simple three-level hierarchy (quarters, half-years, years). A monthly hierarchy is more complex, because there is more than one way to reach yearly data from monthly data. For example, one could aggregate by 2 months, then aggregate these by 2 (4-monthly level), and then aggregate that by 3 (yearly level). Alternatively, one could aggregate to quarterly, then half-yearly and then yearly data. The two aggregation paths can happen in parallel, and the temporal hierarchy is made up of all possible paths. Note that, in contrast to MAPA, levels that do not fully add up to a yearly time series are excluded (intuitively, they do not belong to any path from the bottom disaggregate level to the top yearly level). This has the advantage that no forecasting model/method needs to deal with series that may have fractional seasonality. Nonetheless, including such levels is an interesting avenue for future research.
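Which levels a temporal hierarchy keeps can be computed mechanically. They are the aggregation levels k that divide the number of periods in one full cycle m, for example 12 for monthly data. The Python sketch below assumes exactly that rule and builds the corresponding summing matrix for one cycle. The rows are ordered from the top level down, and the ordering is just a convention chosen here. The same rule also lists the levels from 1 to 12 that MAPA would use but a temporal hierarchy excludes.

```python
def aggregation_levels(m):
    """Aggregation levels k that split one cycle of m periods into whole buckets."""
    return [k for k in range(1, m + 1) if m % k == 0]


def temporal_summing_matrix(m):
    """One row per node (top level first), one column per bottom-level period."""
    rows = []
    for k in sorted(aggregation_levels(m), reverse=True):
        for start in range(0, m, k):  # m // k non-overlapping buckets at level k
            rows.append([1 if start <= j < start + k else 0 for j in range(m)])
    return rows


print(aggregation_levels(12))  # [1, 2, 3, 4, 6, 12]
print(aggregation_levels(7))   # [1, 7]
print(len(temporal_summing_matrix(12)))  # 28 nodes: 1 + 2 + 3 + 4 + 6 + 12
print([k for k in range(1, 13) if k not in aggregation_levels(12)])
# [5, 7, 8, 9, 10, 11]
```

The days-in-week case (m = 7) shows why some hierarchies are so simple: 7 is prime, so there is only the daily and the weekly level.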

The following interactive plot provides the temporal hierarchies for common types of time series. Observe that many have multiple pathways to the top yearly level (for example, monthly time series), and some are very simple hierarchies (for example, days in week). Use the highlight option to easily visualise the various pathways. Once visualised, the analogies with cross-sectional hierarchies are apparent.

To forecast, we need to populate every level of the hierarchy with a forecast. For example, for the quarterly hierarchy we need to provide 3 sets of forecasts: one for the quarterly time series, one for the half-yearly and one for the yearly. Imagine that each hierarchy depicts one year’s worth of forecasts; obviously we can produce the same hierarchy for the next year and so on. Mathematically this is just another column of forecasts to be handled by the hierarchy, so in fact it is trivial to do. An implication, however, is that forecasts are produced for horizons that are multiples of full years (and any shorter horizons are then used accordingly). People are more familiar with two specific cases of temporal hierarchies. One is when we need to produce a total figure over a period, for example for tactical/strategic forecasts. This is simply the bottom-up interpretation of temporal hierarchies: forecasts from the lowest level are summed to a higher level. The other is to produce a forecast and then use a ‘profile’ to split it further. This is very common in supply chain forecasting and call centres, for example breaking weekly forecasts into daily profiles, or daily forecasts into intra-daily profiles. This is merely the top-down interpretation of temporal hierarchies.
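The two familiar special cases are easy to sketch in Python. Bottom-up sums lower-level forecasts. Top-down splits an aggregate forecast by a profile. For simplicity, the profile here is assumed to be the proportions of a single hypothetical historical week. In practice a profile would typically be estimated from many weeks.

```python
def bottom_up(lower_level_forecasts):
    """Aggregate forecast = sum of the lower-level forecasts."""
    return sum(lower_level_forecasts)


def top_down(aggregate_forecast, history):
    """Split an aggregate forecast using the historical share of each sub-period."""
    total = sum(history)
    return [aggregate_forecast * h / total for h in history]


# Hypothetical daily call volumes for one historical week (Mon..Sun)
last_week = [120, 110, 100, 100, 110, 90, 70]
daily = top_down(1400, last_week)  # weekly forecast of 1400 calls
print(daily)             # [240.0, 220.0, 200.0, 200.0, 220.0, 180.0, 140.0]
print(bottom_up(daily))  # 1400.0
```

In both cases only one level is actually forecast, and the other is derived from it. A temporal hierarchy instead forecasts every level and then reconciles them.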

Forecasting with Temporal Hierarchies

You may have already noticed that there is nothing restricting the source of forecasts. They can be based on some statistical model, judgement, a mix of both, differ amongst levels, or come from whatever other exotic source. This is a substantial advantage over MAPA, and temporal hierarchies provide a flexible MTA foundation. In reconciling the forecasts there are a couple of complications that we deal with in this paper (the scale and variance of the forecasts differ across levels, which needs to be taken into account during reconciliation). I mentioned earlier that temporal hierarchies are unique. This substantially simplifies the solution, but I will not go into the mathematical details here.
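For readers who want a feel for reconciliation without the full mathematics, here is a minimal Python sketch. It uses weighted least squares with "structural scaling", where each node's assumed error variance is proportional to the number of bottom-level periods it aggregates. This is one option discussed in the temporal hierarchies literature, and choosing it here is an assumption: the sketch does not reproduce the thief package. The reconciled forecasts are S (S' W⁻¹ S)⁻¹ S' W⁻¹ ŷ, where ŷ are the base forecasts from all levels. A small Gaussian elimination keeps the code free of dependencies.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def reconcile_structural(S, base):
    """WLS reconciliation with weights 1 / (bottom periods aggregated by each node)."""
    n, m = len(S), len(S[0])
    w = [1.0 / sum(row) for row in S]  # inverse of the structural variance proxy
    A = [[sum(S[k][i] * w[k] * S[k][j] for k in range(n)) for j in range(m)]
         for i in range(m)]            # S' W^-1 S
    rhs = [sum(S[k][i] * w[k] * base[k] for k in range(n)) for i in range(m)]
    bottom = solve_linear(A, rhs)      # reconciled bottom-level forecasts
    return [sum(S[k][j] * bottom[j] for j in range(m)) for k in range(n)]


# Quarterly hierarchy: rows are year, H1, H2, Q1..Q4; columns are Q1..Q4.
S = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Base forecasts: the yearly model says 110, while the quarters sum to 100.
base = [110, 30, 70, 10, 20, 30, 40]
print([round(v, 2) for v in reconcile_structural(S, base)])
# [103.33, 31.67, 71.67, 10.83, 20.83, 30.83, 40.83]
```

The output is coherent: the quarters add up to the half-years and the year. The yearly view still pulls every quarter up, which is the "information from all levels" effect described above. Base forecasts that are already coherent are returned unchanged.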

In the following interactive plot you can choose from the usual time series I have been using as examples in this series of posts to produce base forecasts (conventionally built from a single level, in red) and Temporal Hierarchy Forecasts (THieF, in blue). I provide the forecasts across the various temporal aggregation levels permitted by the hierarchy. Observe how the information across the temporal aggregation levels is shared in the THieF forecasts to achieve better modelling of the series. You can also choose between three different forecasts: exponential smoothing, ARIMA and naive. The naive forecasts are quite illuminating in showing how the multiple views offered by THieF achieve superior results. The other two types of forecasts are quite illustrative as well.

I also provide the Mean Absolute Error (MAE) of the base and THieF forecasts for the disaggregate series. You will observe that on average THieF forecasts are more accurate, and the gains are larger at more aggregate levels. In the paper we demonstrate with simulations that, under various scenarios of uncertainty (parameter, model), THieF performs better than, or at least as well as, base forecasts.

To sum up, forecasting with temporal hierarchies:

  • offers a very flexible framework to implement MTA, with all its advantages;
  • is independent of the source of forecasts, allowing different additional information to be provided at different levels, if available;
  • has been shown to offer substantial gains in terms of accuracy over base forecasts, by blending the information available across temporal aggregation levels;
  • provides reconciled short-term (disaggregate) and long-term (aggregate) forecasts, leading to aligned operational, tactical and strategic planning.

If you want to try it out we have released the thief package for R.

A final note on THieF. THieF and MAPA both perform very well and neither is a clear winner in terms of forecast accuracy alone. The two MTA alternatives handle information in different ways; MAPA also takes advantage of the ‘in-between’ levels that THieF excludes. The good performance of both, despite some key differences, is exciting: it gives further merit to MTA and offers some clear directions for future work!

Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.

ISF 2017 presentation: A hierarchical approach to forecasting Scandinavian unemployment

This is joint work with Rickard Sandberg and looks at the implicit connections between the nodes of the hierarchy that are enforced by hierarchical time series forecasting, contrasting them with VAR models, which capture such connections explicitly.


The four major Scandinavian economies (Denmark, Finland, Sweden and Norway) have high workforce mobility and, depending on market dynamics, the unemployment in one country can be influenced by conditions in the neighbouring ones. We provide evidence that Vector Autoregressive modelling of unemployment between the four countries produces more accurate predictions than constructing independent forecasting models. However, given the dimensionality of the VAR model, its specification and estimation can become challenging, particularly when modelling unemployment across multiple factors. To overcome this we consider the hierarchical structure of unemployment in Scandinavia, looking at three dimensions: age, country and gender. This allows us to construct multiple complementary hierarchies, aggregating across each dimension. The resulting grouped hierarchy enforces a well-defined structure on the forecasting problem. By producing forecasts across the hierarchy, under the restriction that they are reconciled across the hierarchical structure, we provide an alternative way to establish connections between the time series that describe the four countries. We demonstrate that this approach is not only competitive with VAR modelling but, as each series is modelled independently, we can easily employ advanced forecasting models, in which case independent and VAR forecasts are substantially outperformed. Our results illustrate that there are three useful alternatives for modelling connections between series: directly through multivariate vector models, through the covariance of the prediction errors across a hierarchy of series, and through the implicit restrictions enforced by the hierarchical structure. We provide evidence of the performance of each, as well as of their combination.
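The grouped hierarchy in the abstract can be illustrated with a few lines of Python. Every bottom-level cell is one (country, gender, age group) combination, and each way of grouping sums the cells over the dimensions left out, from the grand total down to the full bottom level. The age bands and the values of 1.0 are hypothetical placeholders, not the study’s data. With real data, each value would be a whole time series, and the sums would be taken period by period.

```python
from collections import defaultdict
from itertools import combinations


def grouped_aggregates(bottom, dim_names):
    """Sum bottom-level values over every subset of grouping dimensions.

    `bottom` maps a tuple of categories (one per dimension) to a value.
    The empty subset gives the grand total; the full subset is the bottom level.
    """
    result = {}
    for r in range(len(dim_names) + 1):
        for subset in combinations(range(len(dim_names)), r):
            sums = defaultdict(float)
            for key, value in bottom.items():
                sums[tuple(key[i] for i in subset)] += value
            label = tuple(dim_names[i] for i in subset) or ("total",)
            result[label] = dict(sums)
    return result


countries = ["Denmark", "Finland", "Norway", "Sweden"]
genders = ["female", "male"]
ages = ["young", "adult"]  # hypothetical age bands, not those used in the study
bottom = {(c, g, a): 1.0 for c in countries for g in genders for a in ages}

groups = grouped_aggregates(bottom, ("country", "gender", "age"))
print(len(groups))                        # 8 ways of grouping
print(groups[("total",)][()])             # 16.0
print(groups[("country",)][("Sweden",)])  # 4.0
```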


ISF2017 presentation: DIY forecasting – judgement, models & judgmental model selection

This is joint work with Fotios Petropoulos and Konstantinos Nikolopoulos and discusses the performance of experts selecting forecasting models against automatic statistical model selection, as well as providing guidelines on how to maximise the benefits. This is very exciting research, demonstrating both some limitations of statistical model selection (and avenues for new research) and the advantages and weaknesses of human experts performing this task.


In this paper we explore how judgment can be used to improve model selection for forecasting. We benchmark the performance of judgmental model selection against the statistical one, based on information criteria. Apart from the simple model choice approach, we also examine the efficacy of a judgmental model build approach, where experts are asked to decide on the existence of the structural components (trend and seasonality) of the time series. The sample consists of almost 700 participants who contributed to a custom-designed laboratory experiment. The results suggest that humans perform model selection differently from statistical procedures. When forecasting performance is assessed, individual judgmental model selection performs as well as, if not better than, statistical model selection. Simple combination of the statistical and judgmental selections and judgmental aggregation significantly outperform both statistical and judgmental selection.


ISF2017 presentation: Call centre forecasting using temporal aggregation

This is joint work with Devon K. Barrow and Bahman Rostami-Tabar and is an initial exploration of the benefits of using Multiple Temporal Aggregation, as implemented in MAPA, for call centre forecasting. The preliminary results are encouraging. More details are in the attached presentation.


With thousands of call centres worldwide employing millions and serving billions of customers as a first point of contact, accurate scheduling and capacity planning of resources is important. Forecasts are required as inputs for such scheduling and planning in the short, medium and long term. Current approaches involve forecasting weekly demand and subsequently disaggregating it into half-hourly, hourly and daily time buckets, as forecasts are required to support multiple decisions and plans. Once the weekly call volume forecasts are prepared, accounting for any seasonal variations, they are broken down into higher frequencies using appropriate proportions that mainly capture the intra-week and intra-day seasonality. Although this ensures reconciled forecasts across all levels, and therefore aligned decision making, it is potentially not optimal in terms of forecasting. On the other hand, producing forecasts at the highest available frequency and aggregating to lower frequencies may also not be ideal, as very long lead-time forecasts may be required. A third option, which is more appropriate from a forecasting standpoint, is to produce forecasts at different levels using appropriate models for each. Although this has the potential to generate good forecasts, in terms of decision making the forecasts are not aligned, which may cause organisational problems. Recently, Kourentzes et al. (2014) proposed the Multiple Aggregation Prediction Algorithm (MAPA), where forecasting with multiple temporal aggregation (MTA) levels allows both accurate and reconciled forecasts. The main idea of MTA is to model a series at multiple aggregation levels separately, taking advantage of the information that is highlighted at each level, and subsequently combine the forecasts by using the implied temporal hierarchical structure. Athanasopoulos et al. (2017) proposed a more general MTA framework than MAPA, defining appropriate temporal hierarchies and reconciliation mechanisms, and thus providing an MTA forecasting framework that is very flexible and model independent, while retaining all the benefits of MAPA. Given the high-frequency, multi-temporal nature of the forecast requirements and the subsequent planning associated with call centre arrival forecasting, MTA becomes a natural, yet unexplored, candidate for call centre forecasting. This work evaluates whether there are any benefits from temporal aggregation, both at the level of decision making and at the level of aggregation, in terms of forecast accuracy and operational efficiency. In doing so, various methods of disaggregation are considered when the decision level and the forecasting level differ, including methods that result in reconciled and unreconciled forecasts. The findings of this study will contribute to call centre management practice by proposing the best approaches for forecasting call centre data at the various decision levels, taking into account accuracy and operational efficiency, and will also contribute to research on the use of temporal hierarchies in the area of high-frequency time series data.


Multiple temporal aggregation: the story so far. Part III: MAPA

Multiple Aggregation Prediction Algorithm (MAPA)

In this third post about modelling with Multiple Temporal Aggregation (MTA), I will explain how the Multiple Aggregation Prediction Algorithm (MAPA) works, which was the first incarnation of MTA for forecasting.

MAPA is quite simple in its logic:

  1. a time series is temporally aggregated into multiple levels, at each level strengthening and weakening various components of the time series, as discussed before;
  2. at each level an independent exponential smoothing (ETS) model is fit and its components are extracted;
  3. the ETS components are combined, using a few tricks (see the paper for details), to produce the final forecast, which borrows information from all levels.
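The three steps above can be sketched in Python for the level component only. The sketch makes four simplifications, each an assumption of this illustration and not how the MAPA package works. First, simple exponential smoothing with a fixed smoothing parameter stands in for automatic ETS selection. Second, only the level is combined, with no trend or seasonality. Third, aggregation levels are combined with a plain mean. Fourth, incomplete oldest buckets are dropped so that the most recent data are kept. The essential MAPA trick it does show is that a level estimated on k-period totals is divided by k to bring it back to the original frequency before combining.

```python
def temporal_aggregate(y, k):
    """Non-overlapping sums over buckets of k periods.

    Oldest observations that do not fill a complete bucket are dropped
    (an assumption of this sketch), so the most recent data are kept.
    """
    start = len(y) % k
    return [sum(y[i:i + k]) for i in range(start, len(y), k)]


def ses_level(y, alpha):
    """Final level of simple exponential smoothing, initialised at the first value."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


def mapa_level_sketch(y, max_level, alpha=0.3):
    """Level-only MAPA-style forecast: one SES per aggregation level, mean-combined."""
    components = []
    for k in range(1, max_level + 1):
        agg = temporal_aggregate(y, k)       # step 1: temporal aggregation
        if len(agg) < 2:                     # too few aggregate points to smooth
            continue
        # Step 2: the level at aggregation k describes a k-period total,
        # so divide by k to express it per original period.
        components.append(ses_level(agg, alpha) / k)
    return sum(components) / len(components)  # step 3: combine across levels


monthly = [12, 14, 13, 15, 16, 15, 17, 18, 17, 19, 20, 19] * 2
print(mapa_level_sketch(monthly, max_level=12))
```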

In the original MAPA paper, alternative combination approaches were trialled in which all temporal aggregation levels were given equal importance and combined through the mean or the median. For the seasonal component (which is the high-frequency one), this causes an important issue: as the seasonality is filtered at the aggregate levels, it is effectively shrunk towards zero, and therefore the combined seasonal component will be shrunk as well. Originally this was addressed using a simple heuristic, combining the ETS forecast of the original series with the MAPA forecast (hybrid approach). This effectively means that we use a weighted combination, where the first temporal aggregation level is given more weight than all other temporal aggregation levels together. Empirical evidence suggests that this re-weighting is beneficial.
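The claim that the hybrid gives the first level more weight than all other levels together can be checked with exact arithmetic. The Python sketch below assumes an equal-weight (mean) combination over K aggregation levels and a simple 50/50 average of the MAPA and ETS forecasts. The first level then receives 1/2 + 1/(2K), and the remaining levels share (K - 1)/(2K), which is always less than 1/2.

```python
from fractions import Fraction


def hybrid_level_weights(num_levels):
    """Effective weight of each aggregation level when an equal-weight MAPA
    forecast is averaged 50/50 with the ETS forecast of the original series."""
    equal = Fraction(1, num_levels)
    weights = [equal / 2] * num_levels  # MAPA half, spread equally over levels
    weights[0] += Fraction(1, 2)        # ETS half, all from the first level
    return weights


w = hybrid_level_weights(12)
print(w[0], sum(w[1:]))  # 13/24 11/24
```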

The later-developed w.mean and w.median weighting schemes attempt to do the same, with variable weights across temporal aggregation levels for the seasonal component. In fact, when dealing with high-frequency time series, it is always recommended to use these.

The interactive plot below illustrates how MAPA works. You can choose between various time series, the combination scheme for the components across temporal aggregation levels, and whether the hybrid forecast (effectively a re-weighting of the components) is used or not. The top plot shows the identified ETS models at each temporal aggregation level. Greyed cells indicate levels at which no seasonality is estimated; these are levels that would require fractional seasonality, which conventional ETS does not permit. The second row of plots provides the forecasted ETS components across temporal aggregation levels, as well as the combined one (thick black line). The components of the first aggregation level are those of the ETS model fitted to the original time series, and are plotted with a thicker line. For the seasonal component, only levels that can be seasonal are used. Note the difference between the trajectories of the ETS and MAPA components, as well as between the components at different temporal aggregation levels. The bottom plot provides the forecasts of ETS and MAPA. The MAPA forecast is simply the sum of the three MAPA components. If Hybrid is used, the resulting MAPA forecast is the combination of the MAPA components and the ETS forecast (which is equivalent to a specific weighting scheme of the components).

In most cases the level and trend components are different from the ETS ones, and the seasonality is always somewhat shrunk, depending on the combination weights scheme used.

Why does MAPA work? It captures low frequency components (trend) better, because of the temporal aggregation. Furthermore, it does not rely on a single ETS model that may or may not be well identified, therefore mitigating model uncertainty. I argue that MTA, as implemented in MAPA is a neat trick to extract more information from a time series… for free!

What are the limitations of MAPA? Quite a few, but there are two major ones: (i) the combination weighting schemes are ad hoc, although there is strong evidence that equal weights are surely not the best solution; (ii) at levels where the seasonality would be fractional, the identification of the ETS model is weakened, because that seasonality is not modelled and is left to contaminate the level and slope components. Arguably, the use of ETS at its core is another potential limitation, although that limitation can be lifted.

In practice, MAPA was the first method in almost 15 years to improve upon the M3 competition results, and since then there has been mounting empirical evidence supporting its good forecasting performance, as well as extensions to incorporate explanatory variables and to forecast intermittent demand time series. An interesting finding is that MAPA is very robust against misspecification, when compared to more conventional approaches that attempt to capitalise on a single (optimal) level of temporal aggregation. If you want to try it out, there is a package for R available (or on GitHub).

Ultimately the true contribution of MAPA was to demonstrate that the ideas behind MTA were sound and useful for forecasting! For this, the paper that introduced MAPA recently received the International Journal of Forecasting 2014-2015 best paper award; I am very humbled and happy for this!

Although the aforementioned limitations are not resolved, MAPA motivated research into Temporal Hierarchies, which provide a more thorough foundation for using MTA in forecasting, overcoming many of MAPA’s issues and enabling multiple avenues of future research. This will be the topic of a future post in the series. Till then, I will conclude by mentioning that in many applications MAPA still provides more accurate forecasts than Temporal Hierarchies, demonstrating that it remains an interesting research topic.


International Journal of Forecasting 2014-2015 best paper award

At the very enjoyable and stimulating International Symposium on Forecasting that just finished in Cairns, Australia, the International Journal of Forecasting (IJF) best paper award for the years 2014-2015 (a list of past papers can be found here) was given to one of my papers: Improving forecasting by estimating time series structural components across multiple frequencies!

This paper proposes the use of the Multiple Temporal Aggregation approach that I have been posting about, and introduces the MAPA forecasting method, for which there is an R package. The other shortlisted papers were of very good quality and I am humbled by the choice of the editorial board. My thanks to my co-authors and the reviewers, who made this paper possible.

Material for ‘Forecasting with R: A practical workshop’ at ISF2017

You can find the material from yesterday’s workshop at the International Symposium on Forecasting, 2017 here. The workshop notes assume some knowledge of what the various forecasting methods do, and the main focus is on showing which functions to use and how, so as to perform a wide variety of forecasting tasks:

  • Time series exploration
  • Univariate (extrapolative) forecasting
  • Intermittent demand series forecasting
  • Forecasting with regression
  • Special topics: (i) Hierarchical forecasting; (ii) ABC-XYZ analysis; and (iii) LASSO regression


  1. Workshop notes: these provide code examples with comments. You will also find some references for the various methods used in the workshop.
  2. Workshop slides: these provide an extremely brief overview of some of the methods used and their implementation.
  3. Workshop R solution scripts: these replicate the examples in the notes.
  4. Workshop data: these are needed to replicate the examples in the notes and scripts.

The notes are aimed at researchers and experienced practitioners, who are comfortable with the theory behind the various models and methods. I hope you find this material useful!

A couple more packages to explore:

Tactical sales forecasting using a very large set of macroeconomic indicators

Y.R. Sagaert, E-H. Aghezzaf, N. Kourentzes and B. Desmet, 2017. European Journal of Operational Research.

Tactical forecasting in supply chain management supports planning for inventory, scheduling production, and raw material purchase, amongst other functions. It typically refers to forecasts up to 12 months ahead. Traditional forecasting models take into account univariate information extrapolating from the past, but cannot anticipate macroeconomic events, such as steep increases or declines in national economic activity. In practice this is countered by using managerial expert judgement, which is well known to suffer from various biases, is expensive and is not scalable. This paper evaluates multiple approaches to improve tactical sales forecasting using macroeconomic leading indicators. The proposed statistical forecast automatically selects both the type of leading indicators and the order of the lead for each of the selected indicators. However, as the future values of the leading indicators are unknown, an additional uncertainty is introduced. This uncertainty is controlled in our methodology by restricting inputs to an unconditional forecasting setup. We compare this with the conditional setup, where future indicator values are assumed to be known, and assess the theoretical loss of forecast accuracy. We also evaluate purely statistical model building against judgement-aided models, where potential leading indicators are pre-filtered by experts, quantifying the accuracy-cost trade-off. The proposed framework improves on forecasting accuracy over established time series benchmarks, while providing useful insights about the key leading indicators. We evaluate the proposed approach on a real case study and find 18.8% accuracy gains over the current forecasting process.

Download paper.
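The "unconditional" restriction in the abstract has a simple operational meaning. To forecast h steps ahead from today, only indicator values already observed can be used, so an indicator can only enter with a lead of at least h periods. The Python sketch below illustrates that constraint with plain correlation screening over the allowed leads. The screening rule, the function names and the synthetic data are all hypothetical; this is not the selection method used in the paper.

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def select_lead(sales, indicator, horizon, max_lead):
    """Pick the lead with the largest absolute correlation with sales, considering
    only leads >= horizon, so the indicator value needed is already observed
    at the forecast origin (unconditional setup)."""
    best_lead, best_r = None, -1.0
    for lead in range(horizon, max_lead + 1):
        x = indicator[:len(indicator) - lead]  # indicator at time t - lead
        y = sales[lead:]                       # sales at time t
        r = abs(pearson(x, y))
        if r > best_r:
            best_lead, best_r = lead, r
    return best_lead, best_r


# Synthetic example: sales follow the indicator with a 3-period delay.
indicator = [(i * 7) % 11 for i in range(40)]
sales = [0, 0, 0] + indicator[:-3]
print(select_lead(sales, indicator, horizon=1, max_lead=6)[0])  # 3
```

With a 4-step horizon, the true 3-period lead is no longer admissible. In that case the unconditional setup has to settle for a longer, less informative lead, which is the accuracy cost the paper quantifies against the conditional setup.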

Multiple temporal aggregation: the story so far. Part II: The effects of TA

The effects of temporal aggregation

In this post I will demonstrate the effects of temporal aggregation and motivate the use of multiple temporal aggregation (MTA). I will not delve into the econometric aspects of the discussion, but it is worthwhile to summarise key findings from the literature. A concise forecasting related summary is available in our recent paper Athanasopoulos et al. (2017), section 2:

  • Temporal aggregation changes the (identifiable) structure of the time series;
  • As the aggregation level increases, fewer components appear, and higher-frequency components (for example, seasonality and promotions) become weaker or vanish altogether;
  • Temporal aggregation reduces the sample size, resulting in a loss of estimation efficiency. To put it simply, if you have 4 years of monthly data and you aggregate your series to a yearly level, you will have to build a model with only four data points. Risky!
  • There are accuracy gains to be had, but identifying the (single) appropriate temporal aggregation level is very difficult! Yet, it still simplifies some problems, like intermittent demand forecasting.

What do these mean for our forecasts? Well, if you work on the basis that the true model is an elusive idea, these findings are not very prescriptive for constructing forecasts. I will try to give you an intuition visually. In the following interactive visualisation you can choose between different time series and plot the original and temporally aggregated data, together with a seasonal plot. The seasonal plot is shown only when it is feasible, i.e. when the seasonality after aggregation has an integer period greater than 1. For each series I also fit an appropriate exponential smoothing model (selected using AICc) and list the fitted components for all temporal aggregation levels, up to yearly data. I also provide the relevant forecasts. Observe a few things:

  • The identified exponential smoothing models are often different across temporal aggregation levels. In particular, seasonality is filtered out as we aggregate into bigger time buckets. Of course, for some aggregation levels (for example, aggregating every 5 months) the resulting series has a non-integer seasonal period, which typical forecasting methods cannot capture, so it ends up in the error component;
  • Other aspects of the series, like outliers, vanish as we aggregate to higher levels;
  • Sometimes aggregation makes the time series easier to model, and sometimes it over-smooths the series! The forecasts certainly vary a lot as we aggregate.
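The feasibility rule for the seasonal plot can be written down directly: if the original series has seasonal period m and we aggregate every k observations, the aggregated series has period m/k, and a seasonal model (or plot) only makes sense when that is an integer greater than 1. The short Python sketch below, with hypothetical function names, lists which aggregation levels of monthly data (m = 12) keep a usable seasonality.

```python
from fractions import Fraction


def aggregated_seasonal_period(m, k):
    """Seasonal period after aggregating a period-m series every k observations."""
    return Fraction(m, k)


def seasonal_plot_feasible(m, k):
    """True when the aggregated seasonal period is an integer greater than 1."""
    period = aggregated_seasonal_period(m, k)
    return period.denominator == 1 and period > 1


m = 12  # monthly data with a yearly seasonal cycle
for k in range(1, m + 1):
    period = aggregated_seasonal_period(m, k)
    status = "seasonal" if seasonal_plot_feasible(m, k) else "no usable seasonality"
    print(f"k={k:2d}: period={float(period):5.2f} -> {status}")
```

For monthly data only k = 1, 2, 3, 4 and 6 keep an integer seasonal period greater than 1; k = 5 gives a period of 2.4, and at k = 12 the yearly series has no seasonality left to model.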

At minimum we can say that temporal aggregation alters the identifiable parts of the time series, strengthening low-frequency components (such as trend) while weakening high-frequency components (such as seasonality). Depending on the forecasting objective, this may result in better forecasts, especially if we are aiming at long-term forecasts. Furthermore, simply because temporal aggregation filters part of the noise (it is a moving average filter!), it may just be better to model a series at a more aggregate level.
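The moving average remark can be checked numerically. The sketch below uses synthetic white noise (an assumption, chosen because its behaviour is known) to confirm that each non-overlapping aggregate of k points equals k times a k-term moving average taken over the same bucket, and that averaging k independent noise terms shrinks their variance by roughly a factor of k. The function names are hypothetical.

```python
import random
import statistics


def temporal_aggregate(series, k):
    """Sum non-overlapping buckets of k observations (oldest remainder dropped)."""
    remainder = len(series) % k
    trimmed = series[remainder:]
    return [sum(trimmed[i:i + k]) for i in range(0, len(trimmed), k)]


def trailing_moving_average(series, k):
    """k-term moving average; element j averages series[j : j + k]."""
    return [sum(series[j:j + k]) / k for j in range(len(series) - k + 1)]


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(1200)]  # synthetic white noise
k = 12

aggregated = temporal_aggregate(noise, k)
moving_avg = trailing_moving_average(noise, k)

# Buckets start at 0, k, 2k, ... (1200 is divisible by k, so nothing is
# dropped), so taking every k-th moving-average value and multiplying by k
# reproduces the aggregated series.
sampled = [k * v for v in moving_avg[::k]]

bucket_means = [a / k for a in aggregated]
print("variance of noise:       ", round(statistics.variance(noise), 3))
print("variance of bucket means:", round(statistics.variance(bucket_means), 3))
```

With k = 12, the variance of the bucket means should come out close to 1/12 of the variance of the original noise; the exact figures depend on the random seed.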

The main problem in the literature is that, for real data, it is very difficult to know the optimal temporal aggregation level, i.e. the one that maximises forecast accuracy. This is not a trivial point: there are theoretical solutions suggesting the optimal temporal aggregation level for various data generating processes, but they rely on full knowledge of the process! Well, if I knew the process, then forecasting it would be trivial. Recent research showed that although benefits are easy to demonstrate on simulated data, things become much more complicated with real data, where the true model is unknown.

If we connect the dots, there are four key arguments in favour of MTA (discussed in more detail in these three papers [1], [2] and [3]):

  • Just because we are given a time series sampled at some time interval does not mean we have to model it at that level! It may be better to do so at some aggregate level.
  • Temporal aggregation can be beneficial for forecasting, but identifying a single optimal level of aggregation is very challenging, so why not use multiple?
  • By using multiple levels we avoid relying on a single forecasting model, and therefore mitigate modelling uncertainty by considering multiple (different) models across temporal aggregation levels.
  • Holistic modelling of the time series information: models built on the original data or on low aggregation levels can focus more on high-frequency components, while models built on high aggregation levels focus on low-frequency components, which may not be easy to capture in the originally sampled time series.
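As a toy illustration of the "why not use multiple?" argument, the sketch below forecasts a series at several aggregation levels and averages the results back at the original frequency. It is deliberately simplistic and is not MAPA: simple exponential smoothing with a fixed smoothing parameter, the equal split of each aggregate forecast across its k periods, and the plain average across levels are all assumptions made for illustration, and the sketch ignores trend and seasonality.

```python
def temporal_aggregate(series, k):
    """Sum non-overlapping buckets of k observations (oldest remainder dropped)."""
    remainder = len(series) % k
    trimmed = series[remainder:]
    return [sum(trimmed[i:i + k]) for i in range(0, len(trimmed), k)]


def ses_level(series, alpha):
    """Final level of simple exponential smoothing, initialised at the first value."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def mta_forecast(series, levels, horizon, alpha=0.3):
    """Toy multiple temporal aggregation forecast (hypothetical, not MAPA).

    For each aggregation level k: aggregate, forecast a flat SES level,
    split it equally over the k original periods, then average across levels.
    """
    per_level = {}
    for k in levels:
        aggregated = temporal_aggregate(series, k)
        if not aggregated:
            raise ValueError(f"series too short for aggregation level {k}")
        level = ses_level(aggregated, alpha)
        per_level[k] = [level / k] * horizon  # back to the original frequency
    combined = [
        sum(forecast[h] for forecast in per_level.values()) / len(per_level)
        for h in range(horizon)
    ]
    return combined, per_level


# Synthetic series with a level shift half-way through.
shifted = [10.0] * 24 + [20.0] * 24
combined, per_level = mta_forecast(shifted, levels=[1, 3, 12], horizon=3)
for k, forecast in per_level.items():
    print(f"k={k:2d}: {forecast[0]:.2f}")
print(f"combined: {combined[0]:.2f}")
```

On this series the k = 1 forecast reacts almost fully to the level shift, while the k = 12 forecast, built on only four yearly points, lags behind it, and the combination sits in between. This shows, in a crude way, how different aggregation levels emphasise different aspects of the same series.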

All these points suggest that using multiple temporal aggregation levels should be useful, but we have not yet addressed the question of how to do this! I will introduce our first attempt, the Multiple Aggregation Prediction Algorithm (MAPA), in the next post in the series.

Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.