**The effects of temporal aggregation**

In this post I will demonstrate the effects of temporal aggregation and motivate the use of multiple temporal aggregation (MTA). I will not delve into the econometric aspects of the discussion, but it is worthwhile to summarise key findings from the literature. A concise forecasting related summary is available in our recent paper Athanasopoulos et al. (2017), section 2:

- Temporal aggregation changes the (identifiable) structure of the time series;
- As the aggregation level increases there are less components that appear and higher-frequency components (for example, seasonality and promotions) become weaker or vanish altogether;
- Temporal aggregation reduces the sample size resulting in loss of estimation efficiency. To make this simple, if you have 4 years of monthly data and you aggregate your series to a yearly level you will have to build a model with only for four data point, risky!
- There are accuracy gains to be had, but identifying the (single) appropriate temporal aggregation level is very difficult! Yet, it still simplifies some problems, like intermittent demand forecasting.

What do these mean for our forecasts? Well, if you work on the basis that the true model is an elusive idea, these are not too prescriptive for constructing forecasts. I will try to give you an intuition visually. In the following interactive visualisation you can choose between different time series and plot the original and temporally aggregated data, together with a seasonal plot. The seasonal plot will be shown only when it is feasible, i.e. the resulting seasonality after the aggregation has an integer period greater than 1. For each series I also fit an appropriate exponential smoothing model (selected using AICc) and provide a list of the fitted components for all temporal aggregation levels, up to yearly data. I also provide the relevant forecast. Observe a few things:

- The identified exponential smoothing models are often different across temporal aggregation levels. In particular the seasonality is filtered as we aggregate into bigger time buckets. Of course, for some aggregation levels (for example, aggregate every 5-months) the resulting series has a non-integer seasonal period and typical forecasting methods cannot capture it and instead it contributes to the error part;
- Other aspects of the series, like outliers, vanish as we aggregate to higher levels;
- Some times aggregation makes the time series easier to model, and sometimes it over-smooths the series! The forecasts surely vary a lot as we aggregate.

At minimum we can say that temporal aggregation alters the identifiable parts of the time series, strengthening low-frequency components (such as trend), while weakening high-frequency components (such as seasonality). Depending on the forecasting objective, this may result in better forecasts, especially if we are aiming at long term forecasts. Furthermore, simply because the temporal aggregation filters part of the noise (it is a moving average filter!) it may just be better to model a series at a more aggregate level.

The main problem in the literature is that it is very difficult to know what is the optimal temporal aggregation level, which will maximise your forecast accuracy, for **real data**. This is not a trivial point: There are theoretical solutions suggesting the optimal temporal aggregation level for various data generation processes, but they rely on full knowledge of the process! Well, if I knew the process, then forecasting it would be trivial. Recent research showed that although we can easily show benefits on simulated data, it becomes much more complicated with real data that the true model is unknown.

If we connect the dots, there are four key arguments in favour of MTA (discussed in more detail in these three papers [1], [2] and [3]):

- Because we are provided with a time series sampled at some time interval, we do not have to model it at that level! It may be better to do so at some aggregate level.
- Temporal aggregation can be beneficial for forecasting, but identifying a single optimal level of aggregation is very challenging, so why not use multiple?
- Using multiple levels we avoid relying on a single forecasting model, therefore we mitigate modelling uncertainty by considering multiple (different) models across temporal aggregation levels.
- Holistic modelling of the time series information: Models built on the original data or on low aggregation levels can focus more on high frequency components, while models build on high aggregation levels focus on low frequency components, which may not be easy to capture in the originally sampled time series.

All these points suggest that using Multiple Temporal Aggregation levels should be useful, but we have not yet addressed the question how to do this! I will introduce our first attempt to do this, the Multiple Temporal Aggregation Algorithm (MAPA) in the next post in the series.

Multiple Temporal Aggregation: the story so far: Part I; Part II.