Author Archives: Nikos

Automatic robust estimation for exponential smoothing: perspectives from statistics and machine learning

Devon Barrow, Nikolaos Kourentzes, Rickard Sandberg, and Jacek Niklewski, 2020. Expert Systems with Applications.

A major challenge in automating the production of a large number of forecasts, as often required in many business applications, is the need for robust and reliable predictions. Increased noise, outliers and structural changes in the series, all too common in practice, can severely affect the quality of forecasting. We investigate ways to increase the reliability of exponential smoothing forecasts, the most widely used family of forecasting models in business forecasting. We consider two alternative sets of approaches, one stemming from statistics and one from machine learning. To this end, we adapt M-estimators, boosting and inverse boosting to parameter estimation for exponential smoothing. We propose appropriate modifications that are necessary for time series forecasting while aiming to obtain scalable algorithms. We evaluate the various estimation methods using multiple real datasets and find that several approaches outperform the widely used maximum likelihood estimation. The novelty of this work lies in (1) demonstrating the usefulness of M-estimators, (2) and of inverse boosting, which outperforms standard boosting approaches, and (3) a comparative look at statistics versus machine learning inspired approaches.

Download paper.

Elucidate structure in intermittent demand time series

Nikolaos Kourentzes and George Athanasopoulos, 2020. European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2020.05.046

Intermittent demand forecasting has been widely researched in the context of spare parts management. However, it is becoming increasingly relevant to many other areas, such as retailing, where at the very disaggregate level time series may be highly intermittent, but at more aggregate levels are likely to exhibit trends and seasonal patterns. The vast majority of intermittent demand forecasting methods are inappropriate for producing forecasts with such features. We propose using temporal hierarchies to produce forecasts that demonstrate these traits at the various aggregation levels, effectively informing the resulting intermittent forecasts of these patterns that are identifiable only at higher levels. We conduct an empirical evaluation on real data and demonstrate statistically significant gains for both point and quantile forecasts.

Download paper.

Optimising forecasting models for inventory planning

Nikolaos Kourentzes, Juan R. Trapero, and Devon K. Barrow, 2020. International Journal of Production Economics. https://doi.org/10.1016/j.ijpe.2019.107597

Inaccurate forecasts can be costly for company operations, in terms of stock-outs and lost sales, or over-stocking, while not meeting service level targets. The forecasting literature, often disjoint from the needs of the forecast users, has focused on providing optimal models in terms of likelihood and various accuracy metrics. However, there is evidence that this does not always lead to better inventory performance, as often the translation between forecast errors and inventory results is not linear. In this study, we consider an approach to parametrising forecasting models by directly considering appropriate inventory metrics and the current inventory policy. We propose a way to combine the competing multiple inventory objectives, i.e. meeting demand, while eliminating excessive stock, and use the resulting cost function to identify inventory optimal parameters for forecasting models. We evaluate the proposed parametrisation against established alternatives and demonstrate its performance on real data. Furthermore, we explore the connection between forecast accuracy and inventory performance and discuss the extent to which the former is an appropriate proxy of the latter.

Download paper.

Cross-temporal aggregation: Improving the forecast accuracy of hierarchical electricity consumption

Evangelos Spiliotis, Fotios Petropoulos, Nikolaos Kourentzes, and Vassilios Assimakopoulos, 2020. Applied Energy 261: 114339. https://doi.org/10.1016/j.apenergy.2019.114339

Achieving high accuracy in energy consumption forecasting is critical for improving energy management and planning. However, this requires the selection of appropriate forecasting models, able to capture the individual characteristics of the series to be predicted, which is a task that involves a lot of uncertainty. When hierarchies of load from different sources are considered together, the uncertainty and complexity increase further. For example, when forecasting both at system and region level, not only the model selection problem is expanded to multiple time series, but we also require aggregation consistency of the forecasts across levels. Although hierarchical forecasting, such as the bottom-up, the top-down, and the optimal reconciliation methods, can address the aggregation consistency concerns, it does not resolve the model selection uncertainty. To address this issue, we rely on Multiple Temporal Aggregation (MTA), which has been shown to mitigate the model selection problem for low-frequency time series. We propose a modification of the Multiple Aggregation Prediction Algorithm, a special implementation of MTA, for high-frequency time series to better handle the undesirable effect of seasonality shrinkage that MTA implies and combine it with conventional cross-sectional hierarchical forecasting. The impact of incorporating temporal aggregation in hierarchical forecasting is empirically assessed using a real data set from five bank branches. We show that the proposed MTA approach, combined with the optimal reconciliation method, demonstrates superior accuracy, aggregation consistency, and reliable automatic forecasting.

Download paper.

Forecasting keynote at AMLC 2019

A few weeks ago I gave a talk at Amazon’s 2019 AMLC in Seattle. The talk was focused on current research in temporal and cross-temporal hierarchies. People who have been following my blog will be familiar with the topic and recent advances. This talk is different in the sense that it does not go into the technicalities, but rather looks at the benefits of temporal and cross-temporal hierarchies for the forecasting process of companies. The last few slides outline current research and my view on some interesting upcoming applications.

You can find the slides here.

Many thanks for the invitation to give the forecasting keynote!

 

Invited talk at Amazon Web Services

I was recently invited to give a talk at AWS in Berlin. I presented the current work on temporal and cross-temporal hierarchical forecasting. My view is that there is a lot of potential for these approaches to augment existing forecasting processes with relative ease.

Considering the wider forecasting problem, we do not forecast for the sake of forecasting, but to support decisions, with different planning horizons, objectives and information base. These approaches permit merging all these views to achieve aligned decision making, while at the same time improving forecast accuracy. On a personal note, it is nice to see that industry is nowadays very fast at considering/adopting new research. Creating in-house teams with forecasting expertise offers tremendous opportunities to companies to capitalise on innovations in research and the open source community that most researchers contribute to. If you are not familiar with the research, follow the links above to find out more about it.

On another note, if you are not familiar with the forecasting work of AWS, I will point you to their new open source library for forecasting with deep learning: gluonts. I had the chance to discuss with the team some of the internal workings of the library and they have put together a very interesting and useful tool. I hope to find the time to try it some more myself and I will post my results and thoughts here. A word of caution (this is something that the team at AWS also repeat themselves quite often): deep learning is not the solution for all problems, but has a lot of potential when the data permit. If you deal with limited sales data and only a few time series, perhaps the humble exponential smoothing is still a very good contender. But otherwise, there are a lot of innovations in neural networks to make them a worthy contender for forecasting. Nonetheless, irrespective of your views on deep learning and forecasting, hats off to AWS for contributing back to the research and open source community.

Finally, many thanks to Tim Januschowski for the invitation and hosting me!

 

Visit at Universitat Politècnica de València

I was recently invited to a workshop focused on forecasting and supply chain management at Valencia Polytechnic University. Many thanks to Ester Guijarro for organising the workshop and helping to bring together forecasters and supply chain experts!

Left to right: Ester, Juan and me after our talks. You can guess how the weather is in Valencia from our colours!

I presented on optimising forecasting model parameters for inventory management. You can find the presentation here, and a working version of the paper here. The paper is currently under review, so I would expect quite a few changes in the final version! Whether we are critical of the review process or not, in the vast majority of cases it improves papers substantially and this will certainly be the case here. The view we take on this work with my co-authors is that we can integrate forecasting and inventory management more closely, and instead of optimising forecasting models to maximise fit on past sales, hoping that this will result in good inventory performance (and there are many good reasons for this to hold!), we can directly optimise so as to minimise deviations from the desired inventory performance. This seems to work quite well in our empirical evaluation.

Juan R Trapero presented a paper we have worked on together with Manuel Cardos on calculating empirical safety stocks. You can find the presentation here. It looks at using kernel density estimation and GARCH models to address different deficiencies of standard approaches. Namely, kernels are particularly good at handling asymmetries in the forecast error distributions (promotional forecasts I am looking at you) and GARCH at handling residual autocorrelations. Both cases are quite common in practice, as often our forecasting models are far from the underlying demand generating process. You can find the relevant published paper here, as well as a follow up work here.

ISF2019 talk: Cross-temporal coherent forecasts for tourism forecasting

This year’s International Symposium on Forecasting has been a great success. Very exciting talks and large attendance from both academics and practitioners. I really enjoy conferences where the two groups interact organically: only this way is research both relevant and adopted fast, so that it makes a difference!

This year I was invited by Haiyan Song to present my paper with George Athanasopoulos on tourism forecasting. Many thanks to Haiyan for the invitation and the session he organised, and also to George for putting together a terrific conference.

You can find my presentation here, and the paper is available here. I plan to release some reusable code for this as soon as possible, as I see the potential of cross-temporal hierarchical forecasting for many areas of business forecasting. The beauty of this work is that it is quite easy to work into the existing forecasting processes of organisations and it is complemented by a lot of current work on hierarchical forecasting – very interesting advances were presented at ISF2019.

I was also invited to a panel to talk to Early Career Researchers (ERC) about life in academia. My thanks to Shari De Baets, Michał Chojnowski and Anna Sroginis for inviting me and arranging the sessions for ERCs. All three of them are very promising researchers; keep an eye out for their work. Great initiative and absolutely necessary for having a thriving research community! Having been a young researcher myself not so many years ago, here is my question: is it the strands of grey hair that now qualify me as senior enough to advise ERCs? Never got a straight answer on that!

The jolly panel! From left to right: Haiyan Song, me, Tanya Garcia and Rob Hyndman.

And my view for ERCs (or any colleague actually): choose your university/employer carefully! The negotiating power is with you. Research is now an international environment and there is large demand for talented people. A good environment will propel your career. Have supportive colleagues and a work place that respects your ideas and your life. Only when these are in balance you can be happy and inspired to really make a difference with your research and teaching! As to my peers (given the white strands): let us support ERCs as much as we can, it is our responsibility as educators and researchers. We need to use our weight as seniors to make sure they can meet their potential.

Towards the “one-number forecast”

1. Introductory remarks

One of the recurrent topics in online discussions on sales forecasting and demand planning is the idea of the “one-number forecast”, that is a common view of the future on which multiple plans and decisions can be made, from different functions of an organisation. In principle, this is yet another idea around the notion that we must break information silos within organisations. This topic has been hotly debated in practice, but academia has been somewhat silent. There are good reasons for this. I would argue that a key to this is that many colleagues in forecasting, econometrics and machine learning are predominantly focused on algorithmic and modelling research questions, rather than the organisational context within which forecasts are generated. This naturally results in different research focus. Given my background in strategic management, I always like to ponder on the organisational aspects of forecasting – even though I admit that most of my research revolves around algorithmic and modelling questions!

Over the years I have written a lot about the benefits of multiple temporal aggregation, either in the form of MAPA or Temporal Hierarchies, in terms of achieving both accuracy gains and importantly alignment between short- and long-term forecasts, as well as allowing operational information to pass on seamlessly to strategic decision makers and vice-versa. Yet, this still leaves the cross-sectional aspect of the forecasting problem (for example, different product categories, market segments, etc.) somewhat disconnected. Keeping these two streams of forecasting research disconnected has left the so-called one-number forecast beyond our modelling capabilities and to the sphere of organisational and process design and management (see S&OP, IBP, or other various names of the idea that people should… talk – hint: works in every aspect of life!).

Over the years, with colleagues, I have approached the problem from various aspects, but I think I finally have a practical solution to this that I am happy to recommend!

2. Why bother?

So what is the limitation of traditional forecasting? Why do we need this more integrated thinking?

  • Forecasts built for different functions/decisions are typically based on different information and therefore these are bound to differ and ignore much of the potentially available information.
  • Statistically speaking, given different target forecast horizons or different input information, typically a different model is more appropriate. The resulting forecasts are bound to differ as well.
  • Different functions/decisions need forecasts at different frequencies. We need to account for different decision-making frequency and speed for operational decisions (for example, inventory management) and different for tactical/strategic (for example, location and capacity of a new warehouse).
  • Forecasts that differ will provide misaligned decisions, which will result in organisational friction, additional costs and lost opportunities.
  • Different forecasts give a lot of space for organisational politics: my forecast is better than yours! This is often resolved top-down, which eliminates important information that in principle is available to the organisations. Organisational politics and frictions are a leading reason for silos.
  • A quite simple argument: if you have many different forecasts about the same thing that do not agree, most, if not all, are wrong. (Yes, statistically speaking all forecasts are wrong, but practically speaking many are just fine and safe to use!)

What is wrong with using cross-sectional hierarchical forecasting (bottom-up, top-down, etc.) to merge forecasts together?

  • First, none of bottom-up, top-down or middle-out are the way to go. The optimal combination (or MinT) methodology is more meaningful and eliminates the need for a modelling choice that is not grounded on any theoretical understanding of the forecasting problem.
  • Cross-sectional hierarchical forecasting can indeed provide aggregate coherent forecasts (that is, forecasts on different levels, such as SKU sales and product category sales, that add up perfectly), but it does so for a single time instance. Let’s make this practical. Having coherent forecasts at the bottom level, where say weekly forecasts of SKUs are available, is meaningful. As we go to higher levels of the hierarchy, is there any value in weekly total sales of the company? More importantly, apart from the statistical convenience of such a forecast, is there any meaningful information that senior management can add on a weekly basis (or would they bother)?

What is wrong with using temporal hierarchical forecasting to merge forecasts together?

  • This is the complementary problem of cross-sectional forecasting. Now we have the temporal dynamics captured. We merge together information that is relevant for short-term forecasting, but also long-term, to gain benefits in both. However, SKU sales on weekly frequency are useful, but SKU annual sales not so much. Probably there you need product group or even more aggregate figures at annual buckets of sales.

What is wrong with building super-models that get all the information in one go and produce outputs for everything?

  • That should sound dodgy! If this was a thing, then this blog wouldn’t really exist…
  • On a more serious note, statistically speaking this is a very challenging problem, both in terms of putting down the equations for such a model, but also estimating meaningful parameters. Economics has failed repeatedly in doing this for macro-economy and there are well understood and good reasons why our current statistical and mathematical tools fail at that. I underline current because research is ongoing!
  • From an organisational point of view, that would require a data integration maturity, as well as almost no silos between functions and teams, so as to be able to get all the different sources of data in the model in a continuous and reliable form. Again, this is not theoretically impossible, but my experience in working with some of the leading companies in various sectors is that we are not there yet.

So getting true one-number forecasts is more difficult than one would like. Is it worth the effort?

  • If different functions/decision makers have the same view about the future, they are better informed and naturally will tend to take more aligned decisions, with all the organisational and financial benefits.
  • If such forecasts were possible, it would enable overcoming many of the organisational silos in a data-driven way. Between innovating human organisations and behaviours or statistics, it is somewhat easy to guess which one is easier!

3. How to build one-number forecasts?

Let me say upfront that:

  • It is all about bringing together the cross-sectional and temporal hierarchies. There is a benefit to this: both are mature enough to offer substantial modelling flexibility and therefore they do not restrict our forecasting toolbox, including statistics, machine learning, managerial expertise or “expert” judgement.
  • It can be done in a modular fashion, so forecasts do not need to be fully blended from the onset, but as the organisation gains more maturity, then more functions can contribute to the one-number forecast, so that we move towards the ultimate goal in practical and feasible steps.
  • Therefore, what follows can be implemented within the existing machinery of business forecasting (please don’t ask me how to do this in Excel! It can be done, but why?).
  • For anyone interested, this is the relevant paper (and references within), but quite readily I admit that papers are often not written to be… well, accessible. I hope that readers of my academic work will at least feel that I try to put some effort to make my work accessible to varying degrees of success!

3.1. The hierarchical forecasting machinery

We need to start with the basics of how to blend forecasts for different items together. Suppose we have a hierarchy of products, such as the one in Figure 1. This is a fairly generic hierarchy that we could imagine describes sales of SKUs XX, XY and YX, YY and YZ, which can be grouped together in product groups X and Y, which in turn can be aggregated to Total sales. This hierarchy also implies that there are some coherence constraints: Total = X + Y, X = XX + XY, Y = YX + YY + YZ. This is surely true for past sales and it should be true for any forecasts.

Figure 1. Total sales can be broken into product groups X and Y, which in turn contain SKUs XX, XY and YX, YY and YZ.

This restriction, that the coherence should be true for forecasts, is very helpful in giving us a mechanism to blend forecasts. This is not a top-down or bottom-up problem. The reason is that each level is relevant to different parts of an organisation and they have different information available. SKU level sees the detailed demand and interaction with customers, on high frequency. This is very relevant, for example, for inventory management. Product/brand level sees the aggregate demand patterns, the perception of a brand, etc., which is very relevant, for example, to marketing. Budgeting and financial operations would be very interested in the total level. All these different functions will most probably have different information sets available and should be using different models (or “expert guessing”) to build forecasts, primarily because these models are required to give different types of outputs, for different time scales. For example, inventory planners need forecasts and uncertainties to plan safety stocks and orders, typically for short horizons. Marketing needs elasticities of promotions and pricing and potentially longer-term forecasts. Financial operations need even longer horizons and forecasts expressed in monetary terms, rather than product units. Therefore, it is not just about making numbers match, but it is about bringing different organisational views together. Top-down and bottom-up fail completely on this aspect.

So how are we to bring different views together? Let us abstract the problem a bit (because using X, XX, and so on, was very specific for my taste!). The hierarchy in Figure 1 can be written as the following matrix S.
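The matrix itself did not survive in this copy of the post. A reconstruction from the coherence constraints stated above (Total = X + Y, X = XX + XY, Y = YX + YY + YZ) is given here; the row order (Total, X, Y, then the five SKUs) and the column order (XX, XY, YX, YY, YZ) are assumptions, as the original layout is not available.

```latex
S =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 \\  % Total
1 & 1 & 0 & 0 & 0 \\  % X
0 & 0 & 1 & 1 & 1 \\  % Y
1 & 0 & 0 & 0 & 0 \\  % XX
0 & 1 & 0 & 0 & 0 \\  % XY
0 & 0 & 1 & 0 & 0 \\  % YX
0 & 0 & 0 & 1 & 0 \\  % YY
0 & 0 & 0 & 0 & 1     % YZ
\end{bmatrix}
```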

The structure of S codifies the hierarchy. Each column corresponds to a bottom level time series and each row to a node/level of the hierarchy. We place 1 when a bottom-level (column) element contributes to that level and 0 otherwise. With this structure, if one would take the SKU sales and pass them through this S (for summing) matrix, then the outcome would be the sales for all levels of the hierarchy (rows).
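To make the role of S concrete, here is a minimal Python sketch that builds the summing matrix for the hierarchy of Figure 1 and passes one period of SKU sales through it. The helper name `summing_matrix`, the row order and the sales figures are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hierarchy of Figure 1: each aggregate node lists the bottom-level SKUs it contains.
bottom = ["XX", "XY", "YX", "YY", "YZ"]
aggregates = {
    "Total": ["XX", "XY", "YX", "YY", "YZ"],
    "X": ["XX", "XY"],
    "Y": ["YX", "YY", "YZ"],
}

def summing_matrix(aggregates, bottom):
    """Rows: aggregate nodes, then the bottom-level series. Columns: bottom-level series."""
    agg_rows = [[1.0 if b in members else 0.0 for b in bottom]
                for members in aggregates.values()]
    # The identity block simply copies each SKU into its own row.
    return np.vstack([np.array(agg_rows), np.eye(len(bottom))])

S = summing_matrix(aggregates, bottom)

# Hypothetical SKU sales for a single period (made-up numbers).
sku_sales = np.array([10.0, 5.0, 3.0, 7.0, 2.0])

# Sales for every node: Total, X, Y, followed by the five SKUs.
all_levels = S @ sku_sales
print(dict(zip(list(aggregates) + bottom, all_levels)))
```

The same multiplication applied to bottom-level forecasts instead of sales gives bottom-up forecasts for the whole hierarchy.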

If instead of sales we had forecasts for the bottom level, using S we can produce bottom-up forecasts for the complete hierarchy. Likewise, if we had forecasts for only the top level (Total) then we could use the mapping in S to come up with a way to disaggregate forecasts to the lower levels. A couple of paragraphs above I argued that we need forecasts at all levels. If we do that, the same S will help us understand how much our forecasts disagree: how decoherent they are. Skipping the mathematical derivations, it has been shown that the following equation can take any raw decoherent forecasts and reconcile them, by attempting to minimise the reconciliation errors, that is how much the forecasts disagree.
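The equation is missing from this copy of the post. In the notation commonly used in the hierarchical forecasting literature it reads as follows, where the symbols for the base forecasts (all nodes stacked in the row order of S) and for the reconciled forecasts are notational assumptions:

```latex
\tilde{\mathbf{y}} = S \, G \, \hat{\mathbf{y}}
% \hat{\mathbf{y}}:   initial (possibly decoherent) forecasts for all nodes
% G:                  matrix of combination weights
% S:                  summing matrix that maps the hierarchy
% \tilde{\mathbf{y}}: reconciled, coherent forecasts for all nodes
```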

I am using matrix notation to avoid writing massive messy formulas. What the above says is: give me all your initial forecasts and I will multiply them with a matrix G that contains some combination weights, the S matrix that maps the hierarchy and I will give you back coherent forecasts. This is fairly easy if G is known. Before we go into the estimation of G there are some points useful to stress, which are typically not given enough attention in hierarchical forecasting:

  • Hierarchical forecasting is merely a forecast combination exercise, where we linearly combine (independent) forecasts of different levels.
  • Combinations of forecasts are desirable. Statistically, they typically lead to more accurate forecasts (this is why hierarchical forecasting often relates to accuracy gains), but also substantially mitigate the model selection problem, as it is okay to get some models wrong.
  • That the forecasts can be independent is a tremendous advantage for practice. At each node/level we can produce forecasts separately, based on different information and forecasting techniques, matching the requirements as needed. Statistically speaking, independent forecasts may not be theoretically elegant, but they are certainly much simpler to specify and estimate, so quite useful for practice!
  • There is no need to aggregate/disaggregate. Hierarchical forecasting directly produces forecasts for all levels.

Let us return to the estimation of G. The formula for this is:
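The formula did not survive in this copy. The standard expression from the optimal combination (MinT) literature, which matches the description that follows, is given below; W denotes the covariance matrix of the forecast errors of all nodes.

```latex
G = \left( S^{\top} W^{-1} S \right)^{-1} S^{\top} W^{-1}
```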

That means that G is dependent on the map of the hierarchy in S and the forecast errors. Estimating W is not straightforward (for details and example see this paper, section 2), but suffice to say that it accounts for the forecast errors, or in other words the quality of the forecast at each node. In a nutshell, poor forecasts will be given less weight than better forecasts. Consider the following: if all forecasts were perfect, then they would be coherent and there would be no need to reconcile them. Of course, in practice they are not, and we prefer to adjust the inaccurate forecasts more, as they are probably more responsible for the lack of coherence.
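A minimal sketch of this reconciliation in Python, under a strong simplifying assumption: W is taken to be diagonal, holding one forecast error variance per node. This is only one of the estimators discussed in the literature and is not necessarily the one used in the paper. The function name `combination_weights`, the base forecasts and the variances are all made up for illustration.

```python
import numpy as np

# Summing matrix for Figure 1 (rows: Total, X, Y, XX, XY, YX, YY, YZ).
S = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

def combination_weights(S, error_var):
    """G = (S' W^-1 S)^-1 S' W^-1, with W = diag(error_var) as an assumed simplification."""
    W_inv = np.diag(1.0 / np.asarray(error_var, dtype=float))
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)

# Made-up base forecasts, produced independently per node, so they do not add up:
# Total (30) differs from X + Y (27), and X (14) differs from XX + XY (15).
y_hat = np.array([30.0, 14.0, 13.0, 10.0, 5.0, 3.0, 7.0, 2.0])

# Made-up forecast error variances: larger values mean poorer forecasts, hence less weight.
error_var = np.array([4.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0])

G = combination_weights(S, error_var)
y_tilde = S @ G @ y_hat  # reconciled forecasts for every node

print(np.round(y_tilde, 2))
```

Because G S equals the identity matrix, forecasts that are already coherent pass through unchanged, which matches the remark above that perfect forecasts would need no reconciliation.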

3.2. Cross-sectional hierarchical forecasting

This is the standard form of hierarchical forecasting that most people relate to through the bottom-up and top-down forecasting logic. In this case, G plays the role of aggregation or disaggregation weights. I am hesitant to use the forecast combination logic in this case, as we do not use forecasts from different levels of the hierarchy, but we combine forecasts from a single level only. This increases the forecasting risk, as we rely on fewer models, forecasts and less information.

Using the machinery described above one could forecast each node in the hierarchy independently (the machinery does not preclude using models capable of producing forecasts for multiple nodes simultaneously). For example, at the very disaggregate level, one might use intermittent demand forecasting methods, or other automatic univariate forecasting models, such as exponential smoothing. The reason is that in practice the bottom levels are very large, typically containing (many) thousands of time series, which need to be forecasted very frequently. Here automation and reliability are essential. At higher levels, explanatory variables may be more relevant. For instance, one could use at the higher levels of the hierarchy leading indicators from the macro-economic environment to augment the forecasts. Also at that level, there are fewer forecasts to be made, so human experts can intervene more easily. Such information may be difficult to connect to the time series at the lower levels of the hierarchy.

Using cross-sectional hierarchical forecasting the different forecasts from different nodes/levels are blended, providing typically more accurate predictions and aligned decisions. In principle, if for every node we would produce the best practically possible forecasts, the blended coherent forecasts would contain all this information, as well as providing a common view of the future. The catch is that the whole hierarchy is tied to the same time bucket. If say the lowest level is at a daily or weekly sampling frequency, so is the top level. At the very aggregate level decision making is typically slower than at the very disaggregate operational level. This mismatch makes cross-sectional hierarchical forecasting useful for some aspects, but at the same time reduces it to a statistical exercise that hopes merely for forecast accuracy improvements.

3.3. Temporal hierarchies

Temporal hierarchies use the same machinery to solve the problem across time. I have covered this topic in more detail in previous posts, so I will be brief here. Suppose we deal with a quarterly time series. This implies a hierarchical structure as in Figure 2.

Figure 2. A time series sampled in quarterly frequency forms an implicit temporal hierarchy, where an annum is split into two semi-annual periods, which are split into two quarters each.

Of course, we can define that hierarchy for monthly, daily, etc. time series. It should be quite evident in comparing Figures 1 and 2 that we can construct a summing matrix S for Figure 2 and produce coherent forecasts as needed. In this case, we achieve temporal coherency. That is, short-term lower level forecasts are aligned with long-term top-level forecasts. In practice, high-frequency short-term decision making is informed by long-term decision making and vice-versa. Temporal hierarchies offer substantial accuracy gains, due to seeing the time series from various aggregation viewpoints, hence capture both short- and long-term dynamics, but also mitigate the problem of model selection, as they naturally force the modeller to rely on multiple forecasts.
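As an illustration of the summing matrix for Figure 2, the sketch below builds S for a quarterly series (annual, semi-annual, quarterly) and checks it on one year of made-up values. The helper `temporal_summing_matrix` and its arguments are hypothetical names introduced for this example.

```python
import numpy as np

def temporal_summing_matrix(periods_per_year, aggregation_levels):
    """Rows: one block per aggregation level (most aggregate first), then the base periods.

    aggregation_levels lists how many base periods form one bucket,
    for example [4, 2] for quarterly data (annual and semi-annual buckets).
    """
    blocks = []
    for k in aggregation_levels:
        if periods_per_year % k != 0:
            raise ValueError("each aggregation level must divide the periods per year")
        # One row per bucket, with k consecutive ones marking the periods it covers.
        blocks.append(np.kron(np.eye(periods_per_year // k), np.ones((1, k))))
    blocks.append(np.eye(periods_per_year))
    return np.vstack(blocks)

S_temporal = temporal_summing_matrix(4, [4, 2])

# Made-up quarterly values for one year.
quarters = np.array([10.0, 20.0, 30.0, 40.0])

# Annual total, two semi-annual totals, then the four quarters.
print(S_temporal @ quarters)
```

For monthly data the same helper could be called as `temporal_summing_matrix(12, [12, 6, 4, 3, 2])`, and the resulting matrix plugs into the same reconciliation machinery as in the cross-sectional case.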

On the downside, although temporal hierarchies are a very handy statistical device for getting better quality forecasts, they do not always translate one-to-one with the relevant organisational decision making. For example, suppose that we model the daily demand of a particular SKU. Temporal hierarchies will be helpful in getting better forecasts. At the daily level and the levels close to it, decisions about stocking at stores, warehouses, etc. will be informed. As we get to the top levels, the forecasts may not relate directly to some decision. Do we need annual forecasts for a single SKU?

3.4. Cross-temporal hierarchies

The natural extension is to construct hierarchies that span both cross-sectional and temporal dimensions. Figure 3 illustrates how one could construct such a hierarchy. Each cross-sectional (blue) node, contains a temporal hierarchy (yellow). Here is where things start to become complicated. Expressing this hierarchy with a summing matrix S is not straightforward!

Figure 3. A cross-temporal hierarchy. Each cross-sectional node (blue), contains a temporal hierarchy (yellow).

With colleagues we have done some work in doing exactly that, only to realise that this needs a lot more thinking than just blindly adding columns and rows to S. For small hierarchies this may be feasible, but for large realistic ones, this becomes unmanageable very fast. This is work in progress; hopefully soon enough I will have something better to say on this!

Another approach is to do this sequentially. One could first do temporal and then cross-sectional, or the other way around. It appears that by first doing temporal, the forecasting exercise becomes simpler. However, the sequential application of the hierarchical machinery does not guarantee that forecasts will remain coherent across both dimensions. In fact, unless you have perfect forecasts it is easy to demonstrate that the second reconciliation will cause decoherence of the first. Figure 4 is helpful to understand this, but also to see the way forward.

Figure 4. Cross-sectional hierarchies on different temporal levels.

The cross-sectional hierarchy from Figure 1 will remain applicable irrespective of the temporal level it is modelled at. Cross-sectional hierarchical forecasting merely chooses one of these levels and models everything there. Suppose we model each of these cross-sectional hierarchies. The structure captured by S will stay the same, but G most probably will not, as it depends on characteristics of the forecasts and the time series that change across levels (i.e., the forecasting models/methods used and the resulting forecast errors). At each temporal level it is therefore reasonable to expect a different cross-sectional G. This is exactly what causes the problem. Not all G’s can be true at the same time and ensure cross-temporal coherence.
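
A small numerical sketch can make this concrete. Everything in it is an assumption made for illustration: a toy hierarchy (Total = A + B), two temporal levels (half-yearly and annual), made-up base forecasts that are already temporally coherent, and a weighted least squares form for G with hypothetical error variances. The helper name wls_g is likewise invented here; it is not necessarily how G is estimated in the paper.

```python
import numpy as np

# Toy cross-sectional hierarchy (assumption): Total = A + B.
# Rows of S: Total, A, B; columns: bottom-level series A, B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def wls_g(variances):
    """One possible G (weighted least squares): G = (S'W^-1 S)^-1 S'W^-1."""
    w_inv = np.diag(1.0 / np.asarray(variances, dtype=float))
    return np.linalg.solve(S.T @ w_inv @ S, S.T @ w_inv)

# Made-up base forecasts that are already temporally coherent.
# Rows: Total, A, B; columns: the two half-years.
half_year = np.array([[52.0, 61.0],
                      [30.0, 33.0],
                      [20.0, 25.0]])
annual = half_year.sum(axis=1, keepdims=True)  # annual = H1 + H2 for every series

# Hypothetical error variances differ by temporal level, so G differs too.
g_half = wls_g([4.0, 1.0, 2.0])
g_annual = wls_g([9.0, 5.0, 3.0])

# Cross-sectional reconciliation with a level-specific G.
half_rec = S @ g_half @ half_year
annual_rec = S @ g_annual @ annual

# Each level is cross-sectionally coherent after reconciliation (Total = A + B)...
print(np.allclose(half_rec[0], half_rec[1] + half_rec[2]))
print(np.allclose(annual_rec[0], annual_rec[1] + annual_rec[2]))
# ...but the half-years no longer add up to the annual forecasts (non-zero gaps).
print(half_rec.sum(axis=1) - annual_rec[:, 0])
```

The base forecasts here are deliberately not cross-sectionally coherent (52 is not 30 + 20). If they were, every G of this form would return them unchanged and the gaps would vanish, which is the "perfect forecasts" exception.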

The practical way forward is very simple, and it both improves forecast accuracy and imposes cross-temporal coherence:

  1. Produce temporal hierarchy forecasts for each time series of the cross-sectional hierarchy.
  2. Model the cross-sectional hierarchy at all temporal levels (these are reconciled temporally already).
  3. Collect all the different G’s and calculate their element-wise average G*.
  4. Use the common G* to reconcile cross-sectionally, which by construction respects all temporal reconciliations.

The calculation is fairly trivial, although it requires producing a large number of forecasts. Nowadays the latter is typically not an issue.
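
The four steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the paper: the hierarchy (Total = A + B), the two temporal levels, the base forecasts and the error variances are all made up, and the weighted least squares helper wls_g is a hypothetical stand-in for whichever cross-sectional reconciliation method is used.

```python
import numpy as np

# Toy cross-sectional hierarchy (assumption): Total = A + B.
# Rows of S: Total, A, B; columns: bottom-level series A, B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def wls_g(variances):
    """One possible G (weighted least squares): G = (S'W^-1 S)^-1 S'W^-1."""
    w_inv = np.diag(1.0 / np.asarray(variances, dtype=float))
    return np.linalg.solve(S.T @ w_inv @ S, S.T @ w_inv)

# Step 1 (taken as given): temporally reconciled forecasts for every series.
# Rows: Total, A, B; columns: the two half-years. The numbers are made up.
half_year = np.array([[52.0, 61.0],
                      [30.0, 33.0],
                      [20.0, 25.0]])
annual = half_year.sum(axis=1, keepdims=True)  # annual = H1 + H2

# Step 2: one cross-sectional G per temporal level (hypothetical variances).
g_levels = [wls_g([4.0, 1.0, 2.0]),   # half-yearly level
            wls_g([9.0, 5.0, 3.0])]   # annual level

# Step 3: element-wise average of the G matrices.
g_star = np.mean(g_levels, axis=0)

# Step 4: reconcile every temporal level with the common G*.
half_rec = S @ g_star @ half_year
annual_rec = S @ g_star @ annual

# Cross-sectional coherence: Total = A + B at both levels.
print(np.allclose(half_rec[0], half_rec[1] + half_rec[2]))
print(np.allclose(annual_rec[0], annual_rec[1] + annual_rec[2]))
# Temporal coherence survives: the half-years still add up to the annual forecasts.
print(np.allclose(half_rec.sum(axis=1), annual_rec[:, 0]))
```

The reason this works is that the same linear map (S times G*) is applied to every period at every level, and temporal aggregation is just a sum, so aggregating and then reconciling gives the same result as reconciling and then aggregating.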

4. Does it work?

A recently published paper demonstrates the accuracy gains. Without going into too much detail, as one can find all that in the linked paper, the key takeaways are:

  • The major forecast accuracy benefits come from temporal hierarchies.
  • Cross-sectional hierarchies still improve accuracy, but to a lesser extent.
  • The second hierarchical reconciliation is bound to offer fewer improvements, as the forecasts are already more accurate and therefore closer to the ideal ones that in principle are already coherent.
  • In our experiments, total accuracy gains were up to 10%. Obviously, this is dataset and setup dependent.

One may argue that this may be too much work for 10% accuracy. The strength of this argument depends on the quality of the base forecasts and also on the fact that the accuracy gains are a “by the way” benefit. The true purpose is to produce cross-temporally coherent forecasts. These forecasts provide the same aligned view of the future across all dimensions, so they truly represent the one number forecast!

The forecasts are now aligned across:

  • Planning horizons: short/long, for high and low-frequency data (e.g., from hourly to yearly observations).
  • Planning units: SKUs to the most detailed level used, up to the total of the whole organisation.
  • The cross-temporal hierarchy respects the decision making needs: it provides detailed per SKU short-term high-frequency forecasts and aggregate long-term low-frequency forecasts. And these are coherent. No matter how you split or join forecasts together, they still agree.

The real benefit is that people can supply different types of forecasts to the cross-temporal machinery. Back to the initial examples: the inventory side, the marketing side and the finance side keep on doing their work and provide their expert, model- and information-specific views about the future. Crucially, this can be done in a modular fashion. An organisation does not have to go online with the whole construct simultaneously, but different functions can join step-by-step simply by revising the hierarchy to include that view.

In practice, one would use different types of models and inputs at the different parts of the cross-temporal hierarchy. For higher levels, leading indicators, other regressors and expert judgment will be helpful. At lower levels, due to their size, reliable univariate forecasts, for example based on exponential smoothing and potentially augmented by judgment, would be better suited.

5. The organisational benefits

An aligned view of the future across all levels/functions/horizons of an organisation comes with apparent benefits for decision making. There are four more benefits that may not be immediately apparent:

  1. Break information silos the analytics way: it is not easy to change corporate structures, culture or human nature to improve communication between teams and functions. It is not easy to have colleagues who do not do forecasting for a living sit in long meetings about improving forecasts. The beauty of cross-temporal hierarchies is that forecasts can be produced independently and are subsequently weighted according to their quality. None of the views is discarded; all are considered, with their different information base and distilled expert knowledge, and blended into a single coherent forecast. As a result, information silos are softened, as all functions and teams plan on a common blended view of the future.
  2. From strategising operations to informed strategies: the traditional managerial mantra is about how to operationalise strategies, i.e. how to take top-level decisions and vision about the future to the rest of the organisation. In principle that is all fine, if the top management had transparency of operations. A single page report, a line graph or a pie chart just doesn’t cut it! Cross-temporal hierarchies allow taking into consideration both top-down and bottom-up views, both short-term objectives/needs and long-term strategies/visions. This creates data transparency. Top management can generate a view of the future and then inform it with the rest of the organisational knowledge. Operations are closer to the customer. Marketing is shaping the customer. But neither operations nor marketing has the bird’s-eye view of the board. And these are just examples.
  3. Ultra-fast decision making: welcome to a world where Artificial Intelligence decides for you. AI is not yet able to replace human decision making fully, but it is surely able to take care of many tedious decisions and do these at very large scales and very fast. It is only logical (obviously I had to use this phrase when talking about AI!) that we will see increasing use of AI to interact with customers at an increasingly high frequency. The scale can easily reach a level where it is impossible for human decision makers/planners/operators to supervise effectively. More importantly, if experts disengage from this, there is a good chance that the company will not be able to use all the expertise and experience in the (human) workforce. Cross-temporal hierarchies can help with that. AI will be able to take decisions and use data at ultra-fast frequencies. Humans do not need to keep up with that, as they can contribute their views, knowledge and information through lower-frequency decision making. Cross-temporal hierarchies will blend the two together, with AI adding additional levels to the hierarchical structure.
  4. Collaboration: thinking out of the box in a literal way. The cross-temporal machinery does not have to be restricted to a single organisation, but can encompass multiple. This way multiple units and stakeholders can share information and have a common view of the future. In the aforementioned paper, in the conclusions, we provide an example about the tourism sector, where hotel units, satellite companies and the state tourism board can all collaborate through a cross-temporal hierarchy.

Admittedly, each of the four points raised requires increasing analytics and corporate maturity. These are my views about how business will change, and my expectation is that this will happen rather quickly. Point 1 is apparent. Point 2 is necessary as employees become better skilled, better informed and better educated. If you want these people to remain part of your organisation, you can surely assume that top-down and traditional strategising operations will not be satisfactory. Point 3 is bound to happen, led by the large companies who already invest heavily in AI. But the interesting thing about AI is that its cost is reducing substantially and very fast, making it accessible to more and more organisations. Point 4 may be somewhat more contentious. What about competition between units and companies? My view is that collaborative existence is the only way forward for many small to medium-sized organisations, if they are to survive. How this is done, and what the involvement of larger players and the public would be, remains to be seen and is surely a topic for a different discussion! This post is already too long!

Happy forecasting!

Cross-temporal coherent forecasts for Australian tourism

Nikolaos Kourentzes and George Athanasopoulos, 2019, Annals of Tourism Research, Vol 75, Pages 393-409. https://doi.org/10.1016/j.annals.2019.02.001

Key to ensuring a successful tourism sector is timely policy making and detailed planning. National policy formulation and strategic planning requires long-term forecasts at an aggregate level, while regional operational decisions require short-term forecasts, relevant to local tourism operators. For aligned decisions at all levels, supporting forecasts must be ‘coherent’, that is they should add up appropriately, across relevant demarcations (e.g., geographical divisions or market segments) and also across time. We propose an approach for generating coherent forecasts across both cross-sections and planning horizons for Australia. This results in significant improvements in forecast accuracy with substantial decision making benefits. Coherent forecasts help break intra- and inter-organisational information and planning silos, in a data driven fashion, blending information from different sources.

Download paper.