This is a modified version of the article that appears in Foresight issue 48. It provides a simplified version of the modelling methodology described in this paper and applied here and here.

**Introduction**

Using leading indicators for business forecasting has been relatively rare, partly because our traditional time series methods do not readily allow incorporation of external variables. Nowadays, however, we have an abundance of potentially useful indicators, and there is evidence that utilizing relevant ones in a forecasting model can significantly improve forecast accuracy and transparency.

**What are leading indicators?**

We can define a *leading indicator* as a numerical variable that contains predictive information for our target variable (e.g., sales) at least as many periods in advance as the forecast lead time. There are several aspects to this definition.

- First, the indicator should be a *hard* variable; that is, recorded in a useful way for inputting into a statistical model. This means it should have adequate history and be available at the required frequency, after any appropriate transformations (for example, aggregating data into the desired frequency).
- Second, the indicator must contain genuine predictive information, not spurious statistical correlation with the target.
- Third, the indicator must lead the target variable by enough time to be operationally useful. If, for example, we have a leading indicator that is very informative one month in advance of sales, but we need a three-month-ahead sales forecast for ordering supplies and materials, this indicator lacks sufficient lead time. (If you are still thinking that it is useful for improving your forecasts, then your forecast lead time is not three but one month – a misjudgment we see all too often!) Note that sometimes it is possible to produce (or source externally) *forecasts of indicators*, but then poor forecasts of the indicators will be damaging! A detailed discussion of so-called *conditional* (i.e., using information only up to the time of forecast) and *unconditional* forecasting can be found in Principles of Business Forecasting, where alternative forecasting strategies are discussed.

Sources of potential indicators include governmental, banking and private-sector data on macroeconomic conditions, internet search data, social media postings, and an often overlooked source, the company itself. Strategic, marketing and other company plans can be useful for constructing leading indicators.

**Forecasting with leading indicators**

Suppose we have identified a useful leading indicator; how do we use it for forecasting? In the simplest setting, where the target responds linearly to movements in the indicator, we can construct a forecast as in Equation (1):

$$\hat{y}_{t+h} = u_{t+h} + c X_t, \qquad (1)$$

where $\hat{y}_{t+h}$ is the forecast for period *t+h*, constructed at period *t* (now!), $u_{t+h}$ represents the time series components – level, trend, and seasonality – of the target at period *t+h*, exclusive of the effect of the leading indicator, and *c* is the coefficient on the leading indicator $X_t$. In a regression context, $u_{t+h}$ could comprise a constant, a trend variable, a set of seasonal dummies, or lagged values of the target.

Observe that we use index *t* for $X$, since we require its value to be available at the time we construct the forecast, at least *h* periods ahead. If the indicator led the target by fewer than *h* periods, we would require a forecast for the indicator *X*, which may introduce more forecast error. Alternatively, we might be able to procure forecasts of *X* from an external source.

Equation (1) can be extended to include many indicators (or lags of the same indicator) in the same fashion as we would include more variables in a multiple regression.
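
To make Equation (1) concrete, here is a minimal sketch in pure Python. The data, the function name, and the reduction of the time series component $u_{t+h}$ to a simple constant level are all illustrative assumptions, not the method of the paper:

```python
def fit_indicator_model(y, x, h):
    """Least-squares fit of y[t+h] = a + c * x[t]: a constant level a
    standing in for the time series components, plus an indicator
    coefficient c, using only the periods where both are observed."""
    pairs = [(x[t], y[t + h]) for t in range(len(y) - h)]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    c = sxy / sxx          # coefficient on the leading indicator
    a = mean_y - c * mean_x  # level term
    return a, c

# Toy data: sales respond two periods after the indicator moves,
# y[t] = 2 * x[t-2] + 32 (the first two y values are unused padding).
x = [10, 12, 11, 15, 14, 18, 17, 20]
y = [0, 0, 52, 56, 54, 62, 60, 68]
a, c = fit_indicator_model(y, x, h=2)
forecast = a + c * x[-1]  # forecast two periods beyond the last observation
```

Because the indicator leads sales by two periods, the last observed value `x[-1]` yields a two-period-ahead forecast directly, without having to forecast *X* itself.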

**Identifying leading indicators: Lasso regression**

So far, we have assumed that we know a priori which indicators are useful. Realistically, this is not the case – so we need an identification strategy for selecting among a potentially vast number of candidate indicators.

Let’s start by restricting our search to macroeconomic leading indicators, which can readily be sourced from a national statistics bureau. There are thousands, maybe tens of thousands, of these that could have predictive information. On top of this, we need to consider various lags of the indicators to identify the most informative lead. For example, is it housing sales three or four months in advance of our sales that are most informative? This can quickly bring the number of potential indicators to several hundred thousand.

In fact, the number of potential indicator variables *k* can be larger than the number of available historical observations in our sample *n*, which makes it impossible to estimate or evaluate a model with classical least squares and rules out classical regression search strategies, such as stepwise regression.

**The lasso regression procedure**

A promising solution to this problem comes from the technique called *Least Absolute Shrinkage and Selection Operator* (Lasso) regression. Lasso was introduced by Robert Tibshirani (1996) and has since become a popular tool in many disciplines. (See Ord et al., 2017, section 9.5.2 for a barebones introduction and Hastie et al., 2015 for a thorough discussion.) Lasso regression has the major advantage that it can still select and estimate a model when the number of potential indicator variables is greater than the number of historical observations, *k>n*.

To estimate the model coefficients in standard regression we minimise the Sum of Squared Errors (SSE):

$$\text{SSE} = \sum_t \left( y_t - \hat{y}_t \right)^2, \qquad (2)$$

where $y_t$ is the actual observation at time *t* and $\hat{y}_t$ is the forecast. Suppose that the forecast is made up of a constant and *k* explanatory variables $X_1, \dots, X_k$. For the sake of simplicity, we will drop any lags from the notation, and consider any lagged input as a different input *X*. Given that $\hat{y}_t = c_0 + \sum_{i=1}^{k} c_i X_{i,t}$, (2) becomes:

$$\text{SSE} = \sum_t \left( y_t - c_0 - \sum_{i=1}^{k} c_i X_{i,t} \right)^2, \qquad (3)$$

where $c_0$ is the constant and $c_i$ the coefficient on indicator $X_i$.

Equation (3) is called the cost function or loss function of the regression. The regression method finds the values of $c_0$ and $c_i$ that minimize the cost function. When a coefficient is insignificantly different from zero, we typically remove that variable from the regression, achieving variable selection.

If *k>n*, there is no unique solution for the *c* coefficients, which renders the model useless. This is because we allow full flexibility to the coefficients, which can take on any values. To overcome the problem, we need to impose restrictions on the coefficients. Lasso does so by modifying Equation (3) so that it includes a penalty for model complexity, as measured by the number of regressors in the model with non-zero coefficients. If we think of Equation (3) as “fitting error” we can conceptually write

$$\text{Error} = \text{(Fitting Error)} + \lambda \, \text{(Penalty)}. \qquad (4)$$

The reader may realise that Equation (4) appears quite often in time series modelling. Information criteria, such as Akaike’s Information Criterion (AIC), have a similar structure, although with a very different underlying logic and derivation. Information criteria seek to mitigate potential overfitting of a model by imposing penalties/restrictions on the model fit.

In Lasso, the penalty is $\sum_{i=1}^{k} |c_i|$, where $c_i$ is the coefficient of the standardised variable $X'_i$. The intuition behind this penalty is the following: the more non-zero coefficients (i.e. included variables), the bigger the penalty becomes. Therefore, for Equation (4) to be minimised, the penalty must become small as well, which forces variables out of the model. Standardization puts all variables on the same scale, preventing larger-valued variables from dominating just because of their scale rather than their importance in explaining the variation of the target variable.

We can now write the cost function of Lasso as:

$$\sum_t \left( y'_t - \sum_{i=1}^{k} c_i X'_{i,t} \right)^2 + \lambda \sum_{i=1}^{k} |c_i|, \qquad (5)$$

where both actuals and input variables have been standardised, denoted by the prime (′). Note that standardised variables have mean zero, so the constant $c_0$ is no longer needed.
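
To see the mechanics of minimising Equation (5), below is a hedged pure-Python sketch using coordinate descent with soft-thresholding – one standard way of fitting a Lasso, though not necessarily the exact algorithm of any particular package. The data and names are made up, and the inputs are assumed already standardised:

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; the source of exact zeros in Lasso."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise sum_t (y_t - sum_i c_i X[t][i])**2 + lam * sum_i |c_i|
    by cyclic coordinate descent. Assumes y and each column of X are
    already standardised (mean zero)."""
    n, k = len(X), len(X[0])
    c = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # residual with variable j's own contribution removed
            r = [y[t] - sum(c[i] * X[t][i] for i in range(k) if i != j)
                 for t in range(n)]
            rho = sum(X[t][j] * r[t] for t in range(n))
            zj = sum(X[t][j] ** 2 for t in range(n))
            c[j] = soft_threshold(rho, lam / 2) / zj
    return c

# Toy standardised data: y depends on the first variable only.
x1 = [1.0, -1.0, 2.0, -2.0, 1.5, -1.5]
x2 = [1.0, 1.0, -1.0, -1.0, 0.5, -0.5]
X = [[a, b] for a, b in zip(x1, x2)]
y = [3.0 * v for v in x1]

coef_ols = lasso_cd(X, y, lam=0.0)     # lam = 0: least-squares solution
coef_lasso = lasso_cd(X, y, lam=20.0)  # large lam: shrinkage and selection
```

With `lam=0` the procedure recovers the conventional regression solution, Equation (3); with a large `lam` the irrelevant second coefficient is driven exactly to zero while the first is shrunk, illustrating how Lasso performs variable selection.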

**Determining λ**

Lasso attempts to minimise Equation (5). If $\lambda = 0$, Equation (5) reduces to Equation (3), the conventional regression solution. As *λ* increases above zero, the penalty becomes increasingly dominant, which pushes the *c* coefficients toward zero (hence the name “shrinkage”). Those variables with zero coefficients are removed from the model. Intuitively, it is the coefficients on the least important variables that are first pushed to zero, leaving the more important variables in the model to be estimated. (“Important” does not mean causal, but merely that the variable explains some of the variance of the historical data.)

So Lasso will select variables – those that remain – and estimate the values of their coefficients. This is true even when *k>n*.

A “side-effect” of Lasso regression is that we no longer receive statistical-significance values, as we do with conventional regression. This is so because Equation (5) violates important assumptions underlying the regression model.

Figure 1 illustrates how the coefficients of ten variables – each distinguished by number and color – shrink as *λ* increases (from left to right), until eventually all are removed from the model.

The coefficient values on the very left of the plot are almost identical to the standard regression solution (*λ* approximately zero). This changes as *λ* increases. For example, imagine a vertical line at *λ=0.05*. Its intersection with the lines in the plot reveals the coefficients’ values (on the vertical axis), some of which are already equal to zero. Similarly, if we look at the coefficients for *λ=0.15*, only one coefficient (number 9) remains non-zero.

So the choice of value for *λ* is very important for the performance of Lasso. But how is *λ* determined? The standard procedure is through *cross-validation*: we separate the historical sample into various subsamples, calculating the *cross-validated mean squared error* across all subsamples. We then look for the *λ* value that minimises this error. In practice, we often choose a slightly larger *λ* to avoid overfitting to the cross-validation itself. Figure 2 visualises the result of this process. Here, we ran Lasso regressions for various values of *λ* – plotted on the horizontal axis – and recorded the cross-validated errors. The right-most vertical line indicates the selected (larger) *λ*, while the vertical line to its left indicates the minimum cross-validated error.
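
The idea can be illustrated with a single training/validation split (a stand-in for full cross-validation). All numbers are made up, and we exploit the fact that a Lasso with one standardised regressor has a closed-form solution:

```python
def fit_one_var(x, y, lam):
    """Lasso with a single standardised regressor: soft-threshold the
    least-squares numerator, then divide by sum of squares."""
    rho = sum(a * b for a, b in zip(x, y))
    z = sum(a * a for a in x)
    shrunk = max(abs(rho) - lam / 2, 0.0)
    return (shrunk if rho >= 0 else -shrunk) / z

# Training window: sales respond to the indicator with coefficient 2.
x_tr = [1.0, -1.0, 2.0, -2.0]
y_tr = [2.0 * v for v in x_tr]
# Validation window: the relationship has weakened to 1.5, so some
# shrinkage improves out-of-sample error.
x_va = [1.5, -1.5, 1.0, -1.0]
y_va = [1.5 * v for v in x_va]

grid = [0.0, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0]
mse = {}
for lam in grid:
    c = fit_one_var(x_tr, y_tr, lam)
    mse[lam] = sum((yv - c * xv) ** 2 for xv, yv in zip(x_va, y_va))
best_lam = min(grid, key=lambda lam: mse[lam])
```

Because the indicator’s effect weakens out of sample, the validation error is minimised at a positive *λ* rather than at the unpenalised *λ=0* solution – the same logic that drives cross-validated selection of *λ*.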

With *λ* determined, Lasso proceeds to estimate the coefficients, dropping from the model any variables whose coefficients are shrunk to zero.

**A case study**

Sagaert et al. (2018) showed how Lasso can be applied to identify macroeconomic leading indicators for forecasting a tire manufacturer’s sales at a regional level. More specifically, our focus was on forecasting monthly sales of the raw materials in two types of end products for the EU and US regions.

The current company practice uses Holt-Winters exponential smoothing to forecast long-term sales based on estimates of the trend and seasonal patterns. Management believed, however, that there are key economic activity indicators that could yield predictive information. For instance, net transported goods can be a good indicator for tire sales. As more goods are transported, more trucks are used, leading eventually to a higher demand for tires. Obviously, there is a lag between the ramp-up of the number of transported goods and the need for tires. Therefore, the transported goods variable could be a useful leading indicator.

But there are possibly numerous other leading indicators for this company. Using publicly available macroeconomic indicators for multiple countries, from the St. Louis Federal Reserve Economic Data, we collected a set of 67,851 potential indicators. These covered a wide variety of categories: consumption, feedstock, financial, housing, import/export, industrial, labour and transportation.

For each indicator, we considered lags from 1 to 12 months, increasing the number of potential indicators to 814,212! At this point we could choose either to build a fully automatic forecasting model, where the selection of leading indicators is based solely on the selection made by the Lasso, or take advantage of managerial knowledge as well.

We interviewed the management team with the aim of identifying relevant subgroups of indicators. This exercise led to a subgroup of 1,082 indicators from the original 67,851. As the objective was to produce forecasts for the next 12 months, a different number of indicators was usable for forecasting 1 step ahead to 12 steps ahead. For one step ahead, all 12,984 lagged indicators (all 12 lags of the 1,082) were used. In contrast, for forecasting 12 steps ahead, only 1,082 indicators were available (only those with a lag of 12 months, as shorter lags would need to be forecasted). Note that we also tried a fully statistical model, built using all indicators with no managerial information. That model performed slightly worse than the one enhanced by expert knowledge and therefore is not discussed hereafter.
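
The arithmetic behind these counts is easy to verify (assuming, as in the study, 1,082 indicators each considered at lags of 1 to 12 months):

```python
# For an h-step-ahead forecast only lags of h months or more are usable,
# i.e. lags h, h+1, ..., 12 of each of the 1,082 indicators.
base = 1082
usable = {h: base * (12 - h + 1) for h in range(1, 13)}
# usable[1] covers all 12 lags; usable[12] only the 12-month lag.
```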

We built 12 different Lasso regressions —one for each forecast horizon— using a different number of leading indicators, as shown in Figure 3. These regressions were further enhanced by including seasonal effects and potential lags of sales, all of which were selected automatically by the Lasso regression. The final forecasting models included only a selection of the available input variables.

Figure 3 illustrates, for each forecasting horizon, the number of available indicators and the average number of indicators selected. The most informative indicators selected turned out to be: employment in automobile dealerships, the number of national passenger car registrations and the consumer price index for solid fuel prices, all logical choices for forecasting tire sales.

We constructed the Lasso forecasts using the excellent *glmnet* package for R, whose functions allow us to quickly build Lasso models, to select the optimal value of *λ* and to generate the forecasts.

Figure 4 provides the overall Mean Absolute Percentage Error (MAPE) across four time series and up to 12 horizons for tire sales. *Company* is the current company forecast based on Holt-Winters exponential smoothing. *ETS* is a more general exponential smoothing approach that considers all versions of level, trend and/or seasonal forms of exponential smoothing, with the appropriate form chosen automatically using the AIC information criterion. *Lasso* is our Lasso regression based on the managerially selected subset of indicators.

The MAPEs are 18.6% for *Company,* 15.3% for *ETS* and 15.1% for *Lasso.*

- The *Company* forecast is easily outperformed by ETS, with the exception of very short horizons (1 to 3 steps ahead). So, even without consideration of leading indicators, the company could have improved its forecasting performance by using the exponential smoothing family of models (ETS) that embeds the advancements made in the last 15 years in terms of model forms, parameter estimation, and model selection.
- Over short to mid-term horizons (up to 6 steps ahead) Lasso offered substantial gains over ETS. At longer horizons Lasso remained competitive with ETS up to 10 steps ahead but fell short at still longer horizons. For short-term forecasts we allowed Lasso to look for short and long leading effects. On the other hand, for long-term forecasts Lasso could look only for long-lead effects, which will be fewer and weaker. For this particular case there was very limited predictive information in leading indicators with leads of 11 or 12 months. At those lead times, information on trend and seasonality was the most important, which was captured effectively by exponential smoothing (ETS).
- Overall, Lasso proved to be the most accurate forecasting method, although the accuracy gain was small compared to ETS. This has significant implications for forecasting practice: ETS contains no leading indicators and can be implemented automatically, whereas Lasso must consider large volumes of data at the planning level.

This approach was also found to translate to inventory benefits. You can find more details here.

**Is the use of leading indicators the way forward?**

Our study shows that while there is merit in including leading indicators via a Lasso modelling framework for business forecasting, especially with managerial guidance, the practical usefulness of leading indicators is limited to shorter horizons. At very long horizons chances are that there is only univariate information (trend and seasonality) left to model. Appropriately used extrapolative forecasting models remain very important, even for tactical level forecasting.

We certainly now have the computational power to make use of leading indicators. But at what level of aggregation should they be factored in? In this case study, we modelled aggregate sales series for which macroeconomic effects were expected to be important. At a disaggregate level, how reasonable is it to follow our approach?

Our team has not found evidence that it is worthwhile to build such complex models at the disaggregate level; univariate forecasts appear to be just as good. This is intuitive, as any macro-effects are lost in the noisy sales at the disaggregate level (for example, per-SKU sales). Nonetheless, that does not preclude constructing hierarchical forecasts, where macroeconomic indicators can add value at the top levels, enhancing disaggregate-level ETS forecasts.

We used Lasso regression here as our core modelling approach. Its substantial advantage over other modelling approaches that attempt to condense many potential variables into fewer constructs (principal components analysis or, more generally, dynamic factor models) is that Lasso outputs the impact of each selected variable and is transparent to users. It therefore offers greater business insight for managerial adjustments to the forecasts and the eventual acceptance of the forecasts in the organisation.

In other experiments, our team used leading indicators sourced from the online search habits of consumers and found that, even if there is a connection, it does not manifest itself at the required forecast horizons (Schaer et al., 2018), limiting the value of online indicators for sales forecasting.

So, while there are gains to be made from using leading indicators, we should not be tempted to overly rely on them when simpler univariate forecasts can do nearly as well. Univariate forecasts, however, cannot anticipate changes in market dynamics; leading indicators may be able to follow these dynamics and provide crucial forecast accuracy gains. At the end of the day, a model enriched with leading indicators has to make sense!
