
Workshop on 'Forecasting with R'

Fotios Petropoulos and I will be giving a workshop on how to produce forecasts using R. The focus will be on business forecasting applications. The workshop is part of the workshop series of the 36th International Symposium on Forecasting and will take place on the 19th of June 2016.

This workshop will provide a full-day, hands-on demonstration of the R statistical software as a forecasting tool. Assuming basic knowledge of the main forecasting methods (exponential smoothing, ARIMA, single and multiple regression models), we will demonstrate how to use R to produce forecasts with a wide range of methods and models. We will follow a step-by-step approach, from data input and modelling through to forecast evaluation and export of results. On top of the basic methods, we will show how to easily apply in practice more sophisticated forecasting approaches, such as intermittent demand methods and cross-sectional and temporal hierarchical forecasting. We will go through the basic functions of the major forecasting-related R packages (such as forecast, tsintermittent and other specialised packages). Using hands-on exercises and real-world data, participants will learn to model time series, produce forecasts, and visualise and evaluate the results.

The workshop is tailored for both beginner and intermediate R users and will be structured in five sessions:

  1. Introduction to R and general use functions for forecasting.
  2. Forecasting for fast demand (Exponential smoothing and ARIMA)
  3. Forecasting with causal methods (Regression and variable selection)
  4. Forecasting for intermittent demand (Croston, SBA and classification schemes)
  5. Advanced methods in forecasting (Hierarchical forecasting, temporal aggregation and MAPA, …)

More details on how to participate in the workshop are provided on the ISF website.

A fundamental idea in extrapolative forecasting

Extrapolative forecasting, using models such as exponential smoothing, is arguably not very complicated from a mathematical point of view, but it requires a shift in logic in terms of what constitutes a good forecast. For this discussion I will use a simple form of exponential smoothing to demonstrate my point.

1. The forecasting model: single exponential smoothing

The forecast is calculated as:

F_{t+1} = \alpha A_t + (1-\alpha) F_t,

where F_t is the forecast and A_t is the actual historical value for period t. The smoothing parameter \alpha is a value between 0 and 1. In its simplest interpretation this can be seen as a weighted moving average, where the distribution of weights is controlled by the smoothing parameter. Without going into too much detail we can say the following:

  • A low value for \alpha results in a long weighted moving average and in turn in a very smooth model fit;
  • A high value results in a short weighted moving average that updates the forecast quickly in response to the most recent actual values of the time series.

For example, consider the case when \alpha = 1: the forecast equation becomes F_{t+1} = A_t, which is the same as the naive method and in practice makes the forecast equal to the last observed value. No older observations are considered and no smoothing occurs. Strictly speaking, neither 0 nor 1 is an allowed value for the smoothing parameter, but the example is quite illustrative and therefore useful.
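
To make the recursion concrete, here is a minimal sketch of single exponential smoothing in base R; the simulated series and the simple initialisation are illustrative assumptions, not the example data used below.

# Minimal sketch of the SES recursion on simulated, level-only "sales" data
set.seed(1)
sales <- 100 + rnorm(24, sd = 20)        # two years of noisy monthly sales

ses_fit <- function(y, alpha) {
  f <- numeric(length(y) + 1)
  f[1] <- y[1]                           # simple initialisation with the first observation
  for (t in seq_along(y)) {
    f[t + 1] <- alpha * y[t] + (1 - alpha) * f[t]
  }
  f                                      # f[t + 1] is the forecast for period t + 1
}

fit_low  <- ses_fit(sales, alpha = 0.1)  # long weighted moving average: smooth fit
fit_high <- ses_fit(sales, alpha = 0.9)  # short weighted moving average: reactive fit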

2. Forecasting sales

Now that the basics of the model are explained, let us look at the following example. Suppose we have to forecast the sales of a product with two years of history, and we have two alternative model fits: one with smoothing parameter equal to 0.1 and one equal to 0.9.
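
For readers who want to reproduce this kind of comparison, here is a sketch of how the two fits could be obtained with the ses() function of the forecast package; the original sales data are not included in the post, so a simulated series is used instead.

# Two SES fits with fixed smoothing parameters (illustrative, simulated data)
library(forecast)
set.seed(1)
sales <- ts(100 + rnorm(24, sd = 20), frequency = 12)   # stand-in two-year series

fit01 <- ses(sales, alpha = 0.1, initial = "simple", h = 12)
fit09 <- ses(sales, alpha = 0.9, initial = "simple", h = 12)

plot(sales, ylab = "Sales")
lines(fitted(fit01), col = "blue")   # smooth fit, parameter 0.1
lines(fitted(fit09), col = "red")    # reactive fit, parameter 0.9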

Fig. 1. Model fit with parameters 0.1 and 0.9.

The key question is: which of the two alternatives is best for forecasting future sales? This is a question I am asked quite often, in various forms, by practitioners and students.

The typical answer I get from people who are not trained in forecasting/statistics is that the option with parameter 0.9 is best. It indeed seems to follow the shape of past sales quite closely, and arguably if we could somehow shift it to the left by one period the fit would be fantastic. On the other hand, the fit based on parameter 0.1 is a flat line that does not follow the observed historical sales.

This is a reasonable argument, but unfortunately it is wrong. In fact I have misled you so far, because in the equation for single exponential smoothing I did not include the error term, and therefore we focused on comparing the actual sales with the point forecast. The point forecast is simply the most probable value in the future, but it is not the only possible one! Every forecast carries some error, as there is always unaccounted-for information (what we typically call noise). We should really think of every forecasted value as a distribution of values, with the most probable being the point forecast. Fig. 2 illustrates this by providing both the point forecast and the 80% and 90% Prediction Intervals (PIs), i.e. the areas within which the future actual value is expected to lie with 80% and 90% confidence.

Fig. 2. A period ahead forecast with prediction intervals.

Observe that as we ask for higher confidence (from 80% to 90%) the PIs become wider. That already tells you something about how much confidence we have in the accuracy of the point forecast!

3. Prediction Intervals

The PIs are connected to the spread of the forecast errors, whose standard deviation for unbiased forecasts is simply the Root Mean Squared Error (RMSE):

 \text{RMSE} = \sqrt{\frac{1}{n}{\sum_{i=1}^{n}{(A_i-F_i)^2}}}

where n is the number of errors between historical and fitted values. This is also related to why we typically optimise forecasting models on squared errors: we are trying to minimise the error variance. So the smaller the RMSE, the tighter the PIs, which, if our statistics are correct, implies we have more confidence in our forecast. Obviously this is connected with any decisions that we may take using these forecasts, such as inventory decisions: the PIs and the safety stock are connected. For a more in-depth discussion of the connection to safety stock, as well as the problem of biased forecasts, read this.
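
As a rough sketch of this link, assuming normally distributed and unbiased one-step-ahead errors, the RMSE of the in-sample residuals can be used to build approximate intervals around the point forecast (illustrative only, again on simulated data):

# From in-sample residuals to an approximate 90% prediction interval
library(forecast)
set.seed(1)
sales <- ts(100 + rnorm(24, sd = 20), frequency = 12)
fit   <- ses(sales, alpha = 0.1, h = 1)

e    <- residuals(fit)                  # in-sample one-step-ahead errors
rmse <- sqrt(mean(e^2))

point <- as.numeric(fit$mean)[1]        # one-step-ahead point forecast
c(lower90 = point - qnorm(0.95) * rmse, # approximate 90% prediction interval
  upper90 = point + qnorm(0.95) * rmse)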

For our example forecasts we have the following RMSE:

  • Parameter 0.1: RMSE = 35.9
  • Parameter 0.9: RMSE = 40.4

which already informs us that we have less certainty in the predictions based on the smoothing parameter 0.9 that tries to follow the pattern of sales “better”. Fig. 3 illustrates this for the in-sample fit in both cases.

Fig. 3. Fit to historical sales with 80% and 90% one-step ahead prediction intervals.

There are a few things we can say about Fig. 3. First, consider the plot for parameter 0.1. You can see that most historical sales fall within the 90% prediction interval, as the name suggests. The 80% prediction interval does a decent job as well. On the other hand, the fitted values (the point forecasts) do not follow the sales pattern, and if we considered this as the only indication of a good forecast, we would reject it. Compare this with the plot for parameter 0.9. Now things are much more erratic. The prediction intervals are riskier, in the sense that more points are outside or only marginally inside, even though the intervals themselves are wider. And the fitted values still do not fare any better at being close to the historical sales of each month!

Consider another aspect of this: suppose that you need to take some decision based on the forecasts. The “smooth” one based on the low parameter provides very stable forecasts and PIs. So, for example, running an inventory at a 90% service level more or less implies covering demand plus safety stock at a level similar to the 90% PI. Now consider taking the same decision using the predictions based on parameter 0.9: you would need to revise the plan all the time, as the predictions and respective PIs are very volatile.

The true forecasts are even more striking, as shown in Fig. 4. For parameter 0.1 the prediction intervals are tighter, implying more confidence in our predictions, but also less costly decisions, such as lower safety stock. On the other hand, the second forecast requires much wider prediction intervals and carries more uncertainty, yet the forecast itself does not look any more reasonable! In both cases we get a flat line of forecasts, as this is all that single exponential smoothing is capable of producing as a forecast; the impression that it was doing something more was misleading. Observe also that as we calculate the PIs for multiple steps ahead they typically become wider. Again, consider the cost implications.
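
To see both effects numerically (wider intervals for the higher parameter, and intervals widening with the horizon), here is a small sketch using the forecast package on the same simulated data as above:

# Multi-step 90% prediction interval widths for the two parameters
library(forecast)
set.seed(1)
sales <- ts(100 + rnorm(24, sd = 20), frequency = 12)

frc01 <- ses(sales, alpha = 0.1, h = 12, level = c(80, 90))
frc09 <- ses(sales, alpha = 0.9, h = 12, level = c(80, 90))

width01 <- as.numeric(frc01$upper[, "90%"] - frc01$lower[, "90%"])   # 90% PI width per horizon
width09 <- as.numeric(frc09$upper[, "90%"] - frc09$lower[, "90%"])
data.frame(h = 1:12, width01, width09)   # widths grow with h and are typically larger for alpha = 0.9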

Fig. 4. Forecasts with 80% and 90% prediction intervals.

4. A fundamental idea

Real time series contain noise, which cannot be forecasted. Therefore our objective should be to capture only the underlying structure and not the specific patterns that are most likely due to noise. Think about what your forecasting model (or method) is capable of capturing and ask it to do only that. Single exponential smoothing is only capable of capturing the “level” of the time series. It is incapable of capturing trend, seasonality or special events, and we should not abuse the model to do so. That will only result in very uncertain predictions, with substantial cost implications, which only look more “reasonable” if we consider the point forecasts alone. Once the PIs are calculated it becomes clear that we are making life much harder for ourselves.

Long story short:

  • The point forecast will (always) be wrong, so one should instead look at the prediction intervals.
  • Consider your model and try to fit only the structure it is able to capture; do not be tempted to “explain” everything with your model. The latter will make the forecasts chase noise, resulting in poor PIs and expensive decisions.

Obviously more complex models are able to capture more patterns and details from a time series, but again that in itself may just lead to over-fitting, a topic I will not go into in this post!

I hope this illustration helps explain why we should not try to make our extrapolative forecasts match the historical patterns fully, and why we should switch from thinking about point forecasts to thinking about distributions, as conveyed by the prediction intervals.

A final note: my intention was not to be exact with my statistics, but rather illustrate a point! There is much more to be said about PIs, parameter and model selection and so on.

You may find it helpful to experiment with different exponential smoothing models and parameters and PIs using this interactive demo.

Distributions of forecasting errors of forecast combinations: implications for inventory management

D. Barrow and N. Kourentzes, 2016, International Journal of Production Economics, 177: 24-33. http://dx.doi.org/10.1016/j.ijpe.2016.03.017

Inventory control systems rely on accurate and robust forecasts of future demand to support decisions such as the setting of safety stocks. Combining forecasts is shown to be effective not only in reducing forecast errors, but also in being less sensitive to the limitations of a single model. Research on forecast combination has primarily focused on improving accuracy, largely ignoring the overall shape and distribution of forecast errors. Nonetheless, these are essential for managing the level of aversion to risk and uncertainty for companies. This study examines the forecast error distributions of base and combination forecasts and their implications for inventory performance. It explores whether forecast combinations transform the forecast error distribution towards desired properties for safety stock calculations, typically based on the assumption of normally distributed errors and unbiased forecasts. In addition, it considers the similarity between in- and out-of-sample characteristics of such errors and the impact of different lead times. The effects of established combination methods are explored empirically using a representative set of forecasting methods and a dataset of 229 weekly demand series from a leading UK household and personal care manufacturer. Findings suggest that forecast combinations make the in- and out-of-sample behaviour more consistent, requiring less safety stock on average than base forecasts. Furthermore, we find that using in-sample empirical error distributions of combined forecasts approximates well the out-of-sample ones, in contrast to base forecasts.

Download paper.

How to fit an elephant?

I was looking for an intuitive way to demonstrate to my students the need for parsimony in model building, as well as the problem of overfitting, and I remembered the humorous paper by James Wei showing that elephants are obviously created by Fourier sine series! I went a step further and implemented some popular selection methods and interpolation. It is interesting to see how the different selection methods perform given different numbers of sines, and how they (over-/under-)fit when asked to interpolate, in a predictive modelling spirit.
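
As an aside, a toy sketch in the same spirit (my own illustration, not the code behind the demo) fits an increasing number of sine terms by least squares and lets an information criterion choose among them:

# Fit sums of sine terms of increasing order and compare them by AIC
set.seed(1)
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x) + 0.5 * sin(3 * x) + rnorm(50, sd = 0.2)   # "signal" plus noise

fit_sines <- function(k) {
  X <- sapply(1:k, function(j) sin(j * x))             # design matrix of k sine terms
  lm(y ~ X)
}

models <- lapply(1:10, fit_sines)
sapply(models, AIC)                                     # AIC typically favours a small number of terms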

ISIR 2016: Special session on 'Estimating Demand Uncertainty'

I will be hosting a special session on Estimating Demand Uncertainty in the upcoming International Society for Inventory Research conference ISIR2016.

The link between forecasting and inventory calculations is often not as clear as it should be. This is evident in the formulas used for the safety stock calculation: the expected demand and its uncertainty are often based on very strong assumptions (stationarity, normality, i.i.d. errors, etc.) that are not valid when looking at the empirical results from real-world forecasts. The focus of this session will be to progress this discussion by looking at how forecasts and their errors are best translated to support inventory decisions.

Particular focus will be on:

  1. the estimation of expected demand, demand uncertainty and its distributional aspects, attempting to answer the question of how to update the formulae to reflect non-independent errors, which occur by construction in our forecasts;
  2. updating our formulae to remove unrealistically strong assumptions, so as to reflect reality and support practice. The main interest is to do so in a way that can be transferred to practice.

The abstract submission deadline is on the 31st of March 2016. A selection of submitted works will be published in a special issue of the International Journal of Production Economics. For more details see the submission information at the conference website. When submitting your abstract you will be able to select the special session there.

Please contact me if you are interested or have any more questions, or leave a comment below.

Forecasting competition: Computational Intelligence in Forecasting

A new forecasting competition has been announced: the International Time Series Forecasting Competition “Computational Intelligence in Forecasting” (CIF 2016). The competition is organised by Martin Stepnicka and Michal Burda within the IEEE WCCI 2016 congress and is related to the special session IJCNN-13, Advances in Computational Intelligence for Applied Time Series Forecasting (ACIATSF).

For more details on the competition, please visit the competition website: http://irafm.osu.cz/cif

IIF Workshop on Supply Chain Forecasting for Operations

Lancaster Centre for Forecasting and Cardiff Business School, in collaboration with the IIF and SAS, are hosting a workshop on “Supply Chain Forecasting for Operations” on 28 and 29 June 2016 at Lancaster University, UK. The workshop is intended to stimulate forecasting research in the area of supply chain operations and selected papers will be reviewed for publication in a special issue of the International Journal of Forecasting.

If you are interested in presenting a paper at the workshop, please send an abstract for consideration to John Boylan (j.boylan@lancaster.ac.uk). The workshop is intended to accommodate around 30-35 people and will be by invitation, so that we can ensure a suitable range of topics is included in the programme. There will be limited funds to support attendance, particularly for doctoral students. We hope this event will help to generate new research in a critically important area that has not had the attention it deserves since the days of Bob Brown and Charles Holt!

You can find more information here.

Interactive simple exponential smoothing

Another interactive demo I created for the courses I teach. This one is about simple exponential smoothing, and the main objective is to show the interaction between the smoothing parameter and the initial level in the fitting and holdout samples. A different interactive demo about exponential smoothing can be found here. A couple of points that may be interesting to observe (a simplified sketch follows the list below):

  • In-sample and holdout errors do not behave in the same way.
  • The error curves can be substantially different for various initial level values.
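
Here is a simplified sketch of that comparison in base R (simulated data, simple initialisation, varying only the smoothing parameter; the demo itself also varies the initial level):

# In-sample vs holdout MSE of SES across smoothing parameters
set.seed(1)
y     <- 100 + rnorm(36, sd = 15)        # 36 observations of a noisy level
train <- y[1:24]; hold <- y[25:36]       # fitting and holdout samples

ses_path <- function(y, alpha, init) {
  f <- init
  for (t in seq_along(y)) f <- c(f, alpha * y[t] + (1 - alpha) * f[t])
  f                                      # f[t] is the one-step-ahead forecast for period t
}

alphas <- seq(0.05, 0.95, by = 0.05)
errs <- t(sapply(alphas, function(a) {
  f <- ses_path(train, a, init = train[1])
  c(insample = mean((train - f[1:24])^2),   # in-sample one-step-ahead MSE
    holdout  = mean((hold - f[25])^2))      # holdout MSE of the flat forecast
}))
cbind(alpha = alphas, errs)              # the two error curves often disagree on the best alpha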

RShiny demo for basic time series exploration

I made this little interactive demo for basic time series exploration and decomposition for my students. I uploaded it here in case someone else finds it useful. Some things to try:

  • What does the seasonal plot look like for seasonal and non-seasonal time series?
  • What does the seasonal plot look like when the trend is not removed?
  • What does the time series decomposition look like when the additive or multiplicative form is applied wrongly?
  • What is the effect of different seasonality estimations when decomposing?

Different types of seasonal plots are implemented as well. I find that sometimes looking at alternative forms is quite helpful.
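
For anyone without access to the demo, the same kind of exploration can be sketched in a few lines of base R on a standard dataset (AirPassengers is used here purely for illustration):

monthplot(AirPassengers)                                  # seasonal sub-series plot
plot(decompose(AirPassengers, type = "multiplicative"))   # classical decomposition
plot(decompose(AirPassengers, type = "additive"))         # compare with the (less appropriate) additive form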

Another update for tsintermittent

Version 1.8 of tsintermittent has been submitted to CRAN and should shortly be available for download. Amongst various new input checks that better accommodate handling multiple time series with data frames, a new option has been added to data.frc. When method="auto" two things happen:

  1. Function idclass(...,type="PKa") will be called to classify the time series and select for each one the appropriate forecasting method between Croston’s method, SBA and single exponential smoothing (for details on the classification see the documentation of idclass).
  2. Each time series will be forecasted using the selected forecasting method. Any parameters are optimised per time series.

Some things to keep in mind. The function data.frc can accept additional inputs that are passed to the forecasting method used. The function is smart enough to distribute options that are only available to crost or sexsm appropriately. Also, you will get the same results if you use:
data.frc(...,method="imapa",maximumAL=1)
in which case imapa is restricted to using only the original temporal aggregation level. However calling method="imapa" instead of method="auto" is substantially slower, so the latter is recommended when handling multiple time series and you do not need to take advantage of temporal aggregation.
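
For illustration, assuming a data frame called demand with one time series per column (the object name and passing it as the first argument are my assumptions; see the package documentation for the exact interface), the two calls discussed above would look roughly like:

library(tsintermittent)
frc_auto  <- data.frc(demand, method = "auto")                   # classify, then forecast each series
frc_imapa <- data.frc(demand, method = "imapa", maximumAL = 1)   # equivalent results, but slower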

This paper empirically demonstrates that using a similar classification to select the best method for intermittent demand results in good forecasting performance. Although the good performance of the classification scheme was verified again in this paper, we also found that imapa gave the most accurate forecasts. Nonetheless, the new option should allow either approach to be implemented quickly. My personal view is that the method selection issue for intermittent demand time series is far from resolved, as I demonstrate in this paper, but good progress is being made and it should be used in practice.