
Ensemble size and combination operators

Combining forecasts has been shown in many cases to lead to improvements in forecasting performance, in terms of both accuracy and bias. This is also common in forecasting with neural networks and other computationally intensive methods, where ensemble forecasts are considered more accurate than individual model forecasts. A useful feature of forecast combination is that it mitigates model selection, parameter and sampling uncertainty.

A very common combination operator is the mean. There is strong empirical evidence in the forecasting literature that the unweighted mean is as good as more complex combination approaches. A defining difference between forecast combinations and neural network ensembles is the number of models that can be combined. In the case of the latter, we can easily produce as many members as we want, typically either through bagging or by varying the training initialisation values. Both of these approaches aim to mitigate the uncertainty coming from the available sample and the model parameters. With a sufficiently high number of ensemble members we can estimate the distribution of the forecasts that are to be combined, something that is typically not possible when combining forecasts from a limited number of models. With this in mind, we can view the problem of forecast combination as estimating the location of that distribution. Following this line of thought, it becomes apparent that the mean is an appropriate combination operator only for well behaved distributions: unimodal and symmetric. In practice, this is a very strong requirement.

A recent paper looked at the effect of using three fundamental combination operators: the mean, the median and the mode. They all aim to estimate the location of the distribution, but with different levels of robustness to deviations from normality. The median is less sensitive to outliers and more robust than the mean to asymmetries in the forecast distribution, while the mode is insensitive to both. In theory one would therefore expect the latter to produce better combined forecasts, and this was shown empirically to be the case. The following figure, from the same paper, summarises the results for ensembles of 10 up to 100 members, for the three basic combination operators and two datasets.

[Figure: forecasting performance of mean, median and mode combinations for ensembles of 10 to 100 members, for the two datasets]

The poor performance of the mean in comparison to the median and the mode is evident. Furthermore, with a sufficient number of forecasts to combine, the mode performs better than the median, again as expected due to its robustness to non-normality. The mode performs poorly for small ensembles, as the sample is too small to estimate it adequately. Another interesting question is how many ensemble members are required. We can see that both the median and the mode converge quite fast in both datasets, while the mean does not, even with 100 members. More details on these results can be found in the paper. The main conclusion is that we perhaps trust the mean as a combination operator too much, and we should consider medians or modes of forecasts more, especially when a large number of forecasts is available.
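To make the three operators concrete, here is a minimal sketch in R, assuming the ensemble forecasts are stored in a members-by-horizons matrix (all names and values are illustrative); the kernel density estimate used for the mode is just one simple way to approximate it.

> # Sketch: combining an ensemble of forecasts with the mean, median and mode.
> # 'ens' is assumed to be a members x horizons matrix of individual forecasts.
> set.seed(1)
> ens <- matrix(rnorm(100*12, mean=100, sd=5), nrow=100)   # 100 members, 12 horizons (illustrative)
> comb.mean   <- apply(ens, 2, mean)
> comb.median <- apply(ens, 2, median)
> # Approximate the mode by the peak of a kernel density estimate of each horizon's distribution
> comb.mode   <- apply(ens, 2, function(x){d <- density(x); d$x[which.max(d$y)]})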

To get a better view of the behaviour of each ensemble operator, the following demo uses the neural network forecasts for one of the time series in the aforementioned paper. The complete series, which is not shown here, contains periods of upward and downward trends. You can experiment with different ensemble sizes, up to 100 members. Apart from the combined forecasts and the out-of-sample mean absolute error, the demo visualises the distribution of the forecasts for different horizons and how appropriate the mean, median and mode are for it. Finally, the forecast of the “best” model according to an in-sample validation subset is provided. It is apparent that this model is over-fitted to the past, capturing only the local downward trend that dominated the validation set, which demonstrates why ensembles are useful.

Note: if the demo does not start, please click on the “Resample ensemble members” button to reload it.

Choosing parameters for Croston’s method and its variants

Although Croston’s method and its variants are popular for intermittent demand time series, there have been limited advances in identifying how to select appropriate smoothing parameters and initial values. On the one hand this complicates forecasting for organisations, and on the other hand it does not permit automation. Recent research investigated various cost functions for optimising these methods and found two newly proposed ones, the Mean Squared Rate (MSR) and the Mean Absolute Rate (MAR), to perform better than conventional squared or absolute error based cost functions. The argument for the new cost functions, in a nutshell, is that these methods produce a demand rate forecast, rather than a demand size forecast, and therefore error-based cost functions are inappropriate.

The resulting parameters were found to be close to the ones suggested by the literature. The same paper looked at whether constraining the parameters helped and found minimal differences. Optimising the demand size and interval parameters separately was found to be beneficial, as was optimising the initial values of demand and interval (or demand probability in the case of TSB). When all these findings were used in practice, there were substantial improvements in terms of inventory performance, as discussed in detail in the paper.
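As a rough sketch of how these choices map onto the tsintermittent package (flag names as in the package; the specific values are illustrative, and I assume simID returns the simulated series as columns, as the classification examples later in this archive suggest):

> # Sketch: optimising Croston's method under different cost functions.
> library(tsintermittent)
> y <- simID(1, 60, idi=1.3, cv2=0.5)[,1]                    # a simulated intermittent demand series
> crost(y, cost="mar", nop=2, init.opt=TRUE, outplot=TRUE)   # MAR cost, separate parameters, optimised initial values
> crost(y, cost="mse", nop=2, init.opt=TRUE, outplot=TRUE)   # conventional squared-error cost, for comparison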

I have put together a simulator to illustrate how the various methods and cost functions work. All functions are from the tsintermittent package for R.

Update: If minimum and maximum aggregation levels are equal, an ADIDA forecast with automatic model selection (using iMAPA) is now produced correctly.

Update for the tsintermittent package

I uploaded today to CRAN an updated version of tsintermittent. Yesterday I found a bug in the optimisation of Croston’s method, SBA and TSB, and this update fixes it. Here is the changelog for this update:

Version 1.3 (04 August 2014)
  * Corrected major optimisation bug - issue with opt.on in all crost, 
    tsb and sexsm functions.
  * Corrected bug in imapa function when maximumAL was very high.
  * Corrected bugs in the legend of imapa function plots.
  * Added option to optimise only initial values without parameters in 
    crost, tsb and sexsm functions.
  * Added init.opt option for imapa in the same way as it is for crost 
    and tsb functions. This does not affect SES optimisation.
  * Added option to use pre-fitted imapa models.

If you have experienced strange behaviour when optimising the various methods, especially SBA, this update resolves it. I strongly suggest you update the intermittent demand package to version 1.3 as soon as possible!

Experimenting with Shiny for R

Shiny is a web application framework for R. The idea is simple: deploy R code in webpages. This might prove useful when user interaction is required, for instance to design and deploy forecasting experiments that need human participants.

I gave it a try to see how easy it is to build a demo. Assuming your R skills are OK, using Shiny should be straightforward. It took me a couple of hours, starting from zero, to get a somewhat complex example for forecasting with MAPA, for which there is already an R package.

Update 11/04/2014: I have revisited the code for the MAPA demonstration. The new one can fit within the normal blog page, but if you prefer the full screen version you can still find it here. You can also still find the two necessary files for the old version of this example, one for the server running the R code and one for the user interface.
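For reference, here is a minimal single-file sketch of a Shiny app (not the MAPA demo itself, which used separate server and user interface files): a slider selects the forecast horizon and the app plots an ETS forecast for a built-in series.

> # Minimal Shiny sketch: interactive forecast horizon for an ETS forecast.
> library(shiny)
> library(forecast)
> ui <- fluidPage(sliderInput("h", "Forecast horizon", min=1, max=24, value=12),
+                 plotOutput("frcPlot"))
> server <- function(input, output){
+   output$frcPlot <- renderPlot(plot(forecast(ets(AirPassengers), h=input$h)))}
> shinyApp(ui, server)   # launches the app in a browser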

What is the Forecasting Society?

Want to learn more about the Forecasting Society?

These videos will give you an idea of what it is all about.

Are you a researcher in forecasting, or are you interested in participating in forecasting research?

Either way, if you are interested in the area and in particular when human judgement is involved, the Forecasting Society will be an interesting webspace to visit.

MAPA and intermittent demand forecasting

Recently I posted about a paper I co-authored with Fotios Petropoulos, now in JORS: Forecast Combinations for Intermittent Demand. There we found that, for intermittent demand data, forecasting at multiple levels of temporal aggregation with appropriate models at each level and then combining the forecasts performed best.

This approach has many analogies with the MAPA algorithm for non-intermittent time series. However, one difference between the original MAPA and its intermittent demand equivalent (let’s name it iMAPA for now) is that the former is fully automatic, while the latter was investigated with a-priori selected parameters for the intermittent demand methods (Croston and SBA). Therefore, although iMAPA outperformed the benchmarks, it was not fully automatic.

Recently, I published a paper on how to automatically select parameters for Croston’s method and its variants, and put everything together in an R package to make it easily accessible to other researchers and practitioners. The apparent next step was to explore how automatic parameter specification worked with iMAPA and to investigate whether this provided further improvements, as was the case for Croston, SBA and TSB. In addition, following the original MAPA, I investigated the impact of the combination operator.

Using the same experimental setup as in the JORS paper the new results are as follows:

Scaled Mean Absolute Error (sMAE)

  Method             Parameter   sMAE
  Croston            0.10        1.770
  SBA                0.10        1.724
  iMAPA (mean)       0.10        1.696
  * iMAPA (mean)     Optimal     1.564
  * iMAPA (median)   Optimal     1.482

The first three rows are taken from Table 2 in the paper. The last two rows refer to the new results (also highlighted with an *) for iMAPA with optimal parameters and mean or median combination of the temporally aggregated levels. We can observe interesting improvements for intermittent demand data, where methods typically perform similarly. When I get the chance I will look into the inventory implications in more detail.

The updated tsintermittent package for R includes the new function ‘imapa’ that was used to produce these results. To use it simply type:

> imapa(ts.data2,outplot=1)

[Figure: imapa forecasts for ts.data2]

The function accepts arguments for the minimum and maximum temporal aggregation levels, the combination method, a-priori selected or optimised model parameters, and various visualisations of the results, such as the following example that summarises which model was selected at each aggregation level, using the PK classification:

[Figure: model selected at each temporal aggregation level, according to the PK classification]

Perhaps it is useful to note that if you select the same minimum and maximum temporal aggregation levels, then iMAPA becomes similar to ADIDA, albeit with automatic model selection and parameterisation, but constrained to using only Croston’s method, SBA or SES.
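As a sketch of these options (minimumAL and maximumAL appear in the package changelog above; the names of the combination and plotting arguments are assumptions on my part, so check the package documentation):

> # Median combination across aggregation levels 1 to 12
> imapa(ts.data2, minimumAL=1, maximumAL=12, comb="median", outplot=1)
> # Same minimum and maximum level: an ADIDA-style forecast with automatic model selection
> imapa(ts.data2, minimumAL=12, maximumAL=12, outplot=1)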

Benchmark your forecasts

Over the years I have reviewed numerous papers that do not properly benchmark the various methods proposed. In my opinion if a paper has an empirical evaluation, then it must have appropriate benchmarks as well. Otherwise, one cannot claim that convincing empirical evidence is provided. The argument is simple: if the proposed method does not provide benefits over established benchmarks, what does it offer? Of course it is important to be careful not to focus just on forecasting accuracy, as there may be other benefits such as automation, robustness, computational efficiency, etc.

This is very relevant to practice as well. Companies often do not benchmark their forecasts and have little evidence of whether their forecasts are fine, whether they could improve, or how to improve. In my consultancy work this is often the first step I take: benchmark the current process.

Of course, this is often easier said than done. One needs to have benchmarks implemented, and these should not require too much expertise to use. This is where all the work by various members of the community comes in handy! Currently there are multiple time series packages for R freely available, and many of them are very easy to use.
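If the packages used below are not already installed, getting them from CRAN is a one-liner:

> install.packages(c("forecast","MAPA","tsintermittent"))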

Let us assume that we have a time series we want to forecast and that we have developed a new method. This is how easy it is to produce some benchmark forecasts:

> # Load the necessary libraries
> library(forecast)
> library(MAPA)
> library(tsintermittent)
> 
> # In-sample data: 'y.trn'
> # Out-of-sample data: 'y.tst'
> # Forecasts from our new brilliant method are stored in 'mymethod'
> 
> # Start timer - Just to see how long this takes
> tm <- proc.time()
> 
> # Let's produce some benchmarks
> frc <- array(NA,c(7,24)) # 7 benchmarks, forecast horizon 24
> 
> fit.ets <- ets(y.trn)
> fit.arima <- auto.arima(y.trn)
> fit.mapa <- mapaest(y.trn,paral=1,outplot=FALSE)
> frc[1,] <- rep(tail(y.trn,1),24) # Naive: repeat the last in-sample observation
> frc[2,] <- forecast(fit.ets,h=24)$mean
> frc[3,] <- forecast(fit.arima,h=24)$mean
> frc[4,] <- mapafor(y.trn,fit.mapa,fh=24,ifh=0,outplot=FALSE)$outfor
> frc[5,] <- crost(y.trn,h=24)$frc.out
> frc[6,] <- crost(y.trn,h=24,type="sba")$frc.out
> frc[7,] <- tsb(y.trn,h=24)$frc.out
> rownames(frc) <- c("Naive","ETS","ARIMA","MAPA","Croston","SBA","TSB")
> 
> # Calculate accuracy
> PE <- (matrix(rep(y.tst,8),nrow=8,byrow=TRUE) - 
+ rbind(mymethod,frc))/matrix(rep(y.tst,8),nrow=8,byrow=TRUE)
> MAPE <- rowMeans(abs(PE))*100
> 
> # Stop timer
> tm <- proc.time() - tm

So what we have here is forecasts from ‘mymethod’ and some benchmarks:

  • Naive: hopefully our method predicts better than the random walk.
  • ETS: state space exponential smoothing from the ‘forecast’ package.
  • ARIMA: automatically selected ARIMA (auto.arima) from the ‘forecast’ package.
  • MAPA: multiple aggregation prediction algorithm using ets, from the ‘MAPA’ package.
  • Croston’s method: this is meant for intermittent data, but I just wanted to demonstrate how easy it is to use it as a benchmark. This is from the ‘tsintermittent’ package.
  • SBA: this is a variant of Croston’s method, also from the ‘tsintermittent’ package.
  • TSB: another intermittent demand method from the ‘tsintermittent’ package.

Of course not all benchmarks are appropriate here, but I wanted to demonstrate how easy it is to use them. Accuracy is assessed using the Mean Absolute Percentage Error (MAPE), for t+1 up to t+24 forecasts.

> print(round(MAPE,2))
mymethod    Naive      ETS    ARIMA     MAPA  Croston      SBA      TSB 
    5.76     7.37     6.00     7.34     5.59     5.76     7.07     7.26 
> print(tm)
   user  system elapsed 
   4.24    0.00    4.86 

Apparently mymethod is pretty good, but in terms of accuracy MAPA seems to be doing better. Perhaps it is interesting to see that Croston’s method is not that bad, which makes sense considering that for non-intermittent data it is equivalent to exponential smoothing.

The whole benchmarking took very little coding and only 4.86 seconds to run. This could be sped up using the parallel processing that both the ‘forecast’ and ‘MAPA’ packages support. Of course, we would prefer to have used a rolling origin evaluation (cross-validation), which could easily be implemented with a loop.
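For example, a minimal sketch of a one-step-ahead rolling origin evaluation for a single benchmark (ETS), where 'y' is assumed to hold the complete series (in-sample plus out-of-sample) as a ts object:

> # Rolling origin sketch: re-estimate ETS at each origin and measure the
> # one-step-ahead error over the last 'origins' observations of the series.
> origins <- 12
> err <- rep(NA, origins)
> for (i in 1:origins){
+   n.in <- length(y) - origins + i - 1                        # size of the training sample at this origin
+   y.in <- ts(y[1:n.in], frequency=frequency(y), start=start(y))
+   err[i] <- y[n.in + 1] - forecast(ets(y.in), h=1)$mean[1]   # one-step-ahead error
+ }
> mean(abs(err))   # rolling origin MAE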

Here is the series and the various forecasts:

> cmp = rainbow(8, start = 0/6, end = 4/6)
> ts.plot(y.trn,y.tst,ts(t(rbind(mymethod,frc)),frequency=12,end=end(y.tst)),
+ col=c("black","black",cmp))
> legend("bottomleft",c("MyMethod",rownames(frc)),col=cmp,lty=1,ncol=2)

[Figure: the example series with out-of-sample forecasts from MyMethod and the benchmarks]

The example series is ‘referrals’ from the ‘MAPA’ package. In other posts here you can find more information about the functions in the MAPA and tsintermittent packages.

To conclude: benchmark your forecasts, it is easy and necessary!

Forecasting Society launched!

Together with Fotios Petropoulos, we launched a new portal to facilitate judgemental forecasting research. The aim of the Forecasting Society (www.forsoc.net) is to bring together researchers in judgemental forecasting and participants from practice and academia. At the same time, we hope that it will grow into a discussion forum to exchange and promote judgemental forecasting ideas and research.

Currently it hosts two active experiments, one looking at how humans choose between different forecasting models in contrast to statistical methods, and the other investigating the link between the cognitive abilities of forecasters and their forecasting capabilities.

We invite you to host your forecasting experiments at the Forecasting Society (www.forsoc.net) or join as a participant!

[Screenshot: the Forecasting Society portal]

Intermittent demand forecasting package for R

A new package for analysing and forecasting intermittent demand time series and slow moving items has been released for R. You can download the latest version from CRAN.

The launch version contains the following functions:

  • crost: Croston’s method and variants.
  • crost.ma: Moving average with Croston’s method decomposition.
  • idclass: Time series categorisation for intermittent demand.
  • simID: Simulator for intermittent demand series.
  • tsb: TSB (Teunter-Syntetos-Babai) method.

Two demo time series are included to try the various functions: data and data2.

To fit Croston’s method to a time series, simply use the following command:

> crost(data)

The function comes with a few different options. By default it optimises model parameters and initial values using the MAR cost function. Different cost functions are available through the cost flag. By default the smoothing parameters for the non-zero demand and the demand intervals are optimised separately, but this can be controlled using the nop flag. Similarly, the initialisation values for demand and intervals can be optimised or not, and set either manually or through preset initialisation heuristics. These are controlled by the init.opt and init flags.
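A brief sketch of these flags in use (the values shown are illustrative; in particular, selecting the conventional squared-error cost with cost="mse" is my assumption):

> crost(data, cost="mse")         # optimise with a squared-error cost instead of MAR
> crost(data, nop=1)              # a single smoothing parameter for both demand and intervals
> crost(data, init.opt=FALSE)     # do not optimise the initial values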

Three different variants of the method are implemented:

  •  ‘croston’, for the original method.
  • ‘sba’, for the Syntetos-Boylan approximation.
  • ‘sbj’, for the Shale-Boylan-Johnston correction.

For example, to get SBA forecasts with optimal parameters for 12 months ahead and plot the results you can use:

> crost(data,type='sba',h=12,outplot=TRUE)

This will provide the following visual output:
[Figure: SBA forecasts for the demo series, 12 months ahead]
The tsb and crost.ma functions allow a similar level of control.

The next interesting function in the package allows you to create simulated intermittent demand series. The simulator assumes that non-zero demand arrivals follow a Bernoulli distribution and the non-zero demand sizes a negative binomial distribution. For example, to create 100 simulated time series of 60 observations each, use:

> dataset <- simID(100,60,idi=1.15,cv2=0.3)

The two last inputs control the average demand interval and the squared coefficient of variation of the non-zero demand for the series.
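Conceptually, each simulated series is built along these lines (a sketch of the mechanism described above, not the package’s exact implementation; the distribution parameters are illustrative):

> # Demand occurs in a period with a fixed probability (Bernoulli arrivals);
> # when it occurs, its size is drawn from a (shifted) negative binomial distribution.
> n <- 60
> p <- 1/1.15                               # arrival probability implied by an average interval of 1.15
> sizes <- rnbinom(n, size=2, mu=10) + 1    # shifted so non-zero demands are at least one unit (an assumption)
> y.sim <- rbinom(n, 1, p) * sizes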

To get a better view of the generated series, or a real dataset, we can use the idclass function. This categorises the intermittent demand time series according to existing classification schemes. In particular the following are implemented:

  • ‘SBC’, classification proposed by Syntetos-Boylan-Croston.
  • ‘KH’, the revised classification by Kostenko-Hyndman using the exact separation.
  • ‘KHa’, as above but using the approximate separation.
  • ‘PK’, the further revised classification that deals with temporal aggregation by Petropoulos-Kourentzes.
  • ‘PKa’, as above but using the approximate separation.

Both ‘KH’ and ‘PK’ require the demand interval smoothing parameter of SBA as an input. This can either be calculated automatically by the function, or given as an input using the a.in flag.

Finally, there are two views for each classification: a summarised one that merely provides the number of time series per class/method, and a detailed one that provides a scatterplot of the time series characteristics. Here are some examples:

> idclass(t(dataset))

This gives by default the PKa classification.
[Figure: summary PKa classification of the simulated dataset]

> idclass(t(dataset),type='SBC')

[Figure: summary SBC classification of the simulated dataset]

> idclass(t(dataset),type='KHa')

[Figure: summary KHa classification of the simulated dataset]

> idclass(t(dataset),type='PK',outplot='detail')

[Figure: detailed PK classification scatterplot of the simulated dataset]
Note that the PK and KH exact classifications, unless the smoothing parameters are provided, can be computationally expensive.

More information about the various optimisation cost functions for these intermittent demand methods and the classification schemes can be found here and here. An interactive demo of various functions and optimisation options can be found here.

On Intermittent Demand Model Optimisation and Selection

N. Kourentzes, 2014, International Journal of Production Economics, 156: 180-190. http://dx.doi.org/10.1016/j.ijpe.2014.06.007

Intermittent demand time series involve items that are requested infrequently, resulting in sporadic demand. Croston’s method and its variants have been proposed in the literature to address this forecasting problem. Recently other novel methods have appeared. Although the literature provides guidance on the suggested range for model parameters, a consistent and valid optimisation methodology is lacking. Growing evidence in the literature points against the use of conventional accuracy error metrics for model evaluation for intermittent demand time series. Consequently these may be inappropriate for parameter or model selection. This paper contributes to the discussion by evaluating a series of conventional time series error metrics, along with two novel ones for parameter optimisation for intermittent demand methods. The proposed metrics are found to not only perform best, but also provide consistent parameters with the literature, in contrast to conventional metrics. Furthermore, this work validates that employing different parameters for smoothing the non-zero demand and the inter-demand intervals of Croston’s method and its variants is beneficial. The evaluated error metrics are considered for automatic model selection for each time series. Although they are found to perform similarly to theory driven model selection schemes, they fail to outperform single models substantially. These findings are validated using both out-of-sample forecast evaluation and inventory simulations.

Download paper.
R code available. An interactive demo of the various optimisation cost functions can be found here.