Intermittent Demand Forecasts with Neural Networks

By | April 19, 2013

N. Kourentzes, 2013, International Journal of Production Economics, 143: 198-206.

Intermittent demand appears when demand events occur only sporadically. Such time series typically have few observations, which makes intermittent demand forecasting challenging. Forecast errors can be costly in terms of unmet demand or obsolescent stock. Intermittent demand forecasting has been addressed using established forecasting methods, including simple moving averages, exponential smoothing and Croston’s method with its variants. This study proposes a neural network (NN) methodology to forecast intermittent time series. These NNs provide dynamic demand rate forecasts, which do not assume a constant demand rate in the future and can capture interactions between the non-zero demand and the inter-arrival rate of demand events, overcoming the limitations of Croston’s method. To mitigate the limited fitting sample, which is common in intermittent demand, the proposed models use regularised training and median ensembles over multiple training initialisations to produce robust forecasts. The NNs are evaluated against established benchmarks using both forecasting accuracy and inventory metrics, and the two sets of metrics lead to conflicting findings. While the NNs achieved poor forecasting accuracy and bias, all NN variants achieved higher service levels than the best-performing Croston’s method variant, without requiring analogous increases in stock holding volume. Therefore, NNs are found to be effective for intermittent demand applications. This study provides further arguments and evidence against the use of conventional forecasting accuracy metrics to evaluate forecasting methods for intermittent demand, concluding that attention to inventory metrics is desirable.
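The abstract uses Croston’s method as the main benchmark. As a reference point, here is a minimal Python sketch of the textbook procedure: exponentially smooth the non-zero demand sizes and the inter-demand intervals separately, and forecast their ratio as a flat demand rate. The single smoothing constant, the initialisation from the first demand event and the optional SBA bias correction factor (1 - alpha/2) are assumptions for illustration, not the benchmark settings used in the paper.

```python
def croston(series, alpha=0.1, sba=False):
    """Flat per-period demand-rate forecast from Croston's method.

    series: sequence of non-negative demands (zeros allowed).
    alpha:  smoothing constant, shared by both components (assumption;
            separate constants are also common).
    sba:    if True, apply the Syntetos-Boylan correction (1 - alpha / 2).
    """
    z = None  # smoothed non-zero demand size
    p = None  # smoothed inter-demand interval
    q = 1     # periods elapsed since the previous demand event
    for y in series:
        if y > 0:
            if z is None:
                # Assumed initialisation: first demand size and its interval.
                z, p = float(y), float(q)
            else:
                z += alpha * (y - z)
                p += alpha * (q - p)
            q = 1
        else:
            q += 1
    if z is None:
        return 0.0  # no demand observed at all
    rate = z / p
    return rate * (1 - alpha / 2) if sba else rate


print(croston([0, 4, 0, 4]))  # 2.0: size 4 every 2 periods
```

The single number returned is used for every future period, which is the constant demand rate assumption that the NN models are designed to avoid.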

Download paper.

5 thoughts on “Intermittent Demand Forecasts with Neural Networks”

  1. Pratyaksa

    Dear Mr Nikolaos,

    I read your research paper. You propose the NN-dual and NN-rate models for intermittent demand forecasting. The NN uses lags of both the non-zero demand and the inter-demand interval as inputs. I want to implement your models in Matlab or R to forecast my intermittent data, but I am confused about how to define the inputs of the neural network.

    For example, I want to forecast a small data set (ts.data2):
    Year  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    1980    0   2   0   1   0  11   0   0   0   0   2   0
    1981    6   3   0   0   0   0   0   7   0   0   0   0

    Running crost.decomp(ts.data2) gives:
    $demand
    [1] 2 1 11 2 6 3 7
    $interval
    [1] 2 2 2 5 2 1 6
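For reference, the decomposition behind that output can be reproduced in a few lines of Python. This is an illustrative re-implementation, not the R function’s source: record the positions of the non-zero months, keep their values, and take differences of the positions, counting the first interval from the start of the series.

```python
def decompose(series):
    """Split an intermittent series into non-zero demand sizes and the
    number of periods between successive demand events."""
    positions = [t for t, y in enumerate(series, start=1) if y != 0]
    demand = [series[t - 1] for t in positions]
    # The first interval is counted from the start of the series (period 0).
    interval = [b - a for a, b in zip([0] + positions, positions)]
    return demand, interval


ts_data2 = [0, 2, 0, 1, 0, 11, 0, 0, 0, 0, 2, 0,   # 1980
            6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0]    # 1981
demand, interval = decompose(ts_data2)
print(demand)    # [2, 1, 11, 2, 6, 3, 7]
print(interval)  # [2, 2, 2, 5, 2, 1, 6]
```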

    What are the inputs of the NN? Should the $demand series be used as the “demand inputs” and the $interval series as the “interval inputs”?
    Or do I need to plot the ACF and PACF to find significant lags of the non-zero demand series ($demand) and of the inter-demand interval series ($interval)? If I find significant lags in the ACF/PACF plots, should those lags be used as the NN inputs?
    I would appreciate it if you could give me more details about these two input variables.
    Thank you,

    1. Nikos Post author

      Hi Pratyaksa,
      There are two questions. First, yes, I use $demand and $interval as inputs to the NN. Second, there are many ways to choose the lags, but given the limited data you could simply fix them to a small number, for example 2 or 3. Note that since ANNs are autoregressive in nature, you would need to look at the PACF plot if you go in that direction. Regression modeling may also be useful.
      Finally, for ANNs to work in this context you need to be very careful with their training, due to the limited sample size. I found regularisation very helpful.
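Putting this reply together, here is a minimal Python sketch: 2 lags of $demand and $interval as inputs, a small network trained with an L2 weight-decay penalty as the regularisation, and the median over several random initialisations as the forecast (the abstract names regularised training and median ensembles as the paper’s approach). The helper names, the network size (3 tanh units), learning rate, penalty strength, epochs, input scaling and the demand-rate target are all assumptions, not the paper’s settings; plain gradient descent is used only to keep the example dependency-free.

```python
import math
import random
import statistics


def lagged_rows(demand, interval, lags=2):
    """Input rows: previous `lags` demand sizes followed by previous `lags` intervals."""
    return [demand[i - lags:i] + interval[i - lags:i] for i in range(lags, len(demand))]


def train_mlp(X, y, hidden=3, l2=0.01, lr=0.05, epochs=1000, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on
    mean squared error plus an L2 (weight decay) penalty on the weights.
    Returns a function mapping an input row to a forecast."""
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b = [0.0] * hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    c = 0.0

    def hidden_layer(x):
        return [math.tanh(sum(w * xi for w, xi in zip(W[j], x)) + b[j]) for j in range(hidden)]

    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(hidden)]
        gb, gv, gc = [0.0] * hidden, [0.0] * hidden, 0.0
        for x, target in zip(X, y):
            h = hidden_layer(x)
            err = sum(vj * hj for vj, hj in zip(v, h)) + c - target
            gc += err
            for j in range(hidden):
                gv[j] += err * h[j]
                dh = err * v[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh
                gb[j] += dh
                for k in range(n_in):
                    gW[j][k] += dh * x[k]
        # Average the gradients; weight decay applies to weights, not biases.
        for j in range(hidden):
            v[j] -= lr * (gv[j] / n + l2 * v[j])
            b[j] -= lr * gb[j] / n
            for k in range(n_in):
                W[j][k] -= lr * (gW[j][k] / n + l2 * W[j][k])
        c -= lr * gc / n

    def predict(x):
        return sum(vj * hj for vj, hj in zip(v, hidden_layer(x))) + c

    return predict


def median_ensemble_forecast(X, y, x_new, members=10, **train_kwargs):
    """Median forecast of `members` networks that differ only in their random
    initialisation. Inputs and targets are divided by their maximum absolute
    value so the tanh units are not saturated."""
    sx = max(abs(val) for row in X for val in row) or 1.0
    sy = max(abs(t) for t in y) or 1.0
    Xs = [[val / sx for val in row] for row in X]
    ys = [t / sy for t in y]
    xs_new = [val / sx for val in x_new]
    preds = [train_mlp(Xs, ys, seed=s, **train_kwargs)(xs_new) * sy
             for s in range(members)]
    return statistics.median(preds)


demand = [2, 1, 11, 2, 6, 3, 7]
interval = [2, 2, 2, 5, 2, 1, 6]
lags = 2
X = lagged_rows(demand, interval, lags)
# Assumed target: demand rate (size / interval) of the next demand event.
y = [demand[i] / interval[i] for i in range(lags, len(demand))]
x_new = demand[-lags:] + interval[-lags:]  # most recent lags feed the forecast
print(median_ensemble_forecast(X, y, x_new, members=10))
```

With only five training rows, the median over initialisations keeps one badly initialised network from dominating the forecast, while the weight decay keeps each network from simply memorising those rows.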

  2. Andreas

    Dear Mr Kourentzes,

    thanks for the very interesting and helpful blog.

    I have a question regarding the application and comparison of different forecasting methods for intermittent demand (both “classical” ones such as Croston as well as neural networks). My data set consists of ~8,000 SKUs, each with ~350 observations (time series). Let’s assume I want to apply Croston and ANN to this data and compare the forecast accuracy.

    For the ANN, I split each time series into a training and a test part (let’s assume a 65:35 split: 228 observations for training and 122 for testing). I see different approaches to make the ANN results comparable to the Croston results:
    a) Multi-step forecasting: use observations t1-t228 to forecast observations t229-t350. This results in 122 forecasts, which are all the same for Croston but differ for the ANN (one Croston calculation, one neural network).
    b) Rolling forecasting: use observations t1-t228 to forecast observation t229, then use t1-t229 to forecast t230, etc. (resulting in 122 Croston calculations and 122 separate neural networks).
    c) Combining a) and b): use t1-t228 to forecast t229-t232 (1, 2, 3 and 4 periods ahead), then use t1-t232 to forecast t233-t236, and so on (resulting in 31 Croston calculations and 31 neural networks).
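One way to check how many forecast origins each of the three schemes produces is to enumerate them. This is a minimal sketch; `forecast_origins` is a hypothetical helper, periods are 1-indexed as in the comment, and the final window of scheme (c) is truncated at t350.

```python
def forecast_origins(n_obs, n_train, horizon, step):
    """Return (origin, target_periods) pairs. `origin` is the last period used
    for fitting; the targets are the next `horizon` periods (1-indexed),
    truncated at the end of the series."""
    windows = []
    origin = n_train
    while origin < n_obs:
        last = min(origin + horizon, n_obs)
        windows.append((origin, list(range(origin + 1, last + 1))))
        origin += step
    return windows


a = forecast_origins(350, 228, horizon=122, step=122)  # (a) one origin, 122 steps ahead
b = forecast_origins(350, 228, horizon=1, step=1)      # (b) rolling one-step-ahead
c = forecast_origins(350, 228, horizon=4, step=4)      # (c) consecutive blocks of 4
print(len(a), len(b), len(c))  # 1 122 31
```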

    What approach would you go for and what did you use for your paper?

    Thanks for your help!

    1. Nikos Post author

      Hi Andreas,

      Thank you.
      With regard to your question, I think the starting point is to ask what a reasonable forecast horizon for your dataset would be. If you have access to the source of the data, you can ask them!

      About (a), I am somewhat skeptical of running an experiment with such a long forecast horizon. As the forecasts are iterative (i.e. each is based on the previously forecasted value), you will eventually accumulate very substantial errors; this is one of the main reasons why long-term forecasting is so difficult for high-frequency data.

      I quite like (b), mainly because a rolling origin allows you to collect multiple error measurements, which increases confidence in your results; and if there are outliers or other peculiar values, they will not affect all forecasts, due to the rolling nature of the evaluation.

      (c) is closer to what one should do in my view, but with a slight difference: there is no reason for the evaluation windows to be non-overlapping. So I would use t1-t228 to forecast periods t229 to t228+h (where h is the forecast horizon), then use t1-t229 to forecast periods t230 to t229+h, and so on. Again, what is a reasonable h? The context of your dataset should help you identify that!
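The overlapping rolling-origin evaluation described above can be sketched as follows. The origin advances one period at a time, each origin produces an h-step forecast, and absolute errors are collected separately for each lead time. `rolling_origin_errors` is a hypothetical helper, and the 4-period moving average is only an assumed stand-in for Croston or an NN (an NN would be retrained inside `forecaster` at every origin). The demo reuses the 24 monthly values from the first comment with an assumed 12-period training sample and h = 3.

```python
import statistics


def rolling_origin_errors(series, n_train, horizon, forecaster):
    """Advance the forecast origin one period at a time; every origin yields
    a `horizon`-step forecast, so evaluation windows overlap. Returns the
    absolute errors grouped by lead time (1..horizon)."""
    errors = {h: [] for h in range(1, horizon + 1)}
    # Only origins whose full horizon lies inside the series are used.
    for origin in range(n_train, len(series) - horizon + 1):
        history = series[:origin]  # periods t1..t_origin
        forecast = forecaster(history, horizon)
        for h in range(1, horizon + 1):
            errors[h].append(abs(series[origin + h - 1] - forecast[h - 1]))
    return errors


def moving_average(history, horizon, k=4):
    """Flat forecast equal to the mean of the last k observations."""
    level = statistics.mean(history[-k:])
    return [level] * horizon


ts_data2 = [0, 2, 0, 1, 0, 11, 0, 0, 0, 0, 2, 0,
            6, 3, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0]
errors = rolling_origin_errors(ts_data2, n_train=12, horizon=3, forecaster=moving_average)
for h, errs in errors.items():
    print(f"lead {h}: {len(errs)} errors, MAE {statistics.mean(errs):.3f}")
```

Grouping errors by lead time keeps the growth of error with h visible, which is the effect that makes the long horizon of scheme (a) hard to interpret.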

      Hope this helps, Nikos

