Intermittent demand forecasting package for R

By | June 23, 2014

A new package for analysing and forecasting intermittent demand time series and slow moving items has been release for R. You can download the latest version from CRAN.

The launch version contains the following functions:

  • crost: Croston’s method and variants.
  • crost.ma: Moving average with Croston’s method decomposition.
  • idclass: Time series categorisation for intermittent demand.
  • simID: Simulator for intermittent demand series.
  • tsb: TSB (Teunter-Syntetos-Babai) method.

Two demo time series are included to try the various functions: data and data2.

To fit Croston’s to a time series, simply use the following command:

> crost(data)

The function comes with a few different options. By default it optimises model parameters and initial values using the MAR cost function. Using the flag cost different cost functions are available. By default the smoothing parameter for the non-zero demand and demand intervals are optimised separately, but this can be controlled using the flag nop. Similarly the initialisation values for demand and intervals can be optimised or not and set either manually, or to preset initialisation heuristics. These are controlled by the flags init.opt and init.

Three different variants of the method are implemented:

  •  ‘croston’, for the original method.
  • ‘sba’, for the Syntetos-Boylan approximation.
  • ‘sbj’, for the Shale-Boylan-Johnston correction.

For example, to get SBA forecasts with optimal parameters for 12 months ahead and plot the results you can use:

> crost(data,type='sba',h=12,outplot=TRUE)

This will provide the following visual output:
tsID.fig1
Functions tsb and crost.ma allow similar level of control.

The next interesting function in the package allows you to create simulated intermittent demand series. The simulator assumes that non-zero demand arrivals follow a bernoulli distribution and the non-zero demands a negative binomial distribution. For example to create 100 simulated time series, 60 observations each, use:

> dataset <- simID(100,60,idi=1.15,cv2=0.3)

The two last inputs control the average demand interval and the squared coefficient of variation of the non-zero demand for the series.

To get a better view of the generated series, or a real dataset, we can use the idclass function. This categorises the intermittent demand time series according to existing classification schemes. In particular the following are implemented:

  • ‘SBC’, classification proposed by Syntetos-Boylan-Croston.
  • ‘KH’, the revised classification by Kostenko-Hyndman using the exact separation.
  • ‘KHa’, as above but using the approximate separation.
  • ‘PK’, the further revised classification that deals with temporal aggregation by Petropoulos-Kourentzes.
  • ‘PKa’, as above but using the approximate separation.

Both ‘KH’ and ‘PK’ require the demand interval smoothing parameter of SBA as input. This can either by calculated automatically by the function, or given as an input using the a.in flag.

Finally there are two views for each classification. One summarised that merely provides the number of time series the class/method or a detailed one that provides a scatterplot of the time series characteristics. Here are some examples:

> idclass(t(dataset))

This gives by default the PKa classification.
tsID.fig2

> idclass(t(dataset),type='SBC')

tsID.fig3

> idclass(t(dataset),type='KHa')

tsID.fig4

> idclass(t(dataset),type='PK',outplot='detail')

tsID.fig5
Note that the PK and KH exact classifications, unless the smoothing parameters are provided, can be computationally expensive.

More information about the various optimisation cost functions for these intermittent demand methods and the classification schemes can be found here and here. An interactive demo of various functions and optimisation options can be found here.

106 thoughts on “Intermittent demand forecasting package for R

  1. Fikri

    Dear Nikolaos,

    may I ask, the interpretation from the output of function crost? I read that for intermittent data, Croston and Syntetos-Boylan is the method best used. But, the output from croston method does not seem to generate interdemand period, and somehow, the value forecasted felt underforecast.
    Also, is it possible to see the plot combined with out-of-sample data and measure the forecasting methods performance with that data?

    Reply
    1. Nikos Post author

      Croston’s method and its variants are separating an intermittent demand series into two components, one for the non-zero demand and one for the inter-demand interval. These two are smoothed separately, using exponential smoothing and their forecasted value is then divided to provide a `demand rate` forecast. So let’s assume that the forecasted non-zero demand is 4 and the forecasted inter-demand interval is 8. The croston forecast will be 4/8 = 0.5. Therefore, instead of producing a demand and interval forecast, you get a demand rate that combines these two. The way I read this forecast is that there will be a need for 0.5 extra units each period. Eventually over the 8 periods of the forecasted interval a total of 4 units will be in demand. The true timing of the demand event within the predicted interval is unknown and hard to predict, therefore with this method we distribute the demand over the expected interval periods, hoping that when this is translated into an inventory decision enough stock will be at hand.

      To make a long story short, the forecast of Croston’s method and its variants is a demand rate forecast and not a demand forecast. This is the reason why typically we do not consider the interval forecast on its own. What you get from the crost function is the demand rate forecast. This also brings me to the second part of your comment. Measuring forecasting accuracy for Croston’s and its variants is not straightforward, as the number we get as a forecast is not a demand and therefore taking the difference between the actual demand and a croston forecast is not meaningful. I do a much better job at explaining this in this paper if you are interested.

      Reply
  2. Anurag

    Hi Nikolaos,
    Can you please clarify or confirm if my understanding is correct?
    I take data for 2 years for forecasting and the demands are intermittent. i am trying to predict for next 6 months.
    Lets say i get a demand forecast of 4 units for each month for a period of 6 months.
    So instead of interpreting the result as 24 units over the period of 6 months, i should interpret it as total of 4 units over the period of 6 months will be required. or 4/6=0.67 units each month?

    Reply
    1. Nikos Post author

      Hi Anurag,

      If I understand your description correctly you should not divide it, if your per month forecast is 4 units.
      Let’s say you forecast a time series with Croston’s method and you get a per month forecast of 0.67. That would be the expected rate of demand per month (assuming monthly intermittent data). So you will observe a full unit of demand within 2 months (0.67*2), but with unknown timing. Within 6 months (lets assume that is your horizon), assuming this is your planning horizon, you would have a demand of 4 units, but of unknown timing again. These units could be in the very first month, or the last one or any in-between. Also Croston’s method is unable to actually tell you whether this demand will be clumped or not, so you may have any demand over that period that sums up to 4, but could happen in just one month, or distributed within the 6 months.

      Reply
  3. Anurag

    Hey Nikolaos,
    Thank you for the prompt.
    I get this now. But how do i measure the accuracy. I went through your paper and only able to understand that MAE and MAPE are not good measures to calculate the accuracy of the model. Any help on this please.

    Many thanks.

    Reply
    1. Nikos Post author

      Hi Anurag,

      I realised that my reply for some reason got lost, so let me repost it. I personally prefer to use the impact on the decision variable to evaluate intermittent demand methods, therefore I prefer to directly measure the impact on inventory in my context. Nonetheless such an approach is very cumbersome to implement and computationally expensive. P. Wallström and A. Segerstedt (2010) have suggested using cumulative errors, such as the PIS which I refer to in my paper. Cumulative errors have the advantage that they overcome the “demand rate problem” of Croston’s and its derivatives, i.e. that the time series is measuring demand, while the forecast is a demand rate, as I discuss in my paper. Another interesting approach was used by R. D. Snyder et al. (2012) who used metrics that look at the prediction distributions. In my opinion this still an open question.

      Reply
  4. Priyadarshini Mukherjee

    Hi Nikolaos,
    I have intermittent demand for all my timeseries which is 36 months of history data say for 100 spares. I have used something like this for a few:
    crost(abc,h=12,type=c(“sba”),cost=c(“msr”),outplot=c(TRUE))
    My horizon is 12 months. Now , $frc.out is 377.4191. According to your explanation above, since crost gives the demand rate, so we can say in next 12 months we will have 377 units of demand, but cannot comment on the monthly demand rate or periods of no demand. Is there a way to generate a monthly number as well. Your help will be highly appreciated.
    Thanks
    Priya

    Reply
    1. Nikos Post author

      Hi Priya,

      The output of $frc.out should be 12 values (as you have h=12). Each of these values refers to the demand rate for each month, starting from t+1 up to t+12.
      If you would calculate the sum of $frc that would the expected total demand over the complete 12 months.

      Reply
  5. Priyadarshini Mukherjee

    Thank you for your prompt reply….really helpful

    Reply
  6. Priyadarshini Mukherjee

    Hi Nikolaos,
    I have intermittent demand for all my timeseries which is 36 months of history data say for 100 spares. I tried to use
    Simulator of Intermittent Demand Series and used
    dataset<-simID(n=1, obs=12, idi=1, cv2=1.085765157, level=NULL).
    I calculated cv2 for one series by removing the zeroes, but i am not sure how to calculate idi or Average intermittent demand interval of each series.
    If you can guide me, will be really grateful.

    Priya

    Reply
    1. Nikos Post author

      Hi Priya,

      The command you are running constructs 1 time series with 12 observations, but no intermittency. The average intermittent demand interval (idi) is set to 1 in your example, so the time series will have demand at every interval! You would need idi > 1 to have any intermittency. For the details of either cv2 or idi you could have a look at the paper by Syntetos et al. (2005). Briefly, to calculate this you first need to find the number of observations between all non-zero demand occurrences. Then you calculate the average over these. Say you have this time series: 1, 0, 0, 4, 0, 1, 2. Then the demand intervals are 3, 2, 1 and the average interval is simply their average.

      Reply
      1. Matt

        When you plot the output of the simID function in any ts.plot (time series plot) the results are reversed. So if I call testSet <- simID(n=5,obs=10,id=1.75,cv2=0.3) and I ts.plot(testSet), the x axis has 5 data points (observations) and there are 10 lines (time series). Isn't this the opposite of what we should see, as I am calling 5 time series with 10 observations?

        Reply
        1. Nikos Post author

          Hi Matt,

          You are correct, the command should be ts.plot(t(testSet)). This will be corrected in a latter version. Other functions in the package (e.g. idclass) assume each column to be a series, rather than row.

          Thanks,
          Nikos

          Reply
      2. Dr. Michael Merdian

        Dear Niko,
        I have a similar problem. When looking at the decomp I find an additional 1 (interval) at the beginning.
        This seems to be standard in the tsintermittent.
        Can you help ?
        thanks
        Michael

        Reply
        1. Nikos Post author

          Hi Michael,

          If I run crost.decomp(ts.data1) and I look at the length of the resulting intervals and demand they are the same length.
          lapply(crost.decomp(ts.data2),length) gives me 8 for both. Don’t forget that there is an issue with the initialisation of the decomposition as the first interval is always unknown.

          Nikos

          Reply
  7. KANNAPPAN

    Dear Professor,

    I have one 5 year data Jan 2011 to Jan 2015, how can i forecast for Jan 2016? less available data points in R

    THANK YOU

    Reply
  8. KANNAPPAN

    sorry, Only 5 data points from Jan 2011 to Jan2015, i want to forecast Jan2016? Could you suggest me in R with code and method?

    thank you

    Reply
    1. Nikos Post author

      Hi Kannappan,

      I would suggest you use the imapa function from the tsintermittent package. To my understanding this hould be on of the safest methods to use, unless you start tweaking the settings. However, if you want to control the parameters/initial values crost gives you more control.

      Hope this helps!
      Nikos

      Reply
  9. Karan

    I have a time series data. How do i calculate my adi and cov2 ? Is there a package in R for the same or do i have to do it manually?

    Reply
    1. Nikos Post author

      The idclass function works for individual series as well. Alternatively you can use crost.decomp to get the demand size and interval series from your time series and calculate the coefficient of variation and mean interval manually.

      Reply
      1. Hans

        Dear Mr. Nikos

        Thanks for sharing information Sir.
        I called ts.data2. the data are shown :
        Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
        1980 0 2 0 1 0 11 0 0 0 0 2 0
        1981 6 3 0 0 0 0 0 7 0 0 0 0

        After that i use crost.decomp(ts.data2). The result :
        $demand
        [1] 2 1 11 2 6 3 7
        $interval
        [1] 2 2 2 5 2 1 6

        I rewrite ts.data2 then insert data to new variable (testdata) as follow
        testdata2<-c(0,2,0,1,0,11,0,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0)
        After that, I use idclass(testdata2) to know CV2 and ADI . The results show for
        $cv2
        [1] 0.6197917
        $p
        [1] 2.857143

        The question is how to calculate CV2 and ADI manually? what is the formula to calculate them?

        Thank you

        Reply
        1. Nikos Post author

          Hi Hans,

          Try this:
          x <- crost.decomp(testdata2)
          cv2 <- (sd(x$demand)/mean(x$demand))^2
          p <- mean(x$interval)

          You can find more details in this paper.

          Reply
  10. Bailey

    Hi Nikolaos,

    I’m using tsintermittent to produce forecasts to compare to the current state (black box) forecasts at my company. We think that our forecasts for intermittent demand are based on Croston’s. The forecasts for these products are provided as two numbers (the non-zero demand size and the interval between them). I found your paper on this very helpful, and I plan to use MAR and/or MSR to compare forecasts. However, it would be very helpful to see both parts of the forecast rather than the demand rate, since this is what our management is used to. Is there a way to output the two optimized forecasts or perhaps to reverse engineer them from the output?

    Thank you
    Bailey

    Reply
    1. Nikos Post author

      Hi Bailey,
      Let me add these two outputs to the code. It should not be too complicated. However, unless you set the smoothing parameters to something like the naive (alpha = 1) then the initialisation of the method may make the comparison problematic. I will let you know when this is updated on CRAN.
      I would be interested to know which company if you can share this information (here or by email).
      Thanks,
      Nikos

      Reply
  11. Bailey

    Hi Nikos,

    Thank you so much for that! I am kind of curious what you mean by the comparison being problematic though.

    Bailey

    Reply
    1. Nikos Post author

      Hi Bailey,
      The updated package is now available on CRAN. Croston now gives an additional output components that provides the non-zero demand and intervals for the in-sample and out-of-sample periods, as well as the calculated coefficient used for SBA or SBJ corrections, if these are used. Hope this helps!
      Nikos

      Reply
  12. Vinod

    The output is linear forecast. Is there any way to convert it into intermittent demand forecast?

    The output is
    $components$c.out
    Demand Interval
    [1,] 0.992811 2.491074
    [2,] 0.992811 2.491074
    [3,] 0.992811 2.491074
    [4,] 0.992811 2.491074
    [5,] 0.992811 2.491074
    [6,] 0.992811 2.491074

    Reply
    1. Nikos Post author

      I am not sure what you mean by intermittent demand forecast. Croston’s method and its variants by construction will provide a linear forecast.
      Can you please elaborate?

      The function provides the output as non-zero demand and inter-demand interval components, but also as a final forecast $frc.out.

      Nikos

      Reply
      1. Vinod

        Thanks Nikolaos for fast reply,

        I am trying to forecast the demand which have intermittent data.

        If the actual demand is intermittent, then the forecast should be intermittent right.

        But croston method is giving linear forecast.
        Sample Code:-
        x <- rpois(20,lambda=.3)
        crost(x, type='sba', h=6, outplot=TRUE)
        Output:-
        $frc.out
        [1] 0.4210526 0.4210526 0.4210526 0.4210526 0.4210526 0.4210526
        Is it possible to get the output in intermittent form. (having zeros and integers)

        Thanks,
        Vinod

        Reply
        1. Nikos Post author

          The logic behind this method is different, so Croston’s method and its variants are unable to provide an intermittent demand as a forecast. Intermittent demand has uncertainty both in the demand size and in the demand timing and the way around trying to model this double uncertainty is instead of predicting demand per period (which should be intermittent), to predict a `demand rate’ that makes more sense as a cumulative demand over a number of periods (the lead time + review period). These methods would be unable to provide such a forecast. I am aware of a particular software that does provide a so-called `sporadic’ forecast from Croston’s method, but the calculation there is misleading. You may find useful to read this paper for more details on why the forecasts that these methods provide is useful.

          Nikos

          Reply
  13. Michael Stewart

    What type of format is the input data for crost()?

    I don’t understand the input specification. Where should I look to find that?

    Thank you,
    Michael

    Reply
    1. Nikos Post author

      Hi Michael,

      It can be a numerical vector or a ts object. For example: y<-c(2,0,0,1,0,3)
      or as a monthly time series y.ts <- ts(y,frequency=12)

      I can then write either crost(y) or crost(y.ts)

      Reply
  14. Petros

    Hello Niko,

    Could you please clarify the difference of the outputs $frc.out and $components$c.out ?

    Thank you,
    Petros

    Reply
    1. Nikos Post author

      Hi Petro,

      $frc.out provides the final forecasts, while the $components$c.out contains the predicted non-zero demand (first column) and the predicted inter-demand intervals (second column). If you are using standard croston then dividing these two will give you the values you get in $frc.out. If you are using any adjusted version then you will need to multiply the division result by the coefficient in $components$coeff. The $components output is aimed more for diagnostic and research purposes and for normal use the $frc.out should do the job.

      Best,
      Nikos

      Reply
  15. Dnyaneshwar

    Dear Nikolaos sir,
    I read the your research paper . I gone through your comments on above post. Thanks for sharing information.

    I want to forecast for a intermittent part with having demand hit 5-8 in a year with very low demand size ( demand qty=1,2,3 etc.).
    I use the Croston variant method to do forecact for such kind of parts but I am not satisfied with result. Can you suggest a good method from your R package- “tsintermittent”.
    and if possible explain me. I am new to R programming.

    example: part1- demand for (1, 0,0,0,1,1,2,0,0,0,0,1,2,0,0,0,0,0,1,1,1,1,1) like this I want o forecast for next 3 month .

    Please help me

    Thanks
    Dnyaneshwar

    Reply
    1. Nikos Post author

      Hi Dnyaneshwar,

      I think one of the more robust methods to use for forecasting intermittent time series is imapa from the tsintermittent package. However an important question here is what is the criteria of a good forecast in your case? Good performance in intermittent demand context is quite hard to define and simply calculating typical error metrics, such as MSE or MAE (MAD) is misleading.

      Nikos

      Reply
  16. MG

    Hi Nikos,

    I have a data from 1st Jan 2014 to 1st August 2017. It is an intermittent demand case.

    data is something like this
    2014-01-01 2014-01-01 2014-01-01 2014-01-02 2014-01-02 2014-01-03 2014-01-03 2014-01-03
    0 0 0 0 0 0 0 0
    2014-01-03 2014-01-04
    0 0

    etc. there are values in later period.

    I want to forecast for next 30 days.
    I ran this code:
    x 0.1917784
    so, is it means that for next 30 days I must have a stock of 30* 0.1917784 ???

    Also, what other things i can interpret from this model???

    Also, what is next step to validate this model , means procedure to calculate MASE (mean absolute scaled error)?

    Thanks in advance

    Reply
    1. Nikos Post author

      Hi MG,

      The 30*0.1917784 would be the expected cumulative demand over the next 30 days, but without accounting for uncertainty. Normally you need to introduce some safety stock to account for this uncertainty. This depends on the inventory policy you want to use and so on. I would recommmend you have a look at my intermittent demand papers for a discussion on this. I think the most detailed is given on this one: http://kourentzes.com/forecasting/2013/04/19/intermittent-demand-forecasts-with-neural-networks/

      As for the error metric to use, well in the same paper I argue that error metrics, especially these type of error metrics are not great for validating intermittent demand forecasts, and only an inventory simulation provides a conclusive evaluation. The problem for instance of MASE is that it prefers 0 forecasts, i.e. a forecast being zero all the time would typically be preferred by the metric, even if such a forecast is useless. The reason for this is that due to the absolute in the error metric it looks for the median of your actuals and not the mean as we would normally anticipate. The median of intermittent demand time series is often 0.

      Nikos

      Reply
  17. Lissa

    Hi Nikolaos,

    I was wondering if TSB or SBA method can also help in the cases where demand sizes are not small? like they are medium sized (above 100 items per month)

    Thank you!
    Lissa

    Reply
    1. Nikos Post author

      Hi Lissa,

      If the demand becomes continuous then Croston’s method becomes the same as Single Exponential Smoothing. The same is true for TSB. SBA that adjusts Croston’s method forecasts would provide biased forecasts, so it would be inappropriate. If the demand is still intermittent, but of high volumes, then the classification scheme used to distinguish between Croston’s and SBA would tend towards SBA all the time, but we do not have a good way to choose between SBA and TSB. I do not see any particular reason that TSB is inappropriate apart from that it is looking for obsolescence.

      Reply
      1. Lissa

        Thank you for your reply, Nikos. I am not sure if I quite understand obsolescence. What does it mean in the intermittent demand context?

        Reply
        1. Nikos Post author

          Think about an item that you expect it to go out of market. For example, spare parts for an old car that is no longer sold, or towards end of product lifecycle, but still supported by a company in terms of service. The demand for spare parts will also gradually decrease. These gradual decrease in demand, to the point that demand will be zero, is what we try to model with TSB. If this obsolescence is not modelled, then for items that are very intermittent you will end up with excess stock when the item is indeed no longer sold.

          Reply
  18. Michael

    Hi!

    Love the package for my intermittent demand so far. I got a question on the data.frc function with method as “auto”.

    I keep running into the below error, not sure if there is any way to get around this?
    Error in vector(“list”, sum(cls$summary)) : object ‘cls’ not found

    Reply
    1. Nikos Post author

      There was a bug that had slipped through. I have fixed it and I will push an update to CRAN for this shortly.
      Thanks!
      Nikos

      Reply
      1. Sarah

        Hi Nikos,

        I am facing the same problem, and I was wondering whether the updated version has already been made public.

        Thank you for this amazing package!

        Reply
        1. Nikos Post author

          Hi Sarah,
          I have setup a development repository for the package: https://github.com/trnnick/tsintermittent
          You can find the latest version there. In the examples I am running the functions seem to work fine. If you want, please send me a example dataset that does not work to see how I can fix it.
          Thanks!

          Reply
          1. Irene

            Hi Nikos,

            thank you for the package, very useful for spare parts forecasting. Unfortunately, data.frc still fires “Error in vector(“list”, sum(cls$summary)) : object ‘cls’ not found”
            The data:
            date quantity
            200601
            200602
            200603
            200604
            200605
            200606
            200607
            200608
            200609
            200610
            200611
            200612
            200701
            200702
            200703
            200704
            200705
            200706
            200707
            200708
            200709 1
            200710
            200711
            200712
            200801
            200802
            200803
            200804
            200805
            200806
            200807
            200808
            200809
            200810
            200811
            200812
            200901 1
            200902
            200903
            200904
            200905
            200906
            200907 1
            200908
            200909 2
            200910
            200911
            200912
            201001
            201002
            201003
            201004
            201005
            201006
            201007
            201008
            201009
            201010 2
            201011
            201012
            201101
            201102
            201103
            201104
            201105
            201106
            201107
            201108
            201109
            201110
            201111
            201112
            201201
            201202
            201203
            201204
            201205 4
            201206
            201207
            201208
            201209
            201210
            201211 10
            201212
            201301
            201302
            201303
            201304
            201305
            201306 1
            201307 4
            201308
            201309
            201310
            201311 4
            201312
            201401 1
            201402
            201403
            201404
            201405
            201406
            201407 4
            201408
            201409
            201410
            201411
            201412

          2. Nikos Post author

            Hi Irene,
            I cannot reproduce the error in my case. Do you want to email me some data & code?
            Thanks,
            Nikos

  19. Johannes

    Hi Nikos,

    first of all, I want to thank you for sharing this wonderful package.

    I try to compare the performance of different sales forecast methods with each other. That`s why I would like to receive actual point forecasts from the croston`s method. I understand why a demand rate is more appropriate for forecasting intermittent time series, nevertheless I would like to receive point forecasts for comparison.

    In an earlier post you refered to particular software, that provides ‘sporadic’ forecasts. (21th April, 2016)
    Even when the calculations in this software is misleading, I would like to ask if it is possible to use this software in R?

    Johannes

    Reply
    1. Nikos Post author

      Hi Johannes,

      Thanks! It is possible, but I advise against it. You can find the demand size and demand interval forecasts separately at the output $components. There you will find c.out that has a column for the demand size and one for the interval. You will also find $coeff that would be the scaling factor for SBA. So if you want to simulate the output of SAP (which again I stress that it is wrong!) you can use the following set of commands:

      fit <- crost(ts.data1) horizon <- 20 forecast <- rep(0,horizon) forecast[seq(TRUE,horizon,round(fit$components$c.out[1,2]))] <- fit$components$c.out[1,1] print(forecast) Consider that if you want to produce the forecasts in this way there is no guideline if you should start from the first period or not (for instance, in the example above the forecasted demand is put at the 1st and 11th period, but could very well be at the 10th and 20th) and whether we should round up or down the interval. Hope this helps!

      Reply
  20. Lalit

    Nikos

    I want to compute Average Demand Interval for a set of demand databse. Can you please let me know how this will work in this package?
    Sample Data is as under
    Date Demand
    1 3
    2 0
    3 0
    4 1
    5 0
    6 1

    Reply
    1. Nikos Post author

      Hi Lalit,
      You would need to use the crost.decomp function. This will give you two vectors: (i) demand; and (ii) interval.
      All you need then is to calculate the mean of the interval.

      Reply
  21. Prerna Mishra

    Hi Nikos,

    i have a dataset of 1510 products having intermittent demand over 24 time periods. I am using croston methond to forecast demand. I am using the crost() method from tsintermittent package. below is my code:
    z=matrix(Intermittent_Dataset_Cleaned)
    mycroston <-function (x)
    {
    y=crost(x,h=3,init="mean",type="croston",w=NULL)
    return(y)
    }
    for(i in 1:1510)
    {
    forecast[i]=mycroston(z[i])
    }

    The above code obviosuly won't work. Could you help me figure out a way, where I could just return the $frc.in on calling the function and store it in forecast matrix.

    I am new to R, any help would be greatly appreciated.

    Thanks,
    Prerna

    Reply
    1. Nikos Post author

      Hi Prerna,

      The easiest way is to use the function data.frc. For this to work you need to have your data as a data.frame, where each column is a time series to be modelled and each row is a time period.
      The the call you need is:
      forecasts <- data.frc(Y,method="crost",h=3)
      Then a matrix with all the forecasts is available: forecasts$frc.out
      With this function you can call other forecasting methods that exist in tsintermittent as well. Have a look at the documentation.

      In the function above to make it work all you need to do is change the croston line to:
      y=crost(x,h=3,init="mean",type="croston",w=NULL)$frc.out

      Reply
      1. Prerna Mishra

        Hi,

        Thank you for you response. i tried running my function. I keep getting the error:
        Error in crost(x, h = 3, init = “mean”, type = “croston”, w = NULL) : (list) object cannot be coerced to type ‘double’.

        I also tried
        forecasts=data.frc(z,method=”crost”,h=3) and the error that I get is:
        Error in FUN(newX[, i], …) : (list) object cannot be coerced to type ‘double’

        Could you please point me to what I am doing wrong here?

        Thanks again,
        Prerna

        Reply
        1. Nikos Post author

          Can you describe x? What is its class (use class(x)) and its size?

          Reply
          1. Prerna Mishra

            Hi,

            I am taking a different approach now. However, I am running into a different problem:

            z=data.frame(Regular0_1Cases)
            forecasts0_1=data.frc(z,method=”crost”,type=”croston”,h=3)$frc.in
            forecasts0_1$frc.in

            forecasts0_1$frc.in is constantly showing to be null.

            I do get values for $frc.out.

            Could you please tell me why $frc.in is showing null?

            Thanks,
            Prerna

          2. Nikos Post author

            Hi Prerna,

            This really depends on the in-sample data. How many observations are there, when the first non-zero demand occurs and what settings you are using.
            From what I understand it seems that the in-sample is probably too short?

            Nikos

          3. Prerna Mishra

            There are 24 observations each series in the dataset. First non zero can be any where in the 24 periods.

            Thanks,
            Prerna

          4. Nikos Post author

            I cannot think of a reason that you do not get in-sample fit with the information I have. Can you be more specific?

  22. Luuk

    Hi, I am new to those methods and try to implement the SBA method to spare parts demand forecasting. Is it also possible to perform the analysis in Excel?

    Reply
    1. Nikos Post author

      Yes, it is possible to do this in excel, but it is not easy to automate in excel due to the decomposition required for Croston’s method. AN example of how to do this in excel can be found in this spreadsheet. This will help you produce a Croston forecast. To adjust this to become into SBA you will need to multiply the forecast by 1-(a/2), where a is the smoothing parameter for the intervals time series (the denominator in Croston’s method). Keep in mind that in excel getting optimal values for the initial values and the smoothing parameters can be quite messy.
      Nikos

      Reply
  23. Luuk

    Hi Nikos,

    Thank you very much for the spreadsheet attached. I had a look at it and it looks quite straightforward. As I see though, the method consistently underestimates the forecast compared to the actual values.

    Why do you think it is messy to find optimal values for the parameters and the initial values? Initial values are known, right? And the solver should make it possible to calculate the optimal smoothing parameter I think? Maybe you can explain me where you are refering to?

    I do have two questions left:
    – Can I also use a weekly period instead of a monthly period?
    – Is it possible to use this data in order to set optimal inventory levels given a certain service level? Or can I find a paper on how to do this?

    Thank you very much in advance!

    Reply
  24. Nikos Post author

    The underestimation is up to a point due to the initialisation and parameters of the method and also potentially due to the way that the forecast is assessed. This paper may be helpful, it discusses all three aspects and also explains what I meant by “messy” to find optimal values.

    As to your other questions: i) yes, you could use any frequency, the method is still applicable; ii) this really depends on the inventory policy you are using. On the top of my head I cannot think of a paper that is transparent enough on this (though the aforementioned paper discusses some of these aspects). Perhaps a textbook on operations and inventory (the new edition of Silver’s book is out) would be the best starting point.

    Reply
  25. Jayash

    Hi Nikolaos,

    Great post. I am using your package in R to generate the demand forecast. I was wondering if the package handles obsolescence of product or not? If not is there a way to extend the package?
    I am referring to the following paper: https://arxiv.org/abs/1409.1609

    Thanks,
    Jayash

    Reply
    1. Nikos Post author

      Hi Jayash,

      Thank you for the paper – unfortunately I have not implemented that specific method in the package. Currently, only TSB is available to deal with obsolescence of products. My intention is to do a major update of the package structure first, before i get the time to introduce new functionalities. I hope the time to do that soon!

      Best,
      Nikos

      Reply
      1. Jayash

        Hi Nikos,

        Thanks for the response, much appreciated. TSB works fine for my use-case. I look forward to the updates. 🙂

        I was wondering if there is a way to generate sample paths (through simID or similar simulation function) and update the mean, idi, CV2 along the sample path. This will introduce auto-correlation in the sample paths. Or will that be a violation of the assumptions of intermittent demand.

        Thanks again.
        Jayash

        Reply
        1. Nikos Post author

          Hi Jayash,

          Currently simID does not offer this. As for the “assumptions of intermittent demand”, I am of the opinion that our intermittent demand methods are so messy and incomplete (or our data generating models) that violating standard practice assumptions is fine, given that the resulting model/method performs better on real data after being rigorously evaluated. My paper on predicting intermittent demand with neural networks follows exactly this logic, I drop the assumption of independence between demand size and intervals that is typical in Croston’s method and its variants (including TSB), as well as assumed data generation processes and demonstrate gains on real data. Of course that paper has its limitations, but I think it makes a valid point in moving beyond the standard approaches in modelling intermittent demand data.

          Best,
          Nikos

          Reply
          1. Jayash

            Hi Nikos,

            Thanks for the explanation. I had read your paper and thanks for bringing it to my attention again in this context. I will try to implement a NN for my use case and see how it performs.

            Thanks again,
            Jayash

  26. Valery

    Hi Nikolaos,

    Great package! Many thanks for sharing.
    I was wondering if TSB or Croston method can support the idea of so-called ‘black holes’ inside of an input time series. In my case demand is generated from the website that simply doesn’t show a product that is out of stock for some period of time, so even there was some demand in real world it can’t be observed. I guess I am doing something incorrect setting zero demand for particular points.

    Best
    Valery

    Reply
    1. Nikos Post author

      Hi Valery,

      If I understand correctly your problem currently none of the methods in the package would do the trick. I think setting theae periods to zero is not appropriate as it would make the methods think tbat the interdemand periods are longer. So adapted method should be devised for this problem. Could you share an example perhaps? Thanks!

      Reply
      1. Valery

        Hi Nikolaos,

        Thanks for your prompt reply!
        Yes, this issue with the fake interdemand periods is exactly what I am concerning about. The typical example is below. [Nikos: data cropped]

        Reply
        1. Nikos Post author

          Interesting! I think I know how this can be dealt with, but I will need some time to code a solution. But I do not have a good view how well it will work!
          Do you have several time series that could be used to test a potential solution?
          Nikos
          P.S. I cropped the time series from the post above to keep the comment size small!

          Reply
          1. Valery

            Hi Nikolaos,

            Sure, I have plenty of such data. Just uploaded a chunk of it here:
            https://drive.google.com/file/d/0B_ifQOusmJeFX3FzUVdHSnV5TG8
            I use 28 days forecasts and obviously try to minimize stock-outs so for the forecast accuracy evaluation I select time series that contain at least 28 days long period with avail=1. In other words, I try to predict demand in case everything is on stock.

            Best
            Valery

  27. Pepijn

    Dear Nikos,

    I am currently working on a simulation optimisation of a replenishment problem with stochastic intermittent seasonal (lumpy sometimes) demand.
    What I intend to do is to generate stochastic intermittent demand in excel, however I am unsure about some things:

    * Should I only use the period between the first and the last demand to estimate the ADI and CV2 with the package? (leaving the period before and after this out, where the demand is 0)
    * would it be appropriate to simulate a demand frequency with the ADI and standard deviation of the ADI assuming a normal distribution, e.g. IF norm.inv( rand() , ADI , st.dev. ADI) 0 then norm.inv( rand() , ADI , st.dev. ADI). And then by doing the same for the demand size?

    I hope to hear from you!

    Regards
    Pepijn

    Reply
    1. Nikos Post author

      Hi Pepijn,
      The way I see it seasonal intermittent demand is an oxymoron. If it is seasonal, then the demand event timing is known (it is seasonal), so you are left with only one type of variation, the one on the demand size. In this case one could model the process with a regression with dummies. A few colleagues have supported the idea of seasonal intermittency, but I have to yet see a convincing process for it. If the seasonality is on the arrival rate, then you would need to simulate your data using a Poisson regression with a time varying (in a seasonal manner) arrival rate. So unfortunately I cannot answer your questions in a helpful way, as I do not think that simulating demand like that is helpful. Nonetheless, for your first question you do not need to edit your data, just use all. For your second question, I would suggest you check the simID function in the package. That should give you a few ideas.
      Best,Nikos

      Reply
      1. Pepijn

        Dear Nikos,

        thank you for your reply!

        I understand what you mean, but when I look at the data, I find that between June – september the demand has a very irregular and had weeks of no demand followed by peaks of demand or more intermittent-like lower demand, if I would be able to zoom in more (which I unfortunately can’t because I have data on a week basis) the demand (in those months where product is sold) would be even more intermittent.

        Would you still argue that it would be unwise to model the demand as intermittent/lumpy for those months in which I observe demand?

        Fitting distributions so far has not worked..

        regards
        Pepijn

        Reply
  28. Pepijn

    this is, for example, the demand of a store during the summer months (per week), and demand in the months afterwards is 0
    0
    0
    0
    8
    0
    0
    4
    2
    0
    1
    1
    0,0
    1,0
    2
    0,0
    2
    1
    1
    3
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0

    Reply
    1. Nikos Post author

      I see. Is this pattern similar over the years? Surely standard intermittent demand methods will be unable to deal with this case. I do not think that there is any standard method that deals with such data, so this is a case that something new should be trialed. To your question, is it unwise to model it as intermittent/lumpy demand? I do not know, but I think I would still try to take advantage of the seasonal aspect in a very explicit way, using for example a regression.
      Interesting problem!

      Reply
  29. Irene

    Hi Nikos,

    a short question: is a function like accuracy from the forecast package planned for tsintermittent? Thank you.

    Reply
    1. Nikos Post author

      Unfortunately not. Though I am in the process of redesigning the tsintermittent package to make it combatible with the logic of the forecast package (and thief, hts, MAPA, smooth, etc…). I hope in the coming weeks to have finished this and post an update!

      Reply
  30. Nadav Zivelin

    Dear Nikolaos,

    I will also echo the many folks, from the community for such a great package.
    What I have been struggling with is not analyzing sparse demand but generating a similarly sparse forecast.
    For example
    0
    0
    500
    0
    0
    450
    0
    0
    500
    0
    0
    450

    I would like to see a forecast that isn’t ~160 every period but instead has two periods of 0 followed by ~475..

    Any insights would be appreciated.
    Thank you and Regards
    Nadav

    Reply
    1. Nikos Post author

      Hi Nadav,

      The problem in producing such a forecast, in the context of interemittent demand, is that the timing of the demand event is uncertain. Intermittent demand is characterised by uncertainty both in the demand size and demand interval. If the uncertainty on the interval is minimal then it is be possible to produce such as forecast, and probably regression would be a good way to do it, where the periods of zero demand can be easily modelled. For example, in your case you could build a regression that the intercept is set to zero and you have a dummy variable to indicate the demand event, where the regression forecast would return something like: 0, 0, 475, 0, 0, 475, … Alternatively you could approach this problem as a seasonal forecasting problem, where an additive exponential smoothing would probably do the trick.

      May I ask, what application produces that type of data?

      Reply
      1. Nadav Zivelin

        Hi Nikos,
        Retail orders from manufacturer often take such a pattern, although less clean than the example provided.
        The retailer has a specific inventory threshold that triggers an order which is usually a given lot size.
        Manufacturer’s have ready access to the order patters but little insight to inventory levels or buyer habits.
        A systematized way of analyzing many such time series could help manufacturer’s plan better from a production perspective as well as warehouse stocking.

        Regards
        Nadav

        Reply
        1. Nikos Post author

          Thanks! My response was along the lines of a specific data generating process, but this does not seem to be the case here. In this case you may find this work quite relevant. When you consider forecasting across multiple tiers of the supply chain it is quite beneficial to build models that have both views, if point-of-sale information is available.

          Reply
          1. Nadav Zivelin

            The link is intriguing and I will have to delver into it deeper.
            When I first started in supply chain in the year 2000, readily available point of sale data was coming anytime now. Looking at larger markets such as the US and Asia very little gain has been made by manufacturers. They have access to POS data for some retailers and not other and it takes considerable effort and cost to get the information.
            Other areas such as Australia seem to be the exception and POS data looks to be readily available for most major retailers.

            One followup question, for a model such as Croston’s is the frequency of non-zero demand and the magnitude or say amplitude of the non-zero demand readily available outputs?

            Thanks
            Nadav

          2. Nikos Post author

            Yes! Check the $components output.
            With regards to POS, this is a good point. We found that we did not need all the POS data from different retailers to achieve the gains, and only the major retailers were enough.

  31. Peter

    Hi Nikos

    Which initialization method is used, when I use the Crost() function i R ?
    And how are the weights optimized ?

    Reply
  32. Vish

    Hi Nikolaos,

    Great package! Many thanks for sharing.
    I am trying to forecast it but facing some problem.
    This is for e.g. my dataset and I am trying to use the package.
    Date Expenses
    01-07-2006 18
    01-08-2006 0
    01-09-2006 0
    01-10-2006 0
    01-11-2006 30
    01-12-2006 15
    01-01-2007 0
    01-02-2007 18
    01-03-2007 177.78
    01-04-2007 0
    01-05-2007 0
    01-06-2007 0
    01-07-2007 30
    01-08-2007 44.2
    01-09-2007 0
    01-10-2007 62.1
    01-11-2007 0
    01-12-2007 95.89
    01-01-2008 0
    01-02-2008 0
    01-03-2008 252.7
    01-04-2008 97.29
    01-05-2008 327.3
    01-06-2008 0
    01-07-2008 0
    01-08-2008 39
    01-09-2008 232.19
    01-10-2008 20.1
    01-11-2008 0
    01-12-2008 17.4
    01-01-2009 0
    01-02-2009 169.02
    01-03-2009 0
    01-04-2009 0
    01-05-2009 252.7
    01-06-2009 0
    01-07-2009 451.07
    01-08-2009 60
    01-09-2009 35.01
    01-10-2009 13.56
    01-11-2009 0
    01-12-2009 0
    01-01-2010 0
    01-02-2010 0
    01-03-2010 0
    01-04-2010 577.02
    01-05-2010 0
    01-06-2010 0
    01-07-2010 9
    01-08-2010 78.68
    01-09-2010 67.5
    01-10-2010 15
    01-11-2010 0
    01-12-2010 0
    01-01-2011 0
    01-02-2011 0
    01-03-2011 0
    01-04-2011 36.59
    01-05-2011 40.5
    01-06-2011 60
    01-07-2011 30
    01-08-2011 0
    01-09-2011 37.5
    01-10-2011 37.5
    01-11-2011 30
    01-12-2011 210
    01-01-2012 127.5
    01-02-2012 37.5
    01-03-2012 89.23
    01-04-2012 460.78
    01-05-2012 903.9
    01-06-2012 506.9
    01-07-2012 0
    01-08-2012 0
    01-09-2012 0
    01-10-2012 265.01
    01-11-2012 0
    01-12-2012 3054.4
    01-01-2013 612.73
    01-02-2013 1304.3
    01-03-2013 0
    01-04-2013 0
    01-05-2013 520.5
    01-06-2013 995.24
    01-07-2013 716.72
    01-08-2013 569.53
    01-09-2013 120.25
    01-10-2013 578.43
    01-11-2013 52.5
    01-12-2013 97.5
    01-01-2014 120
    01-02-2014 30
    01-03-2014 0
    01-04-2014 0
    01-05-2014 199.17
    01-06-2014 45
    01-07-2014 305.51
    01-08-2014 457.12
    01-09-2014 0
    01-10-2014 0
    01-11-2014 46.14
    01-12-2014 0
    01-01-2015 0
    01-02-2015 0
    01-03-2015 15
    01-04-2015 0
    01-05-2015 30
    01-06-2015 1588.93
    01-07-2015 0
    01-08-2015 17.4
    01-09-2015 0
    01-10-2015 0
    01-11-2015 5631.07
    01-12-2015 0
    01-01-2016 0
    01-02-2016 0
    01-03-2016 37.5
    01-04-2016 809.2
    01-05-2016 465.39
    01-06-2016 0
    01-07-2016 0
    01-08-2016 156.92
    01-09-2016 318.01
    01-10-2016 678.51
    01-11-2016 0
    01-12-2016 0
    01-01-2017 4372.84
    01-02-2017 0
    01-03-2017 0
    01-04-2017 507.05
    01-05-2017 112.5
    01-06-2017 0
    01-07-2017 0
    01-08-2017 37.5
    01-09-2017 0
    01-10-2017 0
    01-11-2017 0
    01-12-2017 0
    01-01-2018 1135.78
    01-02-2018 489.48
    01-03-2018 39
    01-04-2018 0
    01-05-2018 0
    01-06-2018 1048.76
    01-07-2018 78
    01-08-2018 0
    01-09-2018 0
    01-10-2018 0
    01-11-2018 1144.04
    01-12-2018 276
    01-01-2019 0
    01-02-2019 0
    01-03-2019 1161
    01-04-2019 0
    01-05-2019 1759.37
    01-06-2019 97.5

    Reply
    1. Nikos Post author

      Thanks! It appears your question got truncated, can you please repost your question?

      Reply
  33. Sanket

    Hi Nikolaos,

    I was trying to understand the difference between the results produced by TSB and Croston. I believe Croston returns expected demand per period whereas TSB returns a point forecast. I have 2 questions here –

    1) Assuming I want the forecasts for the next 5 periods and TSB function returns 10 units as forecast. Should I infer that in each forecast period I can expect a demand of 10 units? Hence, at the end of 5 periods, I will be expecting a demand of 50 units

    2) How to incorporate seasonality and trend into Croston and TSB? One way I read was to use (S)ARIMA or Holt-Winters. I was wondering if it is possible to use Holt-Winters instead of simple exponential smoothing (SES) in Croston or TSB?

    Reply
    1. Nikos Post author

      1) I interpret both as having the same style of output, so the forecast for each period is the units per period, and across the lead time you need to take the sum.
      2) That has been a question that has been bugging me for a while myself. I think now we have a reasonable answer to this. Have a look at some of my later work.

      Reply
  34. Satya Pattnaik

    Hi Nikos,
    Can you give some more information about the KHa and KH categorization.
    Is it based on the equation :
    v > 2-3/2p, (Then SBa else Croston)
    where v is coefficient of variation of non zero demand squared and p is average inter-demand interval

    Reply
    1. Nikos Post author

      It is from a note by Kostenko and Hyndman (you can find the detail here), which is an exact solution of Syntetos and Boylan’s work that uses the separation margin you mention. In KH and KHa that margin is considering the smoothing parameter as well. Practically speaking the difference between the schemes is not that severe. We have extended this a bit to exclude series that have very low intermittency, but eventually I think there is a better way to handle the problem.

      Reply
  35. Stephanie

    Hi Nikolaos! I’m a beginner in R

    What I have: Sales_date.csv contains the historical daily unit sales data per product ( 3 colums: Product name, Demand, Date)
    What I want to do: categorize the demand off all the products into 4 groups (lumpy, intermittent, smooth, erratic) by using two factors: (i) average demand interval (ADI) cut off value 1.32 and (ii) square coefficient of variation of demand sizes (CV2), cut off value 0.49.
    Then allocate suitable forecasting method to each group (Croston or SBA method or single exponential smoothing)
    I’ve tried using tsintermittent package in RStudio based on the code from https://github.com/trnnick/tsintermittent/blob/master/R/idclass.R. (fuction idclass to categorize the demand), but it did not work.
    It would be highly appreciate if you could please guide me the right way to use the code in my case

    Thank you very much in advance

    Reply
    1. Nikos Post author

      Hi Stephanie!

      You will need to arrange the data in a more time-series friendly way, where each column will be a time series. Date information is not important for this, but it will not complain if it is there. So what you want is each column to be a product, and each row to be a demand figure. Then idclass should be able to work.
      Since you are a beginner in R, you may find it helpful to master the data types a bit, especially the ts() class that most forecasting algorithms use.

      On another note though, I still find this classification very handy for understanding my data, but not so useful for forecasting, once you are optimising your forecasting models, so depending what you are trying to do, if this gets too fussy, you may just skip it!

      Reply
  36. Muhaimin

    Dear Prof. Nikolas,

    I have a questions that bothered Me all the time. Does the intermittent data have to have a very strong autocorrelation? Thank you in advance.

    Sincerely
    Muhaimin

    Reply
    1. Nikos Post author

      A very good question, for which I do not have a simple yes/no answer.

      Can we calculate autocorrelation on intermittent data? Sure we can, but it doesn’t seem to give us the intuition we expect, as implicitly it tells us the information on an assumed ARIMA structure (that is not from the definition of autocorrelation, but from the practicalities of it!). Is an ARIMA valid for intermittent data? We typically say no, as the demand will often be characterised by non-negative low count integer values. In that sense an Integer Valued ARMA might be more fitting (INARMA; there is some work on INARMA and intermittent series). So that would suggest that our typical autocorrelation does not apply to intermittent series. On the other hand, if we consider the information contained in the time series, and temporally aggregate it (see here) at some point it will behave more like a normal time series, at which point autocorrelation will be valid and usefull. Where does this leave us? My intuition is that although we could calculate it, to me it appears it would be misleading. Is there in a looser term possibility of autocorrelation? Sure there is, and we can observe it at temporally aggregate views of the intermittent time series.

      So to your question, does intermittent daya have to have a very strong autocorrelation. I don’t think there is such a requirement, and even if there was, I personally wouldn’t know what to do with it!

      Reply
      1. Muhaimin

        Thank you very much, sir. I’m working on my thesis and dealing with intermittent time-series data. Introduce me is a student at NCTU-Taiwan. Once again, thank you very much for your answers, sir. Very helpful.

        Sincerely
        Muhaimin

        Reply

Leave a Reply to Nikos Cancel reply

Your email address will not be published. Required fields are marked *