New R package nnfor: time series forecasting with neural networks

By | October 25, 2017

My new R package nnfor is available on CRAN. It collects the various neural network functions that appeared in TStools. See this post for a demo of these functions. In summary, the package includes:

  • Automatic, semi-automatic or fully manual specification of MLP neural networks for time series modelling, which helps in specifying inputs with lags of the target and exogenous variables. It can automatically deal with pre-processing (differencing and scaling) and identify the number of hidden nodes. The user can control which of these settings are left on automatic.
  • A few options for building network ensembles.
  • Plotting functions of the network topology, fit and forecast.
  • All the above for ELMs (Extreme Learning Machines).
  • Support for Temporal Hierarchies Forecasting, with the thief package for R.

This builds on the neuralnet package for R, and provides the code to make the networks capable of handling time series data automatically. Although that package is quite flexible, it is computationally expensive and does not permit deep learning. The plan is to eventually implement such capabilities in nnfor.

There are numerous papers that support the ideas used to put together this package:

  • In my new book, Ord et al., 2017, Principles of Business Forecasting, 2e, Wessex Press Publishing. Chapter 10 describes the basic logic in building MLP networks for time series forecasting. This package implements the logic described there.
  • This paper demonstrates the performance of the input variable selection algorithm: Crone and Kourentzes, 2010, Feature selection for time series prediction – a combined filter and wrapper approach for neural networks. Neurocomputing, 73, 1923-1936. There is some influence from this proceedings paper as well. (These feel like really old papers!)
  • This paper looks at the combination operator for the ensembles. Please move away from the average! Kourentzes et al., 2014, Neural network ensemble operators for time series forecasting. Expert Systems with Applications, 41, 4235-4244.
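
To make the point about combination operators concrete, here is a minimal base R sketch of three ways to combine the forecasts of ensemble members for a single horizon. The helper combine_forecasts() and the numbers are made up for illustration, and taking the mode as the peak of a default kernel density estimate is an assumption of this sketch rather than a description of what nnfor does internally.

```r
# Hypothetical helper, not the nnfor implementation: combine the forecasts of
# the ensemble members for a single horizon with one of three operators.
combine_forecasts <- function(x, comb = c("median", "mean", "mode")) {
  comb <- match.arg(comb)
  switch(comb,
         mean   = mean(x),
         median = median(x),
         mode   = {
           # Mode taken as the peak of a Gaussian kernel density estimate
           d <- density(x)
           d$x[which.max(d$y)]
         })
}

# Made-up forecasts from six networks; one of them has trained poorly
members <- c(100, 102, 101, 99, 103, 180)
sapply(c(mean = "mean", median = "median", mode = "mode"),
       function(op) combine_forecasts(members, op))
```

With these made-up numbers the mean is pulled towards the single badly trained network, while the median and the mode stay close to where most members agree.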

The neural network functions in TStools will be phased out: initially they will point towards this package, and later they will be removed completely.

There is a GitHub repository for this, where I will be posting updates and fixes until they go on CRAN: https://github.com/trnnick/nnfor

Happy (nonlinear) forecasting!

14 thoughts on “New R package nnfor: time series forecasting with neural networks”

  1. Alex Raboin

    I have been following your work and was excited to see you made progress on getting this on CRAN.

    I am having two issues with switching from nnetar to mlp…

    One, I am getting an error for non conforming arguments….

    and two, how do you implement forward regressors?

    In nnetar I would cbind the time series objects of forward regressors into freg, then use xreg = freg in the forecast call.

    I am looking forward to upgrading to this and figuring out how to use thief too. What you have been working on seems impactful, and I am hopeful I will see similar results in the short-term natural gas load forecasting I am doing.

    Thanks for the help. Cheers!

    1. Nikos Post author

      First, I found a small bug that only appears when there are no univariate lags and the modeller wants to turn off automatic input selection. I put up a GitHub repository here: https://github.com/trnnick/nnfor. The bug is fixed there.

      I wrote an R example to show how to handle xreg inputs with mlp(). There are two options: either specify the dynamics in xreg.lags, or manually create an xreg where each column is the appropriate dynamic and set xreg.lags to zeros. However, I have not implemented leads (forward regressors) yet, so you will have to go with option two. The following example should be helpful.

      library(nnfor)
      
      # The objective is to forecast the Airline Passengers series with only deterministic trend and seasonality
      # mlp does the deterministic seasonality internally, when needed, but not the trend.
      
      # Let us prepare some data
      y <- AirPassengers
      h <- 2*frequency(y)
      tt <- cbind(c(1:(length(y)+h),rep(0,2*h)))
      # Observe that the deterministic trend ends with zeros
      print(tt)
      
      # Fit a network with no differencing, no univariate lags, and fixed deterministic trend
      fit1 <- mlp(y,difforder=0,lags=0,xreg=tt,xreg.lags=list(0),xreg.keep=TRUE)
      print(fit1)
      plot(fit1)
      plot(forecast(fit1,h=h,xreg=tt))
      # The forecast is reasonable
      
      # Now let us shift the input so that the zeros are in the forecast period
      tt2 <- tt[-(1:h),,drop=FALSE]
      plot(forecast(fit1,h=h,xreg=tt2))
      # The seasonality is there, but there is zero trend, as the inputs suggest. 
      # Also note that the mlp modelled multiplicative seasonality on its own. NNs are cool. 
      
      # Now let us fit a network on the shifted inputs
      # I will ask for outplot=1 to see the model fit
      fit2 <- mlp(y,difforder=0,lags=0,xreg=tt2,xreg.lags=list(0),xreg.keep=TRUE,outplot=1)
      plot(forecast(fit2,h=h,xreg=tt2))
      # Same as before
      
      # Now let us fit with two inputs, the shifted (lead of 24 periods) and the original trend
      fit3 <- mlp(y,difforder=0,lags=0,xreg=cbind(tt[1:192,,drop=FALSE],tt2),xreg.lags=list(0,0),xreg.keep=list(TRUE,TRUE),outplot=1)
      print(fit3)
      plot(forecast(fit3,h=h,xreg=cbind(tt[1:192,,drop=FALSE],tt2)))
      # The network gets a bit confused with one of the trend vectors stopping!
      

      Hope this helps!

      1. Alex Raboin

        Thanks! Very helpful. I’m training away. What are your thoughts on lead inputs (forward regressors)?

        Computational time is killing me. With so many networks to train for an ensemble model (84), I’ve set them to train all day and then save the models, so I can load them and forecast the next morning. I have a rolling set of different dates to look at…
        I should be able to utilize a GPU to speed this up, right? I’ve been looking at a few R packages with CUDA. I’m uncertain if I can actually utilize the hardware. I have been looking through some documentation but have found nothing definitive yet.

        I appreciate your input; you are an expert. I am still working to perfect my forecasting application with the latest and greatest.

        Thank you!

        1. Nikos Post author

          The computational time is a big issue. At the core of nnfor, for now, I am calling the neuralnet package, which unfortunately does not use the GPU and relies on the CPU. This makes training rather slow. A potential way forward might be to use ELMs with the elm() function. The argument barebone=TRUE for elm() may help as well. On my to-do list for the package is to implement an alternative core that will make use of the GPU to speed up processing and also allow experimenting with deep learning (though I am currently very skeptical whether it is needed for time series forecasting; at least I am skeptical for now!). When I run large neural network experiments I use the neural networks toolbox in MATLAB, which is very fast, but unfortunately not free or open source.

          Using regressors with leads can be useful at times. The typical restriction is that we often do not have information about a regressor for the future periods, which is one of the reasons you do not see this often in examples. I have used leads quite a bit in promotional modelling, where you know the promotional plan in advance for the future periods, and therefore this information is available and known with certainty. In that particular case leads are very useful. I would suspect that if you know your regressors with some confidence then they should be helpful as well. If your regressors are forecasted, then you carry any forecast errors into your new forecast that uses them as inputs. This may be a mixed bag, depending on how good the forecast of the regressors is.
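
          Since leads have to be built by hand for now (option two from the example earlier in this thread), a rough base R sketch of the idea follows. The helper make_lead() and the promotional plan are made up for illustration and are not part of nnfor; the sketch only shows how a lead column lines up with the original regressor.

          ```r
          # Hypothetical helper (not part of nnfor): shift a regressor k periods
          # ahead, so that row t holds the value planned for period t + k.
          make_lead <- function(x, k) {
            stopifnot(k >= 0, k < length(x))
            c(x[(k + 1):length(x)], rep(NA, k))
          }

          # Made-up promotional plan: 1 marks a promotion period
          promo <- c(0, 0, 1, 0, 0, 1, 1, 0)

          # Each column is one "dynamic" of the regressor, built manually
          xreg <- cbind(promo = promo, promo_lead1 = make_lead(promo, 1))
          print(xreg)
          ```

          The last k rows of a lead column are NA because the plan is not known that far ahead, so in practice the regressor needs to extend at least k periods beyond the forecast horizon before such a matrix is used as xreg with xreg.lags set to zeros.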

          1. Josh

            Hi Nikos,

            Thanks for your work here. I’m very interested in neural network applications in time series. I was curious if you’ve seen the work being done with LSTMs and time series? Here’s a link to some work being done at Uber regarding utilizing autoencoders to capture time series properties:
            https://arxiv.org/abs/1709.01907

            Also this recent Kaggle winner applied similar techniques to win a forecasting competition:
            https://www.kaggle.com/c/web-traffic-time-series-forecasting/discussion/39367

            I’m interested to see what you discover for deep learning applications.

          2. Nikos Post author

            Hi Josh,
            Many thanks for the links. I will read them in detail. Quickly skimming through the first paper I noticed that there are no simple statistical benchmarks (something as simple as exponential smoothing!) in the empirical evaluation. It is a shame, as this would help make NNs and deep learning more mainstream in the forecasting literature, but also help highlight whether the gains are over NNs or more general. These two literatures need to talk more to each other 🙂 When I get the time, I will investigate more and post my findings!

  2. Alex Raboin

    I too am interested to see how LSTM can help me out. Nikos, when I use the second example you provided, is the model forecasting based on the forward leads I put into the model?

    My leads are forecasted weather variables, so yes, as you mentioned, the forecasted inputs carry their errors into the forecast, but I am thinking day-of and tomorrow are going to be okay, whereas 4-plus days out is iffy.

    Thank you for your help!

    1. Nikos Post author

      That’s correct: the model has been fitted on, and forecasts with, the leads that you have supplied. I will need some time to get my programming hands into LSTM; end of term is a busy period at my uni :/

      1. Eric

        Thanks Nikos. Just to say, as another random internet reader, I had similar questions about LSTM when reading your review of Prophet. (I also wonder how the STLPlus package would compare.)
        Thanks

  3. Pingback: Another look at forecast selection and combination: evidence from forecast pooling – Forecasting

  4. Hernan

    Hi Nikos,

    thanks for your outstanding work. I have been playing for a while with the nnfor/TStools packages and they are great, I’ve learned a lot.

    I want to ask you a few questions regarding (double) seasonal time series forecasting.

    nnfor’s references lead to ‘Feature selection for time series prediction – A combined filter and wrapper approach for neural networks’, which is a very educational paper. My question concerns the coding of seasonality for a seasonally differenced series. In your article, you mention that NNs should be trained with the original undifferenced series that contains all the original patterns and the deterministic seasonal coding. However, when I difference the series (i.e. difforder = c(24, 168) for daily and weekly seasonality as in Taylor’s (2006) work), these dummy variables are included anyway along with the differenced series and its respective lags, mixing the deterministic seasonality approach with the seasonal differences. I don’t know if I’m getting this right, because when I set the allow.det.season argument to FALSE (to use only a seasonal differencing approach), it includes the Fourier terms anyway. Is this some kind of bug, or is it theoretically feasible to do that?

    Thank you for your help!

    Best regards,

    1. Nikos Post author

      Hi Hernan,

      If you select allow.det.season=FALSE it should exclude the deterministic inputs. Are you using the latest version (on GitHub)? If yes, can you send me the command you are using? Nonetheless, here are some comments on what you are doing. Differencing the time series would indeed assume a very specific stochastic encoding of seasonality. This may or may not encode the seasonality fully. The addition of the trigonometric formulation would take care of any residual seasonality. However, differencing wastes a substantial part of the fitting sample (a whole “large” season), so it might be worthwhile to compare whether differencing is indeed useful!
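
      To put a number on the sample lost to differencing, here is a small base R sketch on a made-up half-hourly series. The series, the seasonal periods of 48 and 336, and the assumption that the two seasonal differences are applied one after the other are all illustrative; this is not the internal nnfor code.

      ```r
      set.seed(1)
      # Made-up half-hourly series: 6 weeks with daily (48) and weekly (336) cycles
      n   <- 6 * 336
      idx <- seq_len(n)
      y   <- 10 + sin(2 * pi * idx / 48) + 0.5 * sin(2 * pi * idx / 336) +
             rnorm(n, sd = 0.1)

      # Stochastic encoding: two seasonal differences applied in sequence
      d    <- diff(diff(y, lag = 48), lag = 336)
      lost <- length(y) - length(d)
      lost  # observations no longer available for fitting (48 + 336)

      # Deterministic (trigonometric) encoding: one sine/cosine pair per cycle,
      # available for every observation of the original series
      trg <- cbind(sin48  = sin(2 * pi * idx / 48),  cos48  = cos(2 * pi * idx / 48),
                   sin336 = sin(2 * pi * idx / 336), cos336 = cos(2 * pi * idx / 336))
      nrow(trg) == length(y)
      ```

      In this sketch the differenced series is more than a full week shorter than the original, while the trigonometric inputs cover the whole sample, which is the trade-off worth checking empirically.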

      Best,
      Nikos

      1. Bastian

        Hi Nikos,

        thank you for your answer, you are very kind.

        This is a reproducible example using the UK load series included in the forecast package to train an ELM as a NAR model.

        library(nnfor)
        library(forecast) # provides the 'taylor' series
        packageVersion('nnfor') # '0.9.3'

        y.train <- taylor # half-hourly electricity demand, from the 'forecast' package

        input.lags <- 1:48

        elm.model <- elm(y = y.train,
                         m = 48,
                         hd = NULL, # hidden nodes hidden layer | NULL (auto)
                         type = 'step', # output layer weights estimation
                         reps = 10,
                         comb = 'mode', # mean|median|mode(kde)
                         lags = input.lags, # default:= 1:frequency(y), 0:= no lags
                         sel.lag = TRUE, # TRUE|FALSE
                         keep = rep(TRUE, length(input.lags)),
                         difforder = c(48,336),
                         outplot = FALSE,
                         direct = FALSE,
                         allow.det.season = FALSE, # !!!!!!!!!!!!!!!!!!!!!!
                         #det.type = "trg",
                         xreg = NULL,
                         xreg.lags = NULL,
                         xreg.keep = NULL,
                         barebone = TRUE # faster!!
                         )

        elm.model

        plot(forecast(elm.model, h = 336)) # beautiful!

        ###############
        ELM (fast) fit with 31 up to 37 hidden nodes and 10 repetitions.
        Series modelled in differences: D48D336.
        Univariate lags: (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48)
        Deterministic seasonal dummies included. # <————————— !!!!!!!!!!!!!!!!!!!
        Forecast combined using the mode operator.
        Output weight estimation using: step.
        MSE: 49879.4212.
        ###############

        I will follow your recommendation and compare those two approaches (det. seasonality and stochastic encoding).
        As I mentioned before, I know from Taylor (2006) that the stochastic encoding can be used to obtain forecasts in an iterative multiple-step-ahead scheme, but deterministic encoding makes more sense to me. I think that with a large dataset the effect of wasting a season of data (a week) may go unnoticed.

        If it does not bother you, I would like to take the opportunity to ask you a couple more things.
        – If a rolling forecast origin evaluation is considered, would you recommend fitting from scratch or reusing/retraining the trained model?
        – Do you have any particular reason to prefer the neuralnet package over other available NN packages (e.g. nnet)?

        You can not imagine how grateful I am.

        Best,
        Bastian
        (Google's autocomplete changed my name)

        1. Nikos Post author

          Hi Bastian, thanks for the example. This should be fixed now and is available on github.
          There was an issue with msts objects and an if-statement which is now corrected.
          Thanks for spotting and reporting this!

