Forecasting time series with neural networks in R

By Nikos | February 10, 2017

I have been looking for a package to do time series modelling in R with neural networks for quite some time, with limited success. The only implementation I am aware of that takes care of autoregressive lags in a user-friendly way is the nnetar function in the forecast package, written by Rob Hyndman. In my view there is space for a more flexible implementation, so I decided to write a few functions for that purpose. For now these are included in the TStools package, which is available on GitHub, but when I am happy with their performance and flexibility I will put them in a package of their own.

Here I will provide a quick overview of what is available right now. I plan to write a more detailed post about these functions when I get the time.

For this example I will model the AirPassengers time series available in R. I have kept the last 24 observations as a test set and will use the rest to fit the neural networks. Currently there are two types of neural network available, both feed-forward: (i) multilayer perceptrons (use function mlp); and (ii) extreme learning machines (use function elm).

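library(TStools)
# Hold out the last 24 months of AirPassengers as a test set
y.in <- window(AirPassengers, end = c(1958, 12))
tst.n <- 24
# Fit MLP
mlp.fit <- mlp(y.in)
plot(mlp.fit)
print(mlp.fit)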

This is the basic command to fit an MLP network to a time series. It will attempt to specify the autoregressive inputs and any necessary pre-processing of the time series automatically. With the default arguments it uses a single hidden layer with 5 nodes and trains 20 networks, whose forecasts are combined into an ensemble forecast. You can override any of these settings. The output of print is a summary of the fitted network:

MLP fit with 5 hidden nodes and 20 repetitions.
Series modelled in differences: D1.
Univariate lags: (1,3,4,6,7,8,9,10,12)
Deterministic seasonal dummies included.
Forecast combined using the median operator.
MSE: 6.2011.

As you can see, the function determined that first differences are needed to capture the trend. It also selected a subset of autoregressive lags and decided to use dummy variables for the seasonality. Using plot displays the architecture of the network (Fig. 1).
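To make the summary above more concrete, here is a minimal base-R sketch of an input matrix matching it: first differences (D1), the nine reported lags, and binary monthly dummies. This is not the TStools code; the dummy coding (11 dummies with January as the baseline) and the absence of any scaling are assumptions made for illustration.

```r
# Sketch (assumption): an input matrix like the one print(mlp.fit) summarises
y.in <- window(AirPassengers, end = c(1958, 12))
dy <- diff(y.in)                          # "D1": first differences, starts Feb 1949
lags <- c(1, 3, 4, 6, 7, 8, 9, 10, 12)    # lags reported by print(mlp.fit)
max.lag <- max(lags)

# embed() returns x[t] in column 1 and x[t - k] in column k + 1
E <- embed(as.numeric(dy), max.lag + 1)
target <- E[, 1]
X.lags <- E[, lags + 1, drop = FALSE]
colnames(X.lags) <- paste0("lag", lags)

# Calendar month of each target, coded as 11 binary dummies (January = baseline)
month <- cycle(dy)[(max.lag + 1):length(dy)]
X.season <- sapply(2:12, function(m) as.numeric(month == m))
colnames(X.season) <- month.abb[2:12]

X <- cbind(X.lags, X.season)
dim(X)   # 107 rows (119 differences minus 12 lost to lagging), 20 inputs
```

Each row of X pairs one differenced observation (target) with its lagged values and its month; these are the grey and light red inputs that plot draws.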

Fig. 1. Output of plot(mlp.fit).

The light red inputs represent the binary dummies used to code seasonality, while the grey ones are autoregressive lags. To produce forecasts you can type:

mlp.frc <- forecast(mlp.fit,h=tst.n)
plot(mlp.frc)

Fig. 2 shows the ensemble forecast, together with the forecasts of the individual neural networks. You can control the way that forecasts are combined (I recommend using the median or mode operators), as well as the size of the ensemble.
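The effect of the combination operator can be sketched with a small made-up ensemble. The forecasts below are constructed deterministically (they are not output from mlp), with one badly behaved member, and the mode is taken as the peak of a kernel density estimate. That is one common way to define the mode of continuous values, assumed here rather than taken from the TStools code.

```r
# Hypothetical ensemble: 20 networks, 24-step forecasts, built deterministically
h <- 24
reps <- 20
base <- 400 + 2 * seq_len(h)                 # made-up "central" forecast path
spread <- seq(-15, 15, length.out = reps)    # members scattered around it
frc <- outer(base, spread, "+")              # h x reps matrix of member forecasts
frc[, reps] <- frc[, reps] + 200             # one badly trained member

comb.mean   <- rowMeans(frc)
comb.median <- apply(frc, 1, median)

# Mode of continuous values: location of the peak of a kernel density estimate
kde.mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}
comb.mode <- apply(frc, 1, kde.mode)

round(c(mean = comb.mean[1], median = comb.median[1], mode = comb.mode[1]) - base[1], 2)
```

The single outlier drags the mean up by 200 / 20 = 10 at every horizon, while the median stays on the central path, which is why the robust operators are the safer choice.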

Fig. 2. Output of the plot function for the MLP forecasts.

You can also let it choose the number of hidden nodes. There are various options for that, but all are computationally expensive (I plan to move the base code to CUDA at some point, so that computational cost stops being an issue).

# Fit MLP with automatic hidden layer specification
mlp2.fit <- mlp(y.in,hd.auto.type="valid",hd.max=10)
print(round(mlp2.fit$MSEH,4))

This will evaluate networks with 1 up to 10 hidden nodes and pick the size with the lowest validation-set MSE. You can also use cross-validation (if you have the patience…). You can ask it to output the errors for each size:
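The validation-based selection can be sketched as a loop over hidden layer sizes. This sketch swaps in the nnet package (a recommended package shipped with R) as the trainer, trains a single network per size instead of an ensemble, and uses a hypothetical split where the last 24 rows of the lagged design form the validation set; none of this is the TStools implementation.

```r
library(nnet)  # stand-in trainer; TStools uses its own training code

y.in <- window(AirPassengers, end = c(1958, 12))
dy <- as.numeric(diff(y.in))
dy <- (dy - min(dy)) / (max(dy) - min(dy))    # scale differences to [0, 1]
E <- embed(dy, 13)                             # column k + 1 holds lag k
lags <- c(1, 3, 4, 6, 7, 8, 9, 10, 12)
X <- E[, lags + 1]
y <- E[, 1]

# Hypothetical split: last 24 rows for validation, the rest for training
val <- (nrow(X) - 23):nrow(X)
trn <- setdiff(seq_len(nrow(X)), val)

set.seed(1)
mse.h <- sapply(1:10, function(h) {
  fit <- nnet(X[trn, ], y[trn], size = h, linout = TRUE, maxit = 500, trace = FALSE)
  mean((predict(fit, X[val, ]) - y[val])^2)   # validation MSE for this size
})
names(mse.h) <- paste0("H.", 1:10)
round(mse.h, 4)
best.h <- which.min(mse.h)                    # size with the lowest validation MSE
```

Because each size is trained only once from random starting weights, the chosen size can change between runs; averaging over several repetitions per size reduces that noise at a proportional computational cost.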

        MSE
H.1  0.0083
H.2  0.0066
H.3  0.0065
H.4  0.0066
H.5  0.0071
H.6  0.0074
H.7  0.0061
H.8  0.0076
H.9  0.0083
H.10 0.0076

There are a few experimental options for specifying various aspects of the neural networks, which are not fully documented, and it is probably best if you stay away from them for now!

ELMs work pretty much in the same way, although for these I have made automatic specification of the hidden layer the default.

# Fit ELM
elm.fit <- elm(y.in)
print(elm.fit)
plot(elm.fit)

This gives the following network summary:

ELM fit with 100 hidden nodes and 20 repetitions.
Series modelled in differences: D1.
Univariate lags: (1,3,4,6,7,8,9,10,12)
Deterministic seasonal dummies included.
Forecast combined using the median operator.
Output weight estimation using: lasso.
MSE: 83.0044.

I appreciate that using 100 hidden nodes on such a short time series can make some people uneasy, but I am using a shrinkage estimator instead of conventional least squares to estimate the weights, which in fact eliminates most of the connections. This is apparent in the network architecture in Fig. 3. Only the nodes connected with the black lines to the output layer contribute to the forecasts. The remaining connection weights have been shrunk to zero.
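The idea of an ELM with shrunk output weights can be sketched in base R: the input weights of 100 sigmoid hidden nodes are drawn at random and never trained, and only the output weights are estimated with a lasso penalty. To avoid extra packages, the lasso is solved here with a plain cyclic coordinate descent; the penalty lambda = 0.001, the weight ranges and the lag design are illustrative assumptions, not the settings used by elm.

```r
# Hypothetical ELM sketch: random hidden layer + lasso output weights (base R only)
y.in <- window(AirPassengers, end = c(1958, 12))
dy <- as.numeric(diff(y.in))
dy <- (dy - min(dy)) / (max(dy) - min(dy))    # scale to [0, 1]
E <- embed(dy, 13)
X <- E[, c(1, 3, 4, 6, 7, 8, 9, 10, 12) + 1]  # same lags as the MLP summary
y <- E[, 1]

# 1. Random, untrained input weights and biases for 100 sigmoid hidden nodes
set.seed(1)
hd <- 100
W <- matrix(runif(ncol(X) * hd, -1, 1), nrow = ncol(X), ncol = hd)
b.in <- runif(hd, -1, 1)
H <- 1 / (1 + exp(-sweep(X %*% W, 2, b.in, "+")))   # n x hd hidden outputs

# 2. Lasso for the output weights via cyclic coordinate descent
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)
lasso.cd <- function(H, y, lambda, iters = 200) {
  n <- nrow(H)
  Hc <- scale(H, scale = FALSE)               # centre columns; intercept handled apart
  yc <- y - mean(y)
  ss <- colSums(Hc^2) / n
  b <- numeric(ncol(H))
  r <- yc                                     # residual for b = 0
  for (it in seq_len(iters)) {
    for (j in seq_along(b)) {
      if (ss[j] == 0) next
      r <- r + Hc[, j] * b[j]                 # remove node j's current contribution
      b[j] <- soft(sum(Hc[, j] * r) / n, lambda) / ss[j]
      r <- r - Hc[, j] * b[j]                 # add back the updated contribution
    }
  }
  b0 <- mean(y) - sum(attr(Hc, "scaled:center") * b)
  list(b = b, b0 = b0)
}

fit <- lasso.cd(H, y, lambda = 0.001)
sum(fit$b != 0)                               # hidden nodes that keep a connection
y.hat <- fit$b0 + H %*% fit$b                 # in-sample fitted values
```

The soft-thresholding step sets any output weight whose correlation with the residual falls below lambda exactly to zero, which is what removes hidden nodes from the network; ordinary least squares would keep all 100.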

Fig. 3. ELM network architecture.

Another nice thing about these functions is that you can call them from the thief package, which implements Temporal Hierarchies forecasting in R. You can do that in the following way:

# Use THieF
library(thief)
thief.frc <- thief(y.in, h = tst.n, forecastfunction = mlp.thief)

There is a similar function for using ELM networks: elm.thief.
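Temporal hierarchies start by aggregating the monthly series into non-overlapping totals at several levels; THieF then forecasts each level (here with mlp.thief) and reconciles the forecasts. The sketch below shows only the aggregation step, using base R's aggregate() on the training data; the set of levels k = 1, 2, 3, 4, 6, 12 months is the usual choice for monthly data and is assumed here.

```r
# Aggregation levels of a temporal hierarchy for monthly data
y.in <- window(AirPassengers, end = c(1958, 12))
ks <- c(1, 2, 3, 4, 6, 12)                   # months per aggregated period
agg <- lapply(ks, function(k) aggregate(y.in, nfrequency = 12 / k, FUN = sum))
names(agg) <- paste0("k", ks)
sapply(agg, length)                          # 120, 60, 40, 30, 20, 10 observations
```

Fitting a network at every level is where the extra computational cost of THieF with neural networks comes from.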

Since I kept a test set for this simple example, I benchmark the forecasts against exponential smoothing (ETS), using the mean absolute error (MAE) over the 24 test months:

Method	MAE
MLP (5 nodes)	62.471
MLP (auto)	48.234
ELM	48.253
THieF-MLP	45.906
ETS	64.528
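A sketch of how such an MAE comparison can be computed on the 24 held-out months. To keep it in base R, it uses HoltWinters() as a stand-in exponential smoothing benchmark and adds a seasonal naive forecast; neither reproduces the ETS row above (ets() in the forecast package would be the closer match), and no results are claimed here.

```r
y.in  <- window(AirPassengers, end = c(1958, 12))
y.out <- window(AirPassengers, start = c(1959, 1))
tst.n <- length(y.out)                                # 24 test months

mae <- function(actual, frc) mean(abs(as.numeric(actual) - as.numeric(frc)))

hw <- HoltWinters(y.in, seasonal = "multiplicative")  # base-R exponential smoothing
hw.frc <- predict(hw, n.ahead = tst.n)
snaive.frc <- rep(tail(as.numeric(y.in), 12), length.out = tst.n)  # repeat last year

round(c(HoltWinters = mae(y.out, hw.frc), SeasonalNaive = mae(y.out, snaive.frc)), 3)
```

Any of the network forecasts can be scored the same way, e.g. by passing the point forecasts from mlp.frc as frc.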

Temporal hierarchies, like MAPA, are great for making your forecasts more robust and often more accurate. However, with neural networks the additional computational cost is evident!

These functions are still in development, so the default values may change, and there are a few experimental options that may or may not give you good results!

53 thoughts on “Forecasting time series with neural networks in R”

Dmitrii May 3, 2017

Hello Nikos, great post, thank you! Can I ask you some questions please:
1. Do you know to what extent mlp{TStools} differs from mlp{RSNNS}, or do they essentially use a similar technique?
2. Does mlp{TStools} need prior scaling of the input time series and/or exogenous regressors, or does it scale them automatically?
3. Is there any maximum number of exogenous regressors that can be included?
Thank you!

1. Nikos (post author) May 3, 2017
  
  Hello! They are quite different in that mlp{TStools} does all the preprocessing and model setup for time series forecasting automatically. So it takes care of scaling of the target and exogenous variables and differencing, it can do automatic selection of input lags and hidden nodes (unless otherwise specified), it introduces seasonal dummy variables (if needed), and so on. The mlp{RSNNS} offers functionality to build and train a neural network, but you would have to do all the preprocessing manually. In some sense mlp{TStools} is built on the same philosophy as auto.arima, where the user does not have to worry about preprocessing the data or specifying the model details. There is no maximum number of regressors it can accommodate, but I should say that it is not very fast in training large neural networks (especially given that the output is ensemble based by default). For that, GPU-based neural networks would be ideal, which I have not implemented in R yet.
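As an illustration of the kind of scaling mentioned above, here is a minimal linear rescaling to [-1, 1] together with its inverse. The target range and the helper names scale.pm1 and unscale.pm1 are hypothetical; the reply does not say which scaling mlp applies.

```r
# Hypothetical helpers: linear scaling to [-1, 1] and back
scale.pm1 <- function(x) {
  lo <- min(x)
  hi <- max(x)
  list(x = 2 * (x - lo) / (hi - lo) - 1, lo = lo, hi = hi)
}
unscale.pm1 <- function(z, lo, hi) (z + 1) / 2 * (hi - lo) + lo

s <- scale.pm1(as.numeric(AirPassengers))
range(s$x)                                    # -1 1
back <- unscale.pm1(s$x, s$lo, s$hi)          # network outputs map back the same way
all.equal(back, as.numeric(AirPassengers))
```

The lo and hi values found on the training data have to be stored and reused for new inputs and for transforming forecasts back to the original scale.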
  
  1. Dmitrii May 5, 2017
    
    Thank you for reply, Nikos.
    I have recently read your article “Segmenting electrical load time series for forecasting? An empirical evaluation of daily UK load patterns”, where you wrote that you had trained your MLPs using Levenberg-Marquardt function and eventually you received quite low MAPEs. Is it something that is implemented within mlp{TStools}? And may I ask what R function did you use at that time for this analysis?
    In your research you came to the conclusion that forecasting the entire time series is better than forecasting decomposed series. Would you suggest using mlp{TStools} for forecasting consecutive electricity load with 100,000+ observations, or is its computational capacity unable to handle such a big dataset, so that it is worth using mlp{TStools} for forecasting the decomposed time series instead?
    Thank you!
    
    1. Nikos (post author) May 5, 2017
      
      That analysis was done in MatLab. I do not expect that there will be too much difference due to the specific training algorithms.
      A 100k-observation time series will take a lot of time to train with most neural network implementations in R. mlp in TStools will most surely be slow… go for lunch, coffee and a nice walk while it calculates. For such massive datasets you need very efficient implementations that make use of your GPU (assuming your graphics card is CUDA capable). Matlab allows that, and there are also options for R, but they essentially connect to external implementations (e.g. in Java).
      Unfortunately, neural networks are still not trivial to use and require the programming skills to put together a solution.
      
Daniel May 24, 2017

Hello Nikos,

Let’s see if you could give me a hand with this. I'm trying to build a prediction algorithm for mechanical failures. I have the data and my doubt is about how to implement it. Is it a time series? I know I could use survival analysis, but that’s just statistics; I want to use ML, so I thought of using NNs and that’s when I came across your article. What do you think?
Thanks in advance 😉

1. Nikos (post author) May 25, 2017
  
  Hi Daniel,
  Interesting question; as you suggest typically people model this as a survival analysis problem. What kind of variable do you have available? (and software restrictions?)
  Nikos
  
Holger July 12, 2017

Hi,

I like the forecast package, but I saw it is limited to ARIMA when you want to include exogenous variables. Can you include those here? Neural networks should be ideal for that problem…

Holger

1. Nikos (post author) July 13, 2017
  
  You can, there is an xreg argument to help you do that. Just keep in mind the code is still beta, hence not on CRAN yet!
  
vivek August 1, 2017

Hi Nikos,

Can we use TStools or THief package for intermittent data too?

Vivek

1. Nikos (post author) September 16, 2017
  
  Yes! You would use the tsintermittent and thief packages, and for the thief forecast you would need to use the forecastfunction argument. For examples of how to do this, have a look at the smooth.thief function in TStools.
  
sonia October 1, 2017

Hello Nikos,
Hope you’re fine… I have a question about fitting a neural network with the forecast package to the sunspot.year data. I can’t understand how to specify a neural network of order 4x4x1… please reply soon, waiting for your response.

1. Nikos (post author) October 2, 2017
  
  Hi Sonia, I think what you are looking for is the argument hd, which should be hd=c(4,4). The single output layer is implied. So you would write something along the lines: mlp(data,hd=c(4,4),…). Hope this helps!
  
  1. Rohan June 24, 2018
    
    Hi Nikos,
    I tried hd=c(4,4) which works in MLP but not in ELM. For ELM I keep getting the error:
    Error in w.out[2:(1 + hd), , drop = FALSE] : subscript out of bounds
    In addition: Warning message:
    In 2:(1 + hd) : numerical expression has 2 elements: only the first used
    
    How do I troubleshoot this?
    
    1. Nikos (post author) June 26, 2018
      
      Strange, I do not get this error. However, elm is not supposed to work with small hidden layers. What you need is a very large layer. For example by default I assign it 100 neurons.
      
saarbaan October 13, 2017

Hi sir,
I am a new learner of ANN models. In my study I am using R to forecast inflation time series data, but I don't know how to use an ANN model at a basic level. After reading the data into R and converting it to a time series, how do I bring in an ANN model for further processing?

1. Nikos (post author) October 14, 2017
  
  Once the data is a time series, you could build a very basic model just by leaving everything on the default settings. So if your time series is in variable y, you could just write mlp(y). I do not expect this to give any good forecasts for inflation though! If you are new to ANNs, my suggestion would be to first familiarise yourself with how to use them on relatively simpler data!
  
  1. saarbaan October 19, 2017
    
    Thanks a lot sir for your precious ideas and suggestions regarding my study.
    My study is actually on forecasting the inflation (CPI) rate using almost 50 years of annual data, comparing an ARIMA model and an ANN model.
    The ARIMA model is understandable in R; the problem is the ANN model, where I am a basic learner and its commands in R confuse me a lot.
    So sir, kindly share any new suggestion regarding this, or any book or paper where I can get some basic help.
    I will be thankful,
    
    1. Nikos (post author) October 19, 2017
      
      This book covers various forecasting topics, and ANNs are one of them (chapter 10):
      https://wessexlearning.com/products/principles-of-business-forecasting-2nd-ed
      This is a very detailed book on neural nets:
      https://www.amazon.co.uk/Neural-Networks-Learning-Machines-Comprehensive/dp/0131471392
      
      1. saarbaan October 20, 2017
        
        Thanks for suggestion and supporting;
2. Abeeha Ch January 20, 2018
  
  Can you help me regarding this topic, ANN?
  
eugene October 31, 2017

Hi nikos,
I tried forecasting with the xreg component to account for day-of-the-week seasonality. However, I encountered this error and I am trying to make some sense out of it. I have not encountered this problem before, although I have frequently been including xreg components in ARIMA modelling.
Hope to get some advice! 🙂

> mlp.frc<-forecast(mlp.fit,xreg=forecastmatrix,h=151)
Error in forecast.net(object, h = h, y = y, xreg = xreg, …) :
Length of xreg must be longer that y + forecast horizon

dim(forecastmatrix)
[1] 151 6

As you can see, I created this matrix to forecast 151 days into the future, with 6 columns for the days of the week.

1. Nikos (post author) November 9, 2017
  
  Hi Eugene,
  See my response to a similar question here: http://kourentzes.com/forecasting/2017/10/25/new-r-package-nnfor-time-series-forecasting-with-neural-networks/
  Also, if you are trying to account for deterministic seasonal effects, these are taken care of by the network automatically. You would need additional dummy inputs to capture frequencies different than the ones inputted in the network. Let me know if that worked!
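Judging from the error message above ("Length of xreg must be longer that y + forecast horizon"), the xreg passed to forecast in this package appears to need rows for the fitted sample and the forecast horizon together, unlike newxreg-style arguments that only cover the horizon. A base-R sketch of building day-of-week dummies that way, for a hypothetical daily series of 365 observations, a 151-day horizon and an arbitrary start date:

```r
# Hypothetical daily example: n in-sample days plus an h-day horizon
n <- 365
h <- 151
dates <- seq(as.Date("2017-01-01"), by = "day", length.out = n + h)
dow <- as.POSIXlt(dates)$wday                 # 0 = Sunday, ..., 6 = Saturday

# Six binary dummies (Sunday as baseline), one row per in-sample AND future day
xreg <- sapply(1:6, function(d) as.numeric(dow == d))
colnames(xreg) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
dim(xreg)                                     # 516 6
```

If the series itself already has frequency 7, these dummies duplicate what the network adds automatically; they are only useful for a seasonal period different from the series frequency.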
  
manny November 6, 2017

What would this error imply? Can someone help?

Error in if ((decomposition == "multiplicative") && (min(y) <= 0)) { :
missing value where TRUE/FALSE needed

1. Nikos (post author) November 9, 2017
  
  Hi Manny,
  Are you using the latest version? Use the one on GitHub: https://github.com/trnnick/nnfor
  I think that bug is fixed there. If you still get the same error, please send me the line of code you are trying to run.
  Cheers
  
  1. mannr November 23, 2017
    
    Yes, I am using the latest package. I have also shared a data sample with you. I hope you can help me!
    
    1. Nikos (post author) December 10, 2017
      
      I think you are getting this because you are trying to run the networks on very short time series. I will program a slightly more understandable error message!
      
Abdul November 7, 2017

Hi Niko,
Thanks for the initiative.
How do I include the xreg argument in forecast? It throws an error when I do it exactly the same way as in the “forecast” package.
I get the following error:

Error in forecast.net(object, h = h, y = y, xreg = xreg, …) :
Length of xreg must be longer that y + forecast horizon.

Thanks for your help!!

1. Nikos (post author) November 9, 2017
  
  Hi Abdul,
  Have a look at my response in the comments here: http://kourentzes.com/forecasting/2017/10/25/new-r-package-nnfor-time-series-forecasting-with-neural-networks/
  It provides an example of how to use xreg. It is used a bit differently than in the forecast package. Let me know if you still get an error – hope this helps!
  
sagar November 20, 2017

mlp.fit <- mlp(x)
Error: could not find function "mlp"

It gives me the above error every time. I have installed all the required packages.

1. Nikos (post author) December 10, 2017
  
  I cannot reproduce this error. I have used the package on multiple computers. Any additional information would be helpful!
  
Pankaj Joshi November 21, 2017

Hello Professor,

Hats off for your contributions.

Is it possible to use an existing model (from a previous call to elm) with new time series in the elm() function?

Thank you very much,
Pankaj

1. Nikos (post author) December 10, 2017
  
  It now is! It took me a while to program it in, but now you can use the arguments `model` and `retrain` to reuse a model with or without retraining the weights.
  This is already available on GitHub; I will push it to CRAN as well.
  
Pingback: October 2017 New Packages – Cloud Data Architect
Maddy January 10, 2018

Hi Nikos,

Thank you very much for the article. I am new to machine learning and neural nets. Currently I am using ARIMA and Holt-Winters models in R to do batch forecasting for more than 5000 products. Can I do batch forecasting using neural nets in R? If yes, can you please publish or show the R syntax for a similar example, if you have one?

Thanks,
Maddy

1. Nikos (post author) February 1, 2018
  
  I think the current implementation of mlp in nnfor is way too slow for large-scale forecasting. I have not found a native R package that is fast enough, but there are some toolboxes that can be called from R to do this. MXNet is one of them, but currently nnfor does not support it. In principle the answer to your question is yes, but I think the computation would be too slow.
  
Skander Hannachi January 12, 2018

Dr. Kourentzes

Thank you for a very comprehensive and helpful blog.
I am curious, you don’t mention LSTM and RNN in this post, but according to my informal Google searches, they are the “go-to” family of neural nets for time series prediction, since they are very well suited for sequential data.

Have you investigated LSTM? Are there any formal results on LSTM, or are they just being hyped because they are part of the Deep Learning class of NNets and only getting attention because of the buzz?

1. Nikos (post author) February 1, 2018
  
  This is a good point. I have done some reading on LSTM and have used RNNs in the past. In my experience, for many business forecasting applications I have not seen the need to use these models, and I certainly have not seen the gains empirically. I have read quite a few papers on the topic, but they fall short of demonstrating what the benefits are, given the simplicity of MLPs. Having said that, there are applications where LSTM would be the natural choice. I intend at some point to do some further research on that, but I have not found the time yet. A large-scale empirical evaluation adhering to the principles of forecasting would be very interesting to see. I would be hesitant to say it is just hype, but at the same time I would be very hesitant to say that it is the most natural way to do time series forecasting for many aspects of business forecasting.
  
Shanu January 18, 2018

I tried to install nnfor-package however I got an error that this package is not available with R version 3.4.3. Which version of R supports the nnfor-package?

1. Nikos (post author) February 1, 2018
  
  This is peculiar, I am running the latest version with no problems.
  
Georgi February 1, 2018

Man. You do not have any knowledge how NN works. You overfit badly.

1. SM September 2, 2018
  
  Hi Dr. Kourentzes,
  
  How can I find out:
  1) What training algorithm that you used for MLP, is it back propagation ?
  2) What type of activation function that you used?
  3) was there a bias used?
  
  Regards,
  SM
  
  1. Nikos (post author) September 4, 2018
    
    The current version of nnfor uses the neuralnet package to train the network, which uses rprop+ for training and a sigmoid activation function. A bias is used. I plan to implement an interface to TensorFlow to give more training capabilities.
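To show where the biases and the sigmoid enter, here is a minimal forward pass of a one-hidden-layer network. The function name mlp.forward and the linear output node are assumptions for illustration (a linear output is the usual choice for regression); this is not code from nnfor or neuralnet, and training (e.g. with rprop+) is not shown.

```r
# Hypothetical forward pass of a one-hidden-layer MLP with biases
sigmoid <- function(z) 1 / (1 + exp(-z))
mlp.forward <- function(x, W1, b1, w2, b2) {
  hidden <- sigmoid(drop(x %*% W1) + b1)    # hidden activations, each with a bias
  sum(hidden * w2) + b2                     # linear output node, with its own bias
}

x  <- c(0.2, -0.1, 0.4)                     # three inputs (e.g. scaled lags)
W1 <- matrix(0, nrow = 3, ncol = 5)         # 3 inputs -> 5 hidden nodes
b1 <- rep(0, 5)
w2 <- rep(1, 5)
b2 <- 0.5
mlp.forward(x, W1, b1, w2, b2)              # 5 * sigmoid(0) + 0.5 = 3
```

With all input weights at zero every hidden node outputs sigmoid(0) = 0.5, so the result is 5 * 0.5 + 0.5 = 3; the biases b1 and b2 shift the activations and the output independently of the inputs.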
    
    1. Christian Lopes June 23, 2019
      
      Hi Dr. Kourentzes,
      I’ve seen that the seasonal dummies have their own visual representation (red inputs). Also, as per your reply, a bias has been used. How do I see that in the visualization? Is it applied all the time?
      Regards,
      Christian
      
      1. Nikos (post author) July 8, 2019
        
        Hi Christian, that figure does not visualise the bias, as it is applied all the time. At least for MLPs having the bias term helps training substantially!
Pingback: Update for nnfor: reuse and retrain models – Forecasting
Vatul August 19, 2018

Hi Professor,

Do we have functionality to tell the model not to give negative forecasts? My business case is such that only +ve forecasts make sense, but I see a lot of -ve forecasts in my model as well. Is there a way to tell the model explicitly not to forecast negative values?

1. Nikos (post author) September 4, 2018
  
  Not currently, in the sense that the model is not restricted to producing only non-negative forecasts. You can adjust the forecasts afterwards to be positive. Another approach would be to transform everything to logarithms, model them, and then transform the values back to the original scale. That would ensure strictly positive forecasts, but it would also introduce additional nonlinearities.
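The log-transform approach can be sketched with a simple linear trend model (lm) standing in for the network, on a hypothetical declining series: on the raw scale the extrapolated forecasts turn negative, while forecasts made on the log scale and transformed back with exp() stay strictly positive.

```r
# Hypothetical declining series that a linear model pushes below zero
y <- c(50, 40, 31, 24, 18, 13, 9, 6, 4, 3)
tt <- seq_along(y)
new <- data.frame(tt = length(y) + 1:10)               # 10 steps ahead

frc.raw <- predict(lm(y ~ tt), newdata = new)           # raw scale: goes negative
frc.log <- exp(predict(lm(log(y) ~ tt), newdata = new)) # log scale, back-transformed

rbind(raw = round(frc.raw, 1), log = round(frc.log, 2))
```

The back-transformation is also the source of the extra nonlinearity mentioned above: a model that is linear in log(y) is multiplicative on the original scale.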
  
Barb March 7, 2019

Dear Nikos,

Thank you very much for the detailed tutorials. I am trying to do stock price forecasting using mlp, but I cannot seem to make it work well. I tried both using the mlp function from the nnfor package and RSNNS. For nnfor I just used close prices as my input time series, whereas for RSNNS I created an input matrix with close, high, low, open prices and a corresponding output vector with close prices on the consecutive day. In the first case I get pretty bad forecasts, whereas in the second case mlp just outputs value of 1 for every forecast. Would you please give me some guidance on what corrections should I use and which function would be more appropriate for my data?

1. Nikos (post author) March 12, 2019
  
  This is really data dependent, so different time series would require different tweaking of the settings. However I should stress that typically stock prices are considered unforecastable, in the sense that the random walk is difficult to beat. At least in terms of the expected value! Data preprocessing will be quite critical as well.
  
Nnadozie Nnoli July 5, 2019

Dear Nikos,
I am new to using neural networks in R for forecasting dry spells. I have 43 years of data with nine attributes. I use 70% of my data for training and 30% for testing. My results show low accuracy, possibly because the data seem inadequate, and it is very difficult for me to acquire more data now. Can anything be done to enhance or increase the amount of data being used? Thank you very much. Hope to hear from you.

1. Nikos (post author) July 8, 2019
  
  There can be many reasons for forecasts being inaccurate; however, I think it would be helpful to consider the quality of a forecast relative to other alternatives. I would suggest that you have some basic statistical models to compare against, for instance some exponential smoothing or ARIMA forecasts. I would also include a random walk (naive) and then assess whether the network performs poorly against these. As for the network itself, a lot usually depends on the selection of the input lags.
  
  1. Nnadozie Nnoli July 15, 2019
    
    Dear Nikos,
    Thank you very much for your reply. I am working on what you suggested.
    
    Nnadozie
    
Nashaat Anber August 3, 2020

Dear Nikos,
Can you help me with how to use a fuzzy ANN in forecasting, and which library would be helpful?
thanks
