I have been looking for a package to do time series modelling in R with neural networks for quite some time with limited success. The only implementation I am aware of that takes care of autoregressive lags in a user-friendly way is the `nnetar`

function in the `forecast`

package, written by Rob Hyndman. In my view there is space for a more flexible implementation, so I decided to write a few functions for that purpose. For now these are included in the TStools package that is available in GitHub, but when I am happy with their performance and flexibility I will put them in a package of their own.

Here I will provide a quick overview of what these is available right now. I plan to write a more detailed post about these functions when I get the time.

For this example I will model the `AirPassengers`

time series available in R. I have kept the last 24 observations as a test set and will use the rest to fit the neural networks. Currently there are two types of neural network available, both feed-forward: (i) multilayer perceptrons (use function `mlp`

); and extreme learning machines (use function `elm`

).

```
# Fit MLP
mlp.fit <- mlp(y.in)
plot(mlp.fit)
print(mlp.fit)
```

This is the basic command to fit an MLP network to a time series. This will attempt to automatically specify autoregressive inputs and any necessary pre-processing of the time series. With the pre-specified arguments it trains 20 networks which are used to produce an ensemble forecast and a single hidden layer with 5 nodes. You can override any of these settings. The output of `print`

is a summary of the fitted network:

MLP fit with 5 hidden nodes and 20 repetitions. Series modelled in differences: D1. Univariate lags: (1,3,4,6,7,8,9,10,12) Deterministic seasonal dummies included. Forecast combined using the median operator. MSE: 6.2011.

As you can see the function determined that level differences are needed to capture the trend. It also selected some autoregressive lags and decided to also use dummy variables for the seasonality. Using `plot`

displays the architecture of the network (Fig. 1).

The light red inputs represent the binary dummies used to code seasonality, while the grey ones are autoregressive lags. To produce forecasts you can type:

mlp.frc <- forecast(mlp.fit,h=tst.n) plot(mlp.frc)

Fig. 2 shows the ensemble forecast, together with the forecasts of the individual neural networks. You can control the way that forecasts are combined (I recommend using the median or mode operators), as well as the size of the ensemble.

You can also let it choose the number of hidden nodes. There are various options for that, but all are computationally expensive (I plan to move the base code to CUDA at some point, so that computational cost stops being an issue).

```
# Fit MLP with automatic hidden layer specification
mlp2.fit <- mlp(y.in,hd.auto.type="valid",hd.max=10)
print(round(mlp2.fit$MSEH,4))
```

This will evaluate from 1 up to 10 hidden nodes and pick the best on validation set MSE. You can also use cross-validation (if you have patience…). You can ask it to output the errors for each size:

MSE H.1 0.0083 H.2 0.0066 H.3 0.0065 H.4 0.0066 H.5 0.0071 H.6 0.0074 H.7 0.0061 H.8 0.0076 H.9 0.0083 H.10 0.0076

There are a few experimental options in specifying various aspects of the neural networks, which are not fully documented and is probably best if you stay away from them for now!

ELMs work pretty much in the same way, although for these I have made default the automatic specification of the hidden layer.

```
# Fit ELM
elm.fit <- elm(y.in)
print(elm.fit)
plot(elm.fit)
```

This gives the following network summary:

ELM fit with 100 hidden nodes and 20 repetitions. Series modelled in differences: D1. Univariate lags: (1,3,4,6,7,8,9,10,12) Deterministic seasonal dummies included. Forecast combined using the median operator. Output weight estimation using: lasso. MSE: 83.0044.

I appreciate that using 100 hidden nodes on such a short time series can make some people uneasy, but I am using a shrinkage estimator instead of conventional least squares to estimate the weights, which in fact eliminates most of the connections. This is apparent in the network architecture in Fig. 3. Only the nodes connected with the black lines to the output layer contribute to the forecasts. The remaining connection weights have been shrunk to zero.

Another nice thing about these functions is that you can call them from the thief package, which implements Temporal Hierarchies forecasting in R. You can do that in the following way:

```
# Use THieF
library(thief)
mlp.thief <- thief(y.in,h=tst.n,forecastfunction=mlp.thief)
```

There is a similar function for using ELM networks: `elm.thief`

.

Since for this simple example I kept some test set, I benchmark the forecasts against exponential smoothing:

Method | MAE |
---|---|

MLP (5 nodes) | 62.471 |

MLP (auto) | 48.234 |

ELM | 48.253 |

THieF-MLP | 45.906 |

ETS | 64.528 |

Temporal hierarchies, like MAPA, are great for making your forecasts more robust and often more accurate. However, with neural networks the additional computational cost is evident!

These functions are still in development, so the default values may change and there are a few experimental options that may give you good results or not!

Hello Nikos, great post, thank you! Can I ask you some questions please:

1. Do you know, to what extent mlp{TStools} differs from mlp{RSNNS} or they essentially use a similar technique?

2. Does mlp{TStools} need prior scaling of input time series and/or exogenous regressors, or it scales them automatically?

3. Is there any maximum number of exogenous regressors that can be included?

Thank you!

Hello! They are quite different in that mlp{TStools} does all the preprocessing and model setup for time series forecasting automatically. So it takes care of scaling of the target and exogenous variables, differencing, it can do automatic selection of input lags and hidden nodes, unless otherwise specified, introduce seasonal dummy variables (if needed) and so on. The mlp{RSNNS} offers functionality to build and train a neural network, but you would have to do all the preprocessing manually. In some sense the mlp{TStools} is build in the same philosophy as auto.arima, where the user does not have to worry about preprocessing the data or specifying the model details. There is no maximum number of regressors it can accommodate, but I should say that it is not very fast in training large neural networks (especially given that the output is ensemble based by default). For that GPU based neural networks would be idea, which I have not implemented in R yet.

Thank you for reply, Nikos.

I have recently read your article “Segmenting electrical load time series for forecasting? An empirical evaluation of daily UK load patterns”, where you wrote that you had trained your MLPs using Levenberg-Marquardt function and eventually you received quite low MAPEs. Is it something that is implemented within mlp{TStools}? And may I ask what R function did you use at that time for this analysis?

In your research you came to the conclusion that forecasting entire time series is better than forecasting decomposed series. Would you suggest using for mlp{TStools} for forecasting consecutive electricity load with 100,000+ observation, or its computational capacity is unable to capture such a big dataset and it is worth using mlp{TStools} for forecasting decomposed time series?

Thank you!

That analysis was done in MatLab. I do not expect that there will be too much difference due to the specific training algorithms.

100k time series will take a lot of time to train with most neural network implementations in R. mlp in TStools will most surely be slow… go for lunch, coffee and a nice walk while calculating slow. For such massive datasets you need very efficient implementations that make use of your GPU (assuming your graphics card is CUDA capable). Matlab allows that, but there are also options for R, but essential connect with external implementations (for e.g. java).

Unfortunately, neural networks are still not trivial to use and require the computer skills to put together a solution programmatically.

Hello Nikos,

Let’s see if you could give me a hand on this. Im trying to do a prediction algorithm on mechanical failures. I have the data and my doubt comes on how to implement it. Is it a time series? I know I could use survival analysis but that’s just statistics, I want to use ML so I thought to use NN and that’s when i came to your article. What you say?

Thanks in advance ðŸ˜‰

Hi Daniel,

Interesting question; as you suggest typically people model this as a survival analysis problem. What kind of variable do you have available? (and software restrictions?)

Nikos

Hi,

I like the forecast package, but I saw it is limited to Aroma when you want to include exogenous variables. Can you include those here? Neural networks should be ideal for that problem …

Holger

You can, there is an xreg argument to help you do that. Just keep in mind the code is still beta, hence not on CRAN yet!

Hi Nikos,

Can we use TStools or THief package for intermittent data too?

Vivek

Yes! You would use tsintermittent and thief packages and or the thief forecast you would need to use the forecast function argument. For examples of how to do this have a look on the smooth.thief function in TStools.