The **nnfor** (development version here) package for R facilitates time series forecasting with Multilayer Perceptrons (MLP) and Extreme Learning Machines (ELM). Currently (version 0.9.6) it does not support deep learning, though the plan is to extend it in this direction in the near future. It relies on the **neuralnet** package for R, which provides all the machinery to train MLPs. The training of ELMs is written within the **nnfor** package. Note that since **neuralnet** cannot tap into GPU processing, large networks tend to be very slow to train. **nnfor** differs from existing neural network implementations for R in that it provides code to automatically design networks with reasonable forecasting performance, but also provides in-depth control to the experienced user. The automatic specification is designed with parsimony in mind. This increases the robustness of the resulting networks, but also helps reduce the training time.
## Forecasting with MLPs
With the **nnfor** package you can either produce extrapolative (univariate) forecasts, or include explanatory variables as well.
### Univariate forecasting
The main function is `mlp()`, and in its simplest form you only need to input a time series to be modelled.
```r
library(nnfor)
fit1 <- mlp(AirPassengers)
print(fit1)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 7.4939.
```

The output indicates that the resulting network has 5 hidden nodes, that it was trained 20 times, and that the different forecasts were combined using the median operator. The `mlp()` function automatically generates ensembles of networks, the training of which starts with different random initial weights. The output also lists the inputs that were included in the network. This paper discusses the performance of different combination operators and finds that the median performs very well, the mode can achieve the best performance but needs somewhat larger ensembles, and the arithmetic mean is probably best avoided! Another interesting finding in that paper is that bagging (i.e. training the network on bootstrapped series) or using multiple random training initialisations results in similar performance, and therefore it appears that for time series forecasting we can avoid the bootstrapping step, greatly simplifying the process. These findings are embedded in the default settings of **nnfor**.

You can get a visual summary by using the `plot()` function.

```r
plot(fit1)
```

The grey input nodes are autoregressions, while the magenta ones are deterministic inputs (seasonality in this case). If any other regressors were included, they would be shown in light blue.

The `mlp()` function accepts several arguments to fine-tune the resulting network. The `hd` argument defines a fixed number of hidden nodes. If it is a single number, then the neurons are arranged in a single hidden layer.
If it is a vector, then these are arranged in multiple layers.

```r
fit2 <- mlp(AirPassengers, hd = c(10,5))
plot(fit2)
```

We will see later on how to automatically select the number of nodes. In my experience (and evidence from the literature), conventional neural networks forecasting single time series do not benefit from multiple hidden layers. The forecasting problem is typically just not that complex!

The argument `reps` defines how many training repetitions are used. If you want to train a single network you can use `reps=1`, although there is overwhelming evidence that there is no benefit in doing so. The default `reps=20` is a compromise between training speed and performance, but the more repetitions you can afford the better. They help not only the performance of the model, but also the stability of the results when the network is retrained. How the different training repetitions are combined is controlled by the argument `comb`, which accepts the options `median`, `mean`, and `mode`. The mean and median are self-explanatory. The mode is calculated using the maximum of a kernel density estimate of the forecasts for each time period. This is detailed in the aforementioned paper and exemplified here.

The argument `lags` allows you to select the autoregressive lags considered by the network. If this is not provided then the network uses lag 1 to lag `m`, the seasonal period of the series. These are suggested lags and they may not stay in the final networks. You can force them to stay using the argument `keep`, or turn off the automatic input selection altogether using the argument `sel.lag=FALSE`.
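To make the `lags` and `keep` arguments more concrete, the following base-R sketch builds the kind of lagged input matrix that these arguments refer to. It is only an illustration on the raw series and is not a description of how **nnfor** constructs its inputs internally (which, for instance, may difference the series first). The names `candidate_lags`, `inputs`, `keep_mask` and `forced_inputs` are made up for this example.

```r
# Illustration only (base R): candidate autoregressive inputs for lags 1..3.
y <- as.numeric(AirPassengers)
candidate_lags <- 1:3
max_lag <- max(candidate_lags)

# embed() puts y[t] in column 1 and y[t - j] in column j + 1
emb <- embed(y, max_lag + 1)
target <- emb[, 1]
inputs <- emb[, candidate_lags + 1, drop = FALSE]
colnames(inputs) <- paste0("lag", candidate_lags)

# A keep-style mask needs one logical per candidate lag
keep_mask <- c(TRUE, FALSE, FALSE)
stopifnot(length(keep_mask) == length(candidate_lags))
forced_inputs <- colnames(inputs)[keep_mask]   # inputs exempt from selection
```

Each row of `inputs` holds the past values that would be used to predict the matching element of `target`; the mask simply marks which of those columns a selection step would not be allowed to drop.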
Observe the differences in the following calls of `mlp()`.

```r
mlp(AirPassengers, lags=1:24)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,4,7,8,9,10,11,12,13,18,21,23,24)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 5.6388.
```

```r
mlp(AirPassengers, lags=1:24, keep=c(rep(TRUE,12), rep(FALSE,12)))
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,9,10,11,12,13,18,21,23,24)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 3.6993.
```

```r
mlp(AirPassengers, lags=1:24, sel.lag=FALSE)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 1.7347.
```

In the first case lags (1,2,4,7,8,9,10,11,12,13,18,21,23,24) are retained. In the second case all of lags 1-12 are kept and the remaining 13-24 are tested for inclusion. Note that the argument `keep` must be a logical vector of equal length to the input used in `lags`. In the last case, all lags are retained. The selection of the lags relies heavily on this and this paper, with evidence of its performance on high-frequency time series outlined here. An overview is provided in: Ord K., Fildes R., Kourentzes N. (2017) Principles of Business Forecasting 2e.
Wessex Press Publishing Co., Chapter 10.

Note that if the selection algorithm decides that nothing should stay in the network, it will always include lag 1 and you will get a warning message: `No inputs left in the network after pre-selection, forcing AR(1)`.

Neural networks are not great at modelling trends. You can find the arguments for this here. Therefore it is useful to remove the trend from a time series prior to modelling it. This is handled by the argument `difforder`. If `difforder=0` no differencing is performed. For `difforder=1`, level differences are performed. Similarly, if `difforder=12` then 12th order differences are performed. If the time series is seasonal with seasonal period 12, this would then be seasonal differences. You can do both with `difforder=c(1,12)` or any other set of difference orders. If `difforder=NULL` then the code decides automatically. If there is a trend, first differences are used. The series is also tested for seasonality. If there is seasonality, then the Canova-Hansen test is used to identify whether this is deterministic or stochastic. If it is the latter, then seasonal differences are added as well.

Deterministic seasonality is better modelled using seasonal dummy variables. By default the inclusion of dummies is tested. This can be controlled by using the logical argument `allow.det.season`. The deterministic seasonality can be either a set of binary dummies, or a pair of sine-cosine (argument `det.type`), as outlined here. If the seasonal period is more than 12, then the trigonometric representation is recommended for parsimony.

The logical argument `outplot` provides a plot of the fit of the network.

The number of hidden nodes can be either preset, using the argument `hd`, or automatically specified, as defined with the argument `hd.auto.type`.
By default this is `hd.auto.type="set"` and uses the input provided in `hd` (default is 5). You can set this to `hd.auto.type="valid"` to test using a validation sample (20% of the time series), or `hd.auto.type="cv"` to use 5-fold cross-validation. The number of hidden nodes to evaluate is set by the argument `hd.max`.

```r
fit3 <- mlp(AirPassengers, hd.auto.type="valid",hd.max=8)
print(fit3)
```

```
## MLP fit with 4 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 14.2508.
```

Given that training networks can be a time-consuming business, you can reuse an already specified/trained network. In the following example, we reuse `fit1` on a new time series.

```r
x <- ts(sin(1:120*2*pi/12),frequency=12)
mlp(x, model=fit1)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 0.0688.
```

This retains both the specification and the training from `fit1`.
If you want to use only the specification, but retrain the network, then use the argument `retrain=TRUE`.

```r
mlp(x, model=fit1, retrain=TRUE)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 0.
```

Observe the difference in the in-sample MSE between the two settings.

Finally, you can pass arguments directly to the `neuralnet()` function that is used to train the networks by using the ellipsis `...`.

To produce forecasts, we use the function `forecast()`, which requires a trained network object and the forecast horizon `h`.

```r
frc <- forecast(fit1,h=12)
print(frc)
```

```
##              Jan         Feb         Mar         Apr         May
## 1961 447.3392668 421.2532515 497.5052166 521.1640683 537.4031708
##              Jun         Jul         Aug         Sep         Oct
## 1961 619.5015908 707.8790407 681.0523280 602.8467629 529.0477736
##              Nov         Dec
## 1961 470.8292734 517.6762262
```

```r
plot(frc)
```

The plot of the forecasts shows in grey the forecasts of all the ensemble members. The output of `forecast()` is of class `forecast` and those familiar with the **forecast** package will find familiar elements there. To access the point forecasts use `frc$mean`. The `frc$all.mean` element contains the forecasts of the individual ensemble members.

### Using regressors

There are three arguments in the `mlp()` function that enable the use of explanatory variables: `xreg`, `xreg.lags` and `xreg.keep`. The first is used to input additional regressors. These must be organised as an array and be at least as long as the in-sample time series, although they can be longer. I find it helpful to always provide `length(y)+h`.
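The layout requirements for `xreg` can be checked before any network is trained. The sketch below is a hypothetical helper written for this post, not a function from **nnfor**: `check_xreg()` and the horizon `h <- 12` are assumptions for illustration, and the checks only mirror the rules stated above (array rather than vector, and at least `length(y)+h` rows).

```r
# Hypothetical helper, not part of nnfor: checks the layout described above.
check_xreg <- function(y, xreg, h = 0) {
  if (!is.matrix(xreg)) {
    stop("xreg should be a matrix (e.g. cbind(x)), not a plain vector")
  }
  if (nrow(xreg) < length(y) + h) {
    stop("xreg needs at least length(y) + h rows")
  }
  invisible(TRUE)
}

h <- 12                                        # assumed forecast horizon
trend <- seq_len(length(AirPassengers) + h)    # covers sample plus horizon
ok <- check_xreg(AirPassengers, cbind(trend), h = h)
```

Passing the bare vector `trend`, or asking for a longer horizon than the regressor covers, stops with an error instead of failing later inside the modelling call.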
Let us suppose that we want to use a deterministic trend to forecast the time series. First, we construct the input and then model the series.

```r
z <- 1:(length(AirPassengers)+24) # I add 24 extra observations for the forecasts
z <- cbind(z) # Convert it into a column-array
fit4 <- mlp(AirPassengers,xreg=z,xreg.lags=list(0),xreg.keep=list(TRUE),
            # Add a lag0 regressor and force it to stay in the model
            difforder=0) # Do not let mlp() remove the stochastic trend
print(fit4)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Univariate lags: (1,4,5,8,9,10,11,12)
## 1 regressor included.
## - Regressor 1 lags: (0)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 32.4993.
```

The output reflects the inclusion of the regressor. This is also shown in the plot of the network with a light blue input.

```r
plot(fit4)
```

Observe that `z` is organised as an array. If this is a vector you will get an error. To include more lags, we expand `xreg.lags`:

```r
mlp(AirPassengers,difforder=0,xreg=z,xreg.lags=list(1:12))
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Univariate lags: (1,4,5,8,9,10,11,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 48.8853.
```

Observe that no regressor lags were included in the network.
We use `xreg.keep` to force these in.

```r
mlp(AirPassengers,difforder=0,xreg=z,xreg.lags=list(1:12),xreg.keep=list(c(rep(TRUE,3),rep(FALSE,9))))
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Univariate lags: (1,4,5,8,9,10,11,12)
## 1 regressor included.
## - Regressor 1 lags: (1,2,3)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 32.8439.
```

Clearly, the network does not like the deterministic trend! It only retains it if we force it. Observe that both `xreg.lags` and `xreg.keep` are lists, where each list element corresponds to a column in `xreg`. As an example, we will encode extreme residuals of `fit1` as a single input (see this paper for a discussion on how networks can code multiple binary dummies in a single one). For this I will use the function `residout()` from the **tsutils** package.

```r
if (!require("tsutils")){install.packages("tsutils")}
library(tsutils)
loc <- residout(AirPassengers - fit1$fitted, outplot=FALSE)$location
zz <- cbind(z, 0)
zz[loc,2] <- 1
fit5 <- mlp(AirPassengers,xreg=zz, xreg.lags=list(c(0:6),0),xreg.keep=list(rep(FALSE,7),TRUE))
print(fit5)
```

```
## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## 1 regressor included.
## - Regressor 1 lags: (0)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## MSE: 7.2178.
```

Obviously, you can include as many regressors as you want.

To produce forecasts, we use the `forecast()` function, but now use the `xreg` input. The way to make this work is to input the regressors starting from the same observation that was used during the training of the network, expanded as needed to cover the forecast horizon.
You do not need to eliminate unused regressors. The network will take care of this.

```r
frc.reg <- forecast(fit5,xreg=zz)
```

## Forecasting with ELMs

To use Extreme Learning Machines (ELMs) you can use the function `elm()`. Many of the inputs are identical to `mlp()`. By default ELMs start with a very large hidden layer (100 nodes) that is pruned as needed.

```r
fit6 <- elm(AirPassengers)
print(fit6)
```

```
## ELM fit with 100 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3,4,5,6,7,8,10,12)
## Deterministic seasonal dummies included.
## Forecast combined using the median operator.
## Output weight estimation using: lasso.
## MSE: 91.288.
```

```r
plot(fit6)
```

Observe that the plot of the network has some black and some grey lines. The latter are pruned. There are 20 networks fitted (controlled by the argument `reps`). Each network may have different final connections. You can inspect these by using `plot(fit6,1)`, where the second argument defines which network to plot.

```r
par(mfrow=c(2,2))
for (i in 1:4){plot(fit6,i)}
par(mfrow=c(1,1))
```

How the pruning is done is controlled by the argument `type`. The default option is to use LASSO regression (`type="lasso"`). Alternatively, you can use `"ridge"` for ridge regression, `"step"` for stepwise OLS and `"lm"` to get the OLS solution with no pruning.

The other difference from the `mlp()` function is the `barebone` argument. When this is `FALSE`, then the ELMs are built based on the **neuralnet** package.
If this is set to `TRUE` then a different internal implementation is used, which is helpful to speed up calculations when the number of inputs is substantial.

To forecast, use the `forecast()` function in the same way as before.

```r
forecast(fit6,h=12)
```

```
##              Jan         Feb         Mar         Apr         May
## 1961 449.0982276 423.0329121 455.7643540 488.5096713 499.3616506
##              Jun         Jul         Aug         Sep         Oct
## 1961 567.7486203 645.6309202 622.8311280 543.9549163 495.8305278
##              Nov         Dec
## 1961 429.9258197 468.7921570
```

## Temporal hierarchies and nnfor

People who have followed my research will be familiar with Temporal Hierarchies, which are implemented in the package **thief**. You can use both `mlp()` and `elm()` with `thief()` by using the functions `mlp.thief()` and `elm.thief()`.

```r
if (!require("thief")){install.packages("thief")}
library(thief)
thiefMLP <- thief(AirPassengers,forecastfunction=mlp.thief)
# Similarly elm.thief:
thiefELM <- thief(AirPassengers,forecastfunction=elm.thief)
par(mfrow=c(1,2))
plot(thiefMLP)
plot(thiefELM)
par(mfrow=c(1,1))
```

This should get you going with time series forecasting with neural networks.

Happy forecasting!
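As a footnote to the combination operators discussed under univariate forecasting, the sketch below shows one way to take the mode of a set of ensemble forecasts as the maximum of a kernel density estimate. It uses base R `density()` with its default Gaussian kernel and bandwidth, which is an assumption made here and may differ from what **nnfor** does internally. The function `kde_mode()` and the numbers in `member_frc` are invented for illustration.

```r
# Sketch: mode of ensemble forecasts via a kernel density estimate (base R).
kde_mode <- function(x) {
  d <- density(x)            # default Gaussian kernel and bandwidth (assumed)
  d$x[which.max(d$y)]        # location of the highest density
}

# Made-up forecasts from 12 ensemble members for a single period
member_frc <- c(98, 99, 100, 100, 101, 102, 99, 101, 100, 100, 150, 160)

comb <- c(mean = mean(member_frc),
          median = median(member_frc),
          mode = kde_mode(member_frc))
```

In this toy example the two outlying members pull the mean upwards, while the median and the mode stay close to where most members agree, which is in line with the advice above to prefer those two operators.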
