First we load the package and some data.
library(MAPA)
library(RColorBrewer)  # used for the colour palettes in the plots below

y <- cbind(c(1030.829,893.551,1084.09,1278.436,936.708,915.322,885.713,2364.399,774.88,977.506,831.616,813.656,1569.956,967.925,806.146,1063.117,1123.787,906.686,996.498,1088.464,977.414,1128.328,896.594,1007.172,1046.379,1514.648,1626.115,2959.558,838.506,949.377,1433.307,805.048,1218.907,872.43,1730.103,865.734,1845.713,919.291,1003.363,1102.969,847.38,1965.26,809.673,953.193,1066.089,991.352,1115.694,1003.333,1090.48,930.749,1006.184,1239.068,873.707,728.583,881.316,1302.468,997.442,3481.118,841.042,997.601,1830.194,909.693,2358.406,2573.673,777.08,773.781,945.424,968.646,1074.589,1046.22,1155.559,990.627,931.943,786.285,2297.025,628.166,889.238,937.631,1113.925,870.384,1018.375,799.458,1542.328,1879.587,750.307,1087.948,1247.803,1052.352,883.899,793.126,913.736,1082.142,968.823,2099.176,841.224,964.227,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
x <- cbind(c(0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
           c(0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0),
           c(0,0,0,1,1,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0,1,1,0,0))
y <- y[!is.na(y)]
y <- ts(y, frequency=12)
This toy dataset includes the sales of a heavily promoted item, subject to three different promotions. Let’s visualise what we have:
plot(y)
cmp <- brewer.pal(3,"Set1")
for (i in 1:3){
  points(time(y)[which(x[,i]==1)], y[which(x[,i]==1)], col=cmp[i], pch=i, cex=1.5)
}
legend("topright", c("Promo1","Promo2","Promo3"), col=cmp, pch=1:3)
Now let’s fit a MAPA and MAPAx and produce forecasts. I am modelling the sales in logs to capture the multiplicative promotional effects. For other types of applications this is not needed.
mapafit.x <- mapaest(log(y), type="es", display=1, outplot=1, xreg=x)
mapafit <- mapaest(log(y), type="es", outplot=1)
frc.x <- mapafor(log(y), mapafit.x, ifh=13, fh=13, xreg=x, conf.lvl=c(0.8,0.9,0.95), comb="w.mean")
frc <- mapafor(log(y), mapafit, ifh=13, fh=13, conf.lvl=c(0.8,0.9,0.95), comb="w.mean")
Let’s plot the results
par(mfrow=c(1,2))
plot(1:96, y, type="l", xlim=c(1,109), xaxs="i", main="MAPA", xlab="Period", ylab="Sales")
for (i in 1:96){
  lines(i:(12+i), exp(frc$infor[,i]), col="red")
}
cmp <- brewer.pal(9,"Reds")[4:2]
for (i in 1:3){
  polygon(c(97:109,109:97), exp(c(frc$PI[i,], rev(frc$PI[7-i,]))), col=cmp[i], border=NA)
}
lines(97:109, exp(frc$outfor), col="red")
plot(1:96, y, type="l", xlim=c(1,109), xaxs="i", main="MAPAx", xlab="Period", ylab="Sales")
for (i in 1:96){
  lines(i:(12+i), exp(frc.x$infor[,i]), col="red")
}
for (i in 1:3){
  polygon(c(97:109,109:97), exp(c(frc.x$PI[i,], rev(frc.x$PI[7-i,]))), col=cmp[i], border=NA)
}
lines(97:109, exp(frc.x$outfor), col="red")
As you can see, MAPA provides a flat forecast. This is “correct” in the sense that, apart from the promotions, this time series contains little information that could be captured by exponential smoothing. MAPAx, on the other hand, is provided with the promotional information and makes use of it.
I will be putting updates and fixes there before they are pushed to CRAN. You can also report bugs there.
You can install the current github version with:
if (!require("devtools")){
  install.packages("devtools")
}
devtools::install_github("trnnick/nnfor")
During his PhD he published two papers, with more currently under review:
Looking forward to seeing his future work!
This builds on the neuralnet package for R, and provides the code to make the networks capable of handling time series data automatically. Although that package is quite flexible, it is computationally expensive and does not allow for deep learning. The plan is to eventually implement such capabilities in this package.
There are numerous papers that support the ideas used to put together this package:
The neural network functions in TStools will be removed: initially they will point towards this package, and later they will be removed completely.
There is a github repository for this, where I will be posting updates and fixes till they go on CRAN: https://github.com/trnnick/nnfor
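As a quick taste of the package, here is a minimal sketch of fitting a time series neural network with nnfor; this assumes the package’s mlp() interface and its forecast method, and uses the built-in AirPassengers series as an illustration:

```r
# A minimal sketch, assuming nnfor's mlp() interface;
# lags and hidden nodes are selected automatically by default.
library(nnfor)

fit <- mlp(AirPassengers)     # fit a multilayer perceptron to the series
print(fit)                    # inspect selected lags and network structure
frc <- forecast(fit, h = 12)  # forecast 12 months ahead
plot(frc)
```

Since network weights are randomly initialised, repeated runs will give slightly different forecasts; mlp() averages over several training repetitions to stabilise this.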
Happy (nonlinear) forecasting!
Ord, K., Fildes, R. and Kourentzes, N., 2017. Principles of business forecasting. 2nd ed. Wessex Press Publishing Co.
I was invited by Keith Ord and Robert Fildes to join them in writing the much-revised 2nd edition of the book. The book is aimed at both practitioners and students, and it differs from typical time series textbooks in its focus on business forecasting: it covers not only the various methods, but also processes and judgemental forecasting.
The book’s chapters give you an idea of the various topics covered:
The content of some chapters is self-evident, though others cover a broad set of topics. For example, chapter 10 looks at logistic regression and neural networks, amongst other topics, while chapters 12 and 13 provide a lot of the business context that is missing in many of the available time series textbooks. The book is supported by online material, including R based exercises and examples. You can find more information about the book here, or head to the publisher’s website.
One forecast I surely got wrong is how much work is involved in writing a book! Nonetheless, it has been a fantastic experience to co-author this book with Keith and Robert! I hope you will find it equally interesting and rewarding to read it.
The talk has three parts:
You can download the talk here.
Abstract:
Forecasts are central to decision making. Over the last decades there have been substantial innovations in business forecasting, resulting in increased accuracy of forecasts. Models and modelling principles have matured to address company problems in a realistic sense, i.e. they are aware of the requirements and limitations of practice; and tested empirically to demonstrate their effectiveness. Furthermore, there has been a shift in recognising the importance of having models instead of methods to facilitate parameterisation, model selection and the generation of prediction intervals. The latter has been instrumental in refocusing from point forecasts to prediction intervals, which reflect the relevant risk for the decisions supported by the forecasts. At the same time the quality and quantity of potential model inputs has increased exponentially, permitting models to use more information sources and support higher frequency of decision making, such as daily and weekly planning cycles. All these have facilitated and made necessary an increase in automation of the forecasting process, bringing to the forefront a new dimension of uncertainty: the model selection and specification uncertainty. The uncertainty captured in the prediction intervals assumes that the selected model is `true’. This is hardly the case in practice and we should account for that additional uncertainty. First, we discuss the uncertainties implied in model selection and specification. Then we proceed to develop a way to measure this uncertainty and derive a new way to perform model selection. We demonstrate that this not only leads to superior selection, but also provides a natural link to model combination and specifying the relevant pool of models.
Last, we demonstrate that once we recognise the uncertainty in model specification, we can extract more information from our data by using the multiple temporal aggregation frameworks, and empirically show the achieved increase in forecast accuracy and reliability.
Sales data often only represents a part of the demand for a service product owing to constraints such as capacity or booking limits. Unconstraining methods are concerned with estimating the true demand from such constrained sales data. This paper addresses the frequently encountered situation of observing only a few sales events at the individual product level and proposes variants of small demand forecasting methods to be used for unconstraining. The usual procedure is to aggregate data; however, in that case we lose information on when restrictions were imposed or lifted within a given booking profile. Our proposed methods exploit this information and are able to approximate convex, concave or homogeneous booking curves. Furthermore, they are numerically robust due to our proposed group-based parameter optimization. Empirical results on accuracy and revenue performance based on data from a major car rental company indicate revenue improvements over a best practice benchmark by statistically significant 0.5%-1.4% in typical scenarios.
Download paper.
I should start by saying that the development team of Prophet suggests that its strengths are:
The M3 dataset has multiple series of micro/business interest and, as a recent presentation by E. Spiliotis et al. at ISF2017 (slides 11-12) indicated, the characteristics of the time series overlap with those of typical business time series, albeit not high frequency ones. However, a lot of business forecasting is still not hourly or daily, so the lack of high frequency examples is not necessarily an issue for many business forecasters when benchmarking Prophet.
The setup of the experiment is:
Set | No. of series | Horizon | Test set |
---|---|---|---|
Yearly | 645 | 4 | 8 |
Quarterly | 756 | 4 | 8 |
Monthly | 1428 | 12 | 18 |
Other | 174 | 12 | 18 |
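The M3 subsets summarised above can be pulled directly in R; this sketch assumes the Mcomp package, which ships the M3 competition data:

```r
# A sketch, assuming the Mcomp package and its subset() method for M3.
library(Mcomp)

yearly  <- subset(M3, "yearly")   # the 645 yearly series
monthly <- subset(M3, "monthly")  # the 1428 monthly series

# Each element carries the in-sample ($x) and test ($xx) parts
series1 <- yearly[[1]]
length(series1$x)
```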
I used a number of benchmarks from some existing packages in R, namely:
The idea here is to give Prophet a hard time, but also avoid using too exotic forecasting methods.
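For reference, the accuracy measure reported in the tables is the Mean Absolute Scaled Error (MASE). A minimal implementation of it might look as follows; this is my own sketch, scaled by the in-sample MAE of the (seasonal) naive forecast, and the study may have used a packaged version instead:

```r
# MASE sketch: mean absolute error of the forecast, scaled by the
# in-sample mean absolute error of the (seasonal) naive method.
mase <- function(actual, forecast, insample, m = 1) {
  scale <- mean(abs(diff(insample, lag = m)))  # naive in-sample MAE
  mean(abs(actual - forecast)) / scale
}

# Toy numbers for illustration only
mase(actual = c(15, 13), forecast = c(14, 14),
     insample = c(10, 12, 11, 13, 12, 14))  # 0.625
```

Values below 1 mean the forecast beats the in-sample naive benchmark on average.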
I provide the mean and median MASE across all forecast origins and series for each subset in tables 2 and 3 respectively. In brackets I provide the percentage difference from the accuracy of ETS. In boldface I have highlighted the best forecast for each M3 subset. Prophet results are in blue. I provide two MAPA results: the first uses the default options, whereas the second uses comb="w.mean", which is more mindful of seasonality. For THieF I only provide the default result (using ETS), as in principle it could be applied to any forecast in the table.
Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
---|---|---|---|---|---|---|---|---|
Yearly | 0.732 (0.00%) | 0.746 (-1.91%) | 0.777 (-6.15%) | 0.783 (-6.97%) | 0.732 (0.00%) | 0.732 (0.00%) | 0.732 (0.00%) | 0.954 (-30.33%) |
Quarterly | 0.383 (0.00%) | 0.389 (-1.57%) | 0.385 (-0.52%) | 0.412 (-7.57%) | 0.386 (-0.78%) | 0.384 (-0.26%) | 0.400 (-4.44%) | 0.553 (-44.39%) |
Monthly | 0.464 (0.00%) | 0.472 (-1.72%) | 0.465 (-0.22%) | 0.490 (-5.60%) | 0.459 (+1.08%) | 0.458 (+1.29%) | 0.462 (+0.43%) | 0.586 (-26.29%) |
Other | 0.447 (0.00%) | 0.460 (-2.91%) | 0.446 (+0.22%) | 0.457 (-2.24%) | 0.444 (+0.67%) | 0.444 (+0.67%) | 0.447 (0.00%) | 0.554 (-23.94%) |
Set | ETS | ARIMA | ES (smooth) | SSARIMA (smooth) | MAPA | MAPA (w.mean) | THieF (ETS) | Prophet |
---|---|---|---|---|---|---|---|---|
Yearly | 0.514 (0.00%) | 0.519 (-0.97%) | 0.511 (+0.58%) | 0.524 (-1.95%) | 0.520 (-1.17%) | 0.520 (-1.17%) | 0.514 (0.00%) | 0.710 (-38.13%) |
Quarterly | 0.269 (0.00%) | 0.266 (+1.12%) | 0.256 (+4.83%) | 0.278 (-3.35%) | 0.254 (+5.58%) | 0.254 (+5.58%) | 0.262 (+2.60%) | 0.388 (-44.24%) |
Monthly | 0.353 (0.00%) | 0.348 (+1.42%) | 0.351 (+0.57%) | 0.373 (-5.67%) | 0.352 (+0.28%) | 0.351 (+0.57%) | 0.351 (+0.57%) | 0.473 (-33.99%) |
Other | 0.275 (0.00%) | 0.269 (+2.18%) | 0.270 (+1.82%) | 0.268 (+2.55%) | 0.283 (-2.91%) | 0.283 (-2.91%) | 0.275 (0.00%) | 0.320 (-16.36%) |
Some comments about the results:
In all fairness, more testing is needed on high frequency data with multiple seasonalities before one can draw conclusions about the performance of Prophet. Nonetheless, for the vast majority of business forecasting needs (such as supply chain forecasting), Prophet does not seem to perform that well. As a final note, this is an open source project, so I expect to see interesting improvements over time.
Finally, I want to thank Oliver Schaer for providing me with Prophet R code examples! You can also find some examples here.
In the previous post we saw how the Multiple Aggregation Prediction Algorithm (MAPA) implements the ideas of MTA. We also saw that it has some limitations, particularly requiring forecasts to be split into subcomponents (level, trend and seasonality). Although some forecasting methods provide such outputs naturally, for example Exponential Smoothing and Theta, others do not. More crucially, manually adjusted forecasts do not either, and even though it is possible to use MAPAx for that, a simpler approach would be welcome. This is where Temporal Hierarchies, an alternative way to implement MTA, become quite useful.
Temporal Hierarchies borrow many ideas from cross-sectional hierarchies and organise the different temporal aggregation levels as a hierarchy. Consider for example four quarterly observations. The first two quarters constitute the first half-year, and the last two quarters constitute the second half-year. The two half-years add up to make a complete year. These connections imply a hierarchy, much like sales of different packet sizes of a product in a supermarket can be organised in a product hierarchy. However, temporal hierarchies have one key advantage over cross-sectional ones: they are uniquely specified by the problem at hand. Suppose I am given monthly data to forecast. There is a single hierarchy across temporal aggregation levels, much like in the quarterly example before, irrespective of the item I need to forecast, the way I got the forecast or the properties of the time series. Once this unique hierarchy is defined (and all the data are coming from temporally aggregate views of the original time series), all that is left to do is to forecast across the hierarchy, i.e. at all temporal aggregation levels, and reconcile the forecasts. The act of reconciliation brings together information from all modelling levels, with the MTA benefits discussed in the previous posts.
Some hierarchies are more complex than others. The quarterly hierarchy, from the example above, is a very simple three-level hierarchy (quarters, half-years, years). A monthly hierarchy is more complex, because there is more than one way to reach yearly data from monthly data. For example, one could aggregate by 2 months, then aggregate these by 2 (4-monthly level), and then that by 3 (yearly level). Alternatively, one could aggregate to quarterly data, then half-yearly and then yearly. The two aggregation paths can happen in parallel. The temporal hierarchy is made up of all possible paths. Note that in contrast to MAPA, levels that do not fully add up to a yearly time series are excluded (intuitively they do not belong in any path from the bottom disaggregate level to the top yearly level). This has the advantage that the forecasting model/method does not need to deal with series that may have fractional seasonality. Nonetheless, this is an interesting future research avenue.
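A single step along one of these aggregation paths is just a non-overlapping sum. As a toy sketch in base R, a monthly series can be aggregated to the quarterly level like this:

```r
# Non-overlapping temporal aggregation: a monthly series summed to
# the quarterly level, one step along a path up the hierarchy.
y <- ts(1:24, frequency = 12)              # two years of monthly data
yq <- aggregate(y, nfrequency = 4)         # aggregate.ts sums by default
yq                                         # 8 quarterly observations
```

Repeating this with different aggregation orders (2-monthly, 4-monthly, half-yearly, and so on) generates all the levels of the hierarchy.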
The following interactive plot provides the temporal hierarchies for common types of time series. Observe that many have multiple pathways to the top yearly level (for example, monthly time series), and some are very simple hierarchies (for example, days in a week). Use the highlight option to easily visualise the various pathways. Once visualised, the analogies with cross-sectional hierarchies are apparent.
To forecast we need to populate every level of the hierarchy with a forecast. So, for example, for the quarterly hierarchy we need to provide 3 sets of forecasts: one for the quarterly time series, one for the half-yearly and one for the yearly. Imagine that each hierarchy depicts one year’s worth of forecasts, but obviously we can produce the same hierarchy for the next year and so on. Mathematically this is just another column of forecasts to be handled by the hierarchy, so in fact it is trivial to do. An implication, however, is that forecasts are produced in horizons that are multiples of full years (and then any shorter horizons are used accordingly). People are more familiar with two specific cases of temporal hierarchies. One is when we need to produce a total figure over a period, for example for tactical/strategic forecasts. This is simply the bottom-up interpretation of temporal hierarchies: forecasts from the lowest level are summed to a higher level. The other alternative is to produce a forecast and then use a `profile’ to split this further. In supply chain forecasting and call centres this is very common, in breaking weekly forecasts into daily profiles, or daily forecasts into intra-daily profiles. This is merely the top-down interpretation of temporal hierarchies.
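These connections can be written compactly with a summing matrix, exactly as in cross-sectional hierarchies. The sketch below builds the summing matrix for the quarterly hierarchy and shows both the bottom-up case and a simple least-squares reconciliation; note that the paper uses scale- and variance-adjusted weights, whereas here I use plain identity weights (OLS) for simplicity, and all forecast numbers are hypothetical:

```r
# Summing matrix S for the quarterly temporal hierarchy:
# rows = year, two half-years, four quarters; columns = the four quarters.
S <- rbind(c(1, 1, 1, 1),   # year = Q1+Q2+Q3+Q4
           c(1, 1, 0, 0),   # first half-year
           c(0, 0, 1, 1),   # second half-year
           diag(4))         # the quarters themselves

# Bottom-up: quarterly forecasts imply all higher levels
q_frc <- c(100, 120, 110, 130)  # hypothetical quarterly forecasts
S %*% q_frc                     # yearly, half-yearly and quarterly forecasts

# OLS reconciliation: combine base forecasts produced at every level
# (ordered top to bottom) into a single coherent set.
base <- c(470, 225, 235, 102, 118, 112, 128)  # hypothetical base forecasts
G <- solve(t(S) %*% S) %*% t(S)
reconciled <- S %*% G %*% base  # coherent across the whole hierarchy
```

By construction the reconciled quarters now add up exactly to the reconciled half-years and year, which is what "reconciliation" means here.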
Forecasting with Temporal Hierarchies
You may have already noticed that there is nothing to restrict the source of the forecasts. They can be based on some statistical model, judgement, a mix of both, differ amongst levels, or come from whatever other exotic source. This is a substantial advantage over MAPA, and temporal hierarchies provide a flexible MTA foundation. In reconciling the forecasts there are a couple of complications that we deal with in this paper (the scale and variance of the forecasts differ across levels, which needs to be taken into account during reconciliation). I mentioned earlier that temporal hierarchies are unique. This substantially simplifies the solution, but I will not go into the mathematical details here.
In the following interactive plot you can choose from the usual time series I have been using as examples in this series of posts to produce base forecasts (conventional forecasts built from a single level, in red) and Temporal Hierarchy Forecasts (THieF, in blue). I provide the forecasts across the various temporal aggregation levels permitted by the hierarchy. Observe how the information across the temporal aggregation levels is shared in the THieF forecasts to achieve better modelling of the series. You can also choose between three different forecasts: exponential smoothing, ARIMA and naive. The naive forecasts are quite illuminating in showing how the multiple views offered by THieF achieve superior results. The other two types of forecasts are quite illustrative as well.
I also provide the Mean Absolute Error (MAE) for the base and THieF forecasts for the disaggregate series. You will observe that on average THieF forecasts are more accurate. The gains improve at more aggregate levels. In the paper we demonstrate with simulations that in various scenarios of uncertainty (parameter, model) THieF performs better than, or at least as well as, base forecasts.
To sum up, forecasting with temporal hierarchies:
If you want to try it out we have released the thief package for R.
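Getting started with the package is straightforward. Here is a minimal sketch, assuming the thief package and its thief() interface (with structural-scaling reconciliation and ETS base forecasts as defaults), again on the built-in AirPassengers series:

```r
# A minimal sketch, assuming the thief package's thief() interface.
library(thief)

# Reconcile ETS forecasts across all temporal aggregation levels
frc <- thief(AirPassengers, h = 24, comb = "struc", usemodel = "ets")
plot(frc)

# The temporally aggregated views of the series themselves
agg <- tsaggregates(AirPassengers)
plot(agg)
```

A custom forecast function (or even judgementally adjusted forecasts) can be supplied instead of a built-in model, which is exactly the flexibility discussed above.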
A final note on THieF. THieF and MAPA both perform very well and neither is a clear winner in terms of forecast accuracy alone. The two MTA alternatives handle information in a different way. MAPA also takes advantage of the `in-between’ levels that THieF excludes. The good performance of both, even though they have some key differences, is exciting: it gives further merit to MTA and offers some clear directions for future work!
Multiple Temporal Aggregation: the story so far: Part I; Part II; Part III; Part IV.
Abstract
The four major Scandinavian economies (Denmark, Finland, Sweden and Norway) have high workforce mobility and depending on market dynamics the unemployment in one country can be influenced by conditions in the neighbouring ones. We provide evidence that Vector Autoregressive modelling of unemployment between the four countries produces more accurate predictions than constructing independent forecasting models. However, given the dimensionality of the VAR model its specification and estimation can become challenging, particularly when modelling unemployment across multiple factors. To overcome this we consider the hierarchical structure of unemployment in Scandinavia, looking at three dimensions: age, country and gender. This allows us to construct multiple complementary hierarchies, aggregating across each dimension. The resulting grouped hierarchy enforces a well-defined structure to the forecasting problem. By producing forecasts across the hierarchy, under the restriction that they are reconciled across the hierarchical structure, we provide an alternative way to establish connections between the time series that describe the four countries. We demonstrate that this approach is not only competitive with VAR modelling, but as each series is modelled independently, we can easily employ advanced forecasting models, in which case independent and VAR forecasts are substantially outperformed. Our results illustrate that there are three useful alternatives to model connections between series: directly through multivariate vector models, through the covariance of the prediction errors across a hierarchy of series, and through the implicit restrictions enforced by the hierarchical structure. We provide evidence of the performance of each, as well as their combination.