Author Archives: Nikos

Validation and forecasting accuracy in models of climate change

R. Fildes and N. Kourentzes, 2011, International Journal of Forecasting, 27: 968-995. http://dx.doi.org/10.1016/j.ijforecast.2011.03.008

Forecasting researchers, with few exceptions, have ignored the current major forecasting controversy: global warming and the role of climate modelling in resolving this challenging topic. In this paper, we take a forecaster’s perspective in reviewing established principles for validating the atmospheric-ocean general circulation models (AOGCMs) used in most climate forecasting, and in particular by the Intergovernmental Panel on Climate Change (IPCC). Such models should reproduce the behaviours characterising key model outputs, such as global and regional temperature changes. We develop various time series models and compare them with forecasts based on one well-established AOGCM from the UK Hadley Centre. Time series models perform strongly, and structural deficiencies in the AOGCM forecasts are identified using encompassing tests. Regional forecasts from various GCMs had even more deficiencies. We conclude that combining standard time series methods with the structure of AOGCMs may result in higher forecasting accuracy. The methodology described here has implications for improving AOGCMs and for the effectiveness of environmental control policies which are focussed on carbon dioxide emissions alone. Critically, forecast accuracy in decadal prediction has important consequences for environmental planning, so its improvement through this multiple modelling approach should be a priority.
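
The abstract does not give the exact specification of the encompassing tests, so the following is only a minimal sketch of one common formulation: regress the errors of forecast A on the difference between forecast B and forecast A. A weight far from zero suggests that A does not encompass B and that a combination may help. The function names and the toy numbers are illustrative assumptions, and no significance test is included.

```python
def encompassing_weight(actual, f_a, f_b):
    """OLS slope (no intercept) of A's errors on (f_b - f_a).

    A weight near 0 suggests forecast A encompasses forecast B;
    a weight near 1 suggests the opposite.
    """
    errors_a = [y - a for y, a in zip(actual, f_a)]
    diff = [b - a for a, b in zip(f_a, f_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two forecasts are identical")
    return sum(d * e for d, e in zip(diff, errors_a)) / denom


def combine(f_a, f_b, weight):
    """Linear combination (1 - weight) * A + weight * B."""
    return [(1 - weight) * a + weight * b for a, b in zip(f_a, f_b)]


# Toy data (made up for illustration, not taken from the paper).
actual = [1.0, 2.0, 3.0, 4.0]
forecast_a = [0.0, 2.0, 2.0, 5.0]
forecast_b = [2.0, 2.0, 4.0, 3.0]

weight = encompassing_weight(actual, forecast_a, forecast_b)
combined = combine(forecast_a, forecast_b, weight)
```

In practice the weight would be estimated on one sample and the combined forecast evaluated on another.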

Download paper.

Modelling functional outliers for high frequency time series forecasting with neural networks: an empirical evaluation for electricity load data

N. Kourentzes, 2011, International Conference on Data Mining, DMIN’2011, Las Vegas, 18-21 July 2011.

This paper discusses and empirically evaluates alternative methodologies for modeling functional outliers in high frequency time series forecasting. Although several modeling and forecasting methodologies have been proposed, there have been limited advancements in monitoring and automatically identifying outlying patterns, and even fewer in modeling them for such time series. This is a significant gap considering the difficulty and the cost associated with manual exploration and treatment of such data, due to the vast number of observations. This study proposes and assesses the performance of different modeling methodologies, focusing on two key aspects: the accuracy with which the outliers are modeled and the impact of each methodology on modeling normal observations. The evaluated methodologies model functional outliers using binary, integer or trigonometric dummy variables or outlier profiles, or isolate them into new time series that are forecast separately. Neural networks are employed to produce the forecasts, taking advantage of their flexible nature to accommodate the different methodologies and their superior performance in high frequency time series forecasting. Hourly electricity load data from the UK are used to empirically evaluate the performance of the different methodologies.
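
The abstract names binary, integer and trigonometric dummy variables without defining them. The sketch below shows one plausible reading for hourly data, where a whole day is flagged as an outlier (for example a public holiday). The exact encodings used in the paper may differ, so the column definitions and the function name are assumptions.

```python
import math

HOURS_PER_DAY = 24


def outlier_dummies(n_days, outlier_days):
    """Build one row of dummy inputs per hour.

    Assumed encodings (not confirmed by the abstract):
      binary  - 1 during an outlier day, 0 otherwise
      integer - hour position 1..24 within an outlier day, 0 otherwise
      sin/cos - one daily cycle during an outlier day, 0 otherwise
    """
    rows = []
    for day in range(n_days):
        flagged = day in outlier_days
        for hour in range(HOURS_PER_DAY):
            if flagged:
                angle = 2 * math.pi * hour / HOURS_PER_DAY
                rows.append({
                    "binary": 1,
                    "integer": hour + 1,
                    "sin": math.sin(angle),
                    "cos": math.cos(angle),
                })
            else:
                rows.append({"binary": 0, "integer": 0, "sin": 0.0, "cos": 0.0})
    return rows


# Three days of hourly data, with the second day (index 1) flagged.
dummies = outlier_dummies(3, {1})
```

Each encoding would be supplied to the network as additional inputs alongside the lagged load values.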

Download paper.

Segmenting electrical load time series for forecasting? An empirical evaluation of daily UK load patterns

S. F. Crone and N. Kourentzes, 2011, International Joint Conference on Neural Networks, San Jose, 31 July-5 August 2011.

Forecasting future electricity load represents one of the most prominent areas of electrical engineering in which artificial neural networks (NN) are routinely applied in practice. A common approach to overcome the complexity of building NNs for high-frequency load data is to segment the time series into homogeneous subclasses of simpler subseries, often a constant hour of the day or day of the week, which are forecast independently using separate NN models and then recombined to provide a complete forecast for the days ahead. Despite the empirical importance of load forecasting, and the high operational cost associated with forecast errors, the potential benefits of segmenting time series into subseries have not been evaluated in an empirical comparison. This paper assesses the empirical accuracy of segmenting hourly load data taken from the UK into daily subseries versus forecasting the original, continuous time series with NNs. Accuracy is assessed in comparison to statistical benchmark algorithms and across multiple rolling time origins, and the results indicate the superior performance of NN on continuous, non-segmented time series, in contrast to best practices.
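
The segmentation approach described above can be sketched as follows: split the hourly series into 24 subseries (one per hour of the day), forecast each independently, and interleave the results. The per-segment forecaster here is a naive last-value rule standing in for a separate NN per subseries, and all function names are illustrative assumptions.

```python
def split_by_hour(series, period=24):
    """Return one subseries per hour of the day."""
    return [series[h::period] for h in range(period)]


def naive_forecast(subseries, steps):
    """Placeholder for a per-segment NN: repeat the last observation."""
    return [subseries[-1]] * steps


def forecast_segmented(series, days_ahead, period=24):
    """Forecast each hourly subseries separately, then recombine."""
    if len(series) % period != 0:
        raise ValueError("series must contain whole days")
    subseries = split_by_hour(series, period)
    per_hour = [naive_forecast(s, days_ahead) for s in subseries]
    # Interleave: day 1 hours 0..23, then day 2 hours 0..23, and so on.
    return [per_hour[h][d] for d in range(days_ahead) for h in range(period)]


# Two days of toy hourly data (values 0..47), forecasting two days ahead.
history = list(range(48))
forecast = forecast_segmented(history, days_ahead=2)
```

The non-segmented alternative evaluated in the paper would instead fit a single model to `history` as one continuous series.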

Download paper.
Download presentation.

Semi-supervised monitoring of electric load time series for unusual patterns

N. Kourentzes and S. F. Crone, 2011, International Joint Conference on Neural Networks, San Jose, 31 July-5 August 2011.

In this paper we propose a semi-supervised neural network algorithm to identify unusual load patterns in hourly electricity demand time series. In spite of several modeling and forecasting methodologies that have been proposed, there have been limited advancements in monitoring and automatically identifying outlying patterns in such series. This becomes more important considering the difficulty and the cost associated with manual exploration of such data, due to the vast number of observations. The proposed network learns from both labeled and unlabeled patterns, adapting automatically as more data become available. This drastically limits the cost and effort associated with exploring and labeling such data. We compare the proposed method with conventional supervised and unsupervised approaches, demonstrating higher accuracy, robustness and efficacy on empirical electricity load data.

Download paper.
Download presentation.

Feature selection for time series prediction – A combined filter and wrapper approach for neural networks

S. F. Crone and N. Kourentzes, 2010, Neurocomputing, 73: 1923-1936. http://dx.doi.org/10.1016/j.neucom.2010.01.017

Modelling artificial neural networks for accurate time series prediction poses multiple challenges, in particular specifying the network architecture in accordance with the underlying structure of the time series. The data generating processes may exhibit a variety of stochastic or deterministic time series patterns of single or multiple seasonality, trends and cycles, overlaid with pulses, level shifts and structural breaks, all depending on the discrete time frequency at which they are observed. For heterogeneous datasets of time series, such as the 2008 ESTSP competition, a universal methodology is required for automatic network specification across varying data patterns and time frequencies. We propose a fully data-driven forecasting methodology that combines filter and wrapper approaches for feature selection, including automatic feature evaluation, construction and transformation. The methodology identifies time series patterns, creates and transforms explanatory variables and specifies multilayer perceptrons for heterogeneous sets of time series without expert intervention. Examples of the valid and reliable performance in comparison to established benchmark methods are shown for a set of synthetic time series and for the ESTSP’08 competition dataset, where the proposed methodology obtained second place.

Download paper.

A neural network methodology for forecasting constant and dynamic demand rate for intermittent demand time series

N. Kourentzes and S. F. Crone, 2010, The 30th Annual International Symposium on Forecasting, San Diego.

Intermittent demand appears when a time series contains several periods in which no demand occurs and, when demand does occur, it does not have a constant size. Furthermore, intermittent demand time series typically have few observations. These factors make intermittent demand forecasting challenging, and forecast errors can be costly in terms of unmet demand or obsolescent stock. Intermittent demand forecasting problems have been addressed using established forecasting methods, like simple moving averages, exponential smoothing and Croston’s method with its variants. This study proposes a neural network (NN) methodology to forecast intermittent time series. NNs are used to provide both constant demand rate forecasts, as in Croston’s method, which is the norm for intermittent demand problems, and dynamic demand rate forecasts, which do not assume that the demand rate stays constant in the future. A key NN limitation that is addressed in this study is the small time series sample size, which can hinder NNs’ training.

The methods are compared on a dataset of 3000 real time series from the automotive industry, using the mean absolute scaled error, which has been found appropriate for intermittent demand forecasting evaluations. The out-of-sample comparisons indicate that NNs forecasting a constant demand rate have superior performance in comparison to established competing methodologies, while dynamic demand NN forecasts also rank high, indicating that the implications of this alternative should be considered. In order to explore this further, an inventory simulation is performed. The methods are evaluated directly on service level and not using forecast error measures. The findings from both evaluations are contrasted, providing insights on the performance of the methods and discussing whether forecast errors are a good proxy for service levels.
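
For readers unfamiliar with the benchmark and the error measure named above, here is a minimal sketch of a textbook version of Croston’s method and of the mean absolute scaled error (MASE). The abstract does not state which Croston variant, initialisation or smoothing parameter was used, so those choices are assumptions, and the proposed NN methodology itself is not shown.

```python
def croston(demand, alpha=0.1):
    """Constant demand rate forecast: smoothed size / smoothed interval.

    Assumed initialisation: first non-zero demand and the number of
    periods elapsed up to it.
    """
    size = interval = None
    periods_since_demand = 1
    for y in demand:
        if y > 0:
            if size is None:
                size, interval = float(y), float(periods_since_demand)
            else:
                size += alpha * (y - size)
                interval += alpha * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval


def mase(train, actual, forecast):
    """Out-of-sample MAE scaled by the in-sample MAE of the naive forecast."""
    scale = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    if scale == 0:
        raise ValueError("in-sample naive errors are all zero")
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy intermittent series (made up for illustration).
train = [0, 4, 0, 0, 2]
test = [0, 3]
rate = croston(train, alpha=0.5)
score = mase(train, test, [rate] * len(test))
```

A MASE below 1 means the forecast beats the in-sample naive method on average; the inventory simulation mentioned in the abstract would replace this error measure with achieved service levels.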

Download presentation.

Inference for Neural Network Predictive Models with Impulse Interventions

N. Kourentzes and S. F. Crone, 2010, Proceedings of the 2010 International Conference on Data Mining, DMIN’10, Las Vegas, USA, CSREA.

Neural Networks (NN) have demonstrated remarkable time series fitting and prediction abilities, outperforming other methods in several applications, particularly linear models such as dynamic linear regression. However, due to their nature, NNs are not easy to interpret and are often considered black box models. The importance of each independent variable is hard to estimate, which makes it difficult to test whether a variable has significant explanatory power and should therefore be included in the model. This task is very important for several applications where the effect of each variable has to be identified, such as marketing modelling and analysis, where the effectiveness of different marketing instruments, commonly modelled as impulse interventions, has to be estimated. Statistical inference is sought in these cases, which hinders the use of NNs. This paper proposes a framework to allow statistical inference of impulse interventions modelled with NNs. The effects of interventions are estimated and tested for statistical significance. Using a Monte Carlo simulation, the power of the proposed test is compared with that of dynamic linear regression models. The power is found to be higher and the estimation of the simulated effects is more accurate. Based on this framework, strategies to code multiple impulses with NNs are discussed.

Download paper.

Frequency independent automatic input variable selection for neural networks for forecasting

N. Kourentzes and S. F. Crone, 2010, International Joint Conference on Neural Networks, Barcelona, Spain, 18-23 July 2010.

A key issue in time series forecasting with Neural Networks (NN) is the selection of the relevant input variables, which is often the result of data exploration by human experts, leading to dataset-specific solutions and limiting forecasting automation. This becomes even more important in heterogeneous datasets, where each time series requires special modeling and can exhibit a different variety of stochastic and deterministic components of different unknown frequencies. Fully automated forecasting with NNs requires a methodology that can address these issues in an entirely data-driven approach. This paper proposes a fully automated input selection methodology based on a novel iterative NN filter that automatically identifies, for each time series, the seasonal frequencies, if any are present, and the dynamic structure of the time series, distinguishing between stochastic and deterministic components and ultimately producing a parsimonious set of input variables. The robustness and performance of the algorithm are evaluated against established time series forecasting methods.

Download paper.
Download presentation.

Evaluation of input variable selection methodologies for multilayer perceptrons for high frequency time series

N. Kourentzes and S. F. Crone, 2009, The 29th Annual International Symposium on Forecasting, Hong Kong.

Neural networks (NN) have been successfully applied in several time series forecasting applications. Past forecasting competitions, like the NN3, NN5 and the MH competitions, have shown that as the data frequency increases, the relative accuracy of NN against benchmarks increases too, providing evidence of promising NN performance on high frequency forecasting problems. However, most of the published modelling methodologies for NN have been developed for low frequency data, like monthly time series. The literature suggests that the modelling tools for low frequency data do not readily apply to high frequency problems, which exhibit different properties such as multiple overlying seasonalities, large amounts of data and persisting outliers. Therefore, a number of modelling challenges arise as the time series frequency increases. Furthermore, the selection of the input variables for NN, which is the most important determinant of NN accuracy, is usually based on tools developed for low frequency problems, like ACF and PACF analysis, which become problematic as the frequency increases. This leaves open the question of how to model the input vector of NN for high frequency data and whether the methodologies that have been developed in the past are still applicable.

This analysis evaluates how several ACF and PACF, regression and heuristic based approaches, which are widely used to model NN, perform when applied to high frequency data, discusses the challenges that arise in modelling high frequency data and provides evidence on which of these methodologies are still useful for high frequency problems. A large set of daily time series is used to evaluate the competing input variable selection methodologies, using the established standards of valid empirical evaluation, i.e. a homogeneous set of time series, rolling origin evaluation, robust error measures and statistical tests, to determine how these methodologies compare to each other and against a set of established benchmarks.
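
Rolling origin evaluation, mentioned above, can be sketched as follows: the forecast origin is moved forward one observation at a time, a forecast is produced from the history available at each origin, and errors are averaged per horizon. The naive forecaster, the use of mean absolute error and the function names are illustrative assumptions rather than the setup of the study.

```python
def rolling_origin_mae(series, forecast_fn, first_origin, horizon):
    """Mean absolute error per forecast horizon across rolling origins."""
    last_origin = len(series) - horizon
    if first_origin < 1 or first_origin > last_origin:
        raise ValueError("no valid forecast origins")
    errors = [[] for _ in range(horizon)]
    for origin in range(first_origin, last_origin + 1):
        history = series[:origin]  # only data known at the origin
        forecast = forecast_fn(history, horizon)
        for h in range(horizon):
            errors[h].append(abs(series[origin + h] - forecast[h]))
    return [sum(e) / len(e) for e in errors]


def naive(history, horizon):
    """Benchmark: repeat the last observed value."""
    return [history[-1]] * horizon


# Toy series (made up); origins at t=3 and t=4, two steps ahead each.
mae_by_horizon = rolling_origin_mae([1, 2, 3, 4, 5, 6], naive, 3, 2)
```

Any candidate model can be passed in place of `naive`, provided it accepts a history and a horizon and returns one forecast per step.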

Download presentation.

Input-variable specification for neural networks – an analysis of forecasting low and high time series frequency

S. F. Crone and N. Kourentzes, 2009, IJCNN’09, Atlanta, USA, IEEE: New York, pp. 3221-3228.

Prior research in forecasting time series with Neural Networks (NN) has provided inconsistent evidence on their predictive accuracy. In management, NN have shown only inferior performance on well-established benchmark time series of monthly, quarterly or annual frequency. In contrast, NN have shown preeminent accuracy in electrical load forecasting on daily or hourly time series, leading to successful real life applications. While this inconsistency has traditionally been attributed to the lack of a reliable methodology to model NNs, recent research indicates that the particular data properties of high frequency time series may be equally important. High frequency time series of daily, hourly or even shorter time intervals pose additional modelling challenges in the length and structure of the time series, which may call for the use of novel methods. This analysis aims to identify and contrast the challenges in modelling NN for low and high frequency data in order to develop a unifying forecasting methodology tailored to the properties of the dataset. We conduct a set of experiments in three different frequency domains of daily, weekly and monthly data of one empirical time series of cash machine withdrawals, using a consistent modelling procedure. While our analysis provides evidence that NN are suitable for predicting high frequency data, it also identifies a set of challenges in modelling NN that arise from high frequency data, in particular in specifying the input vector, and that require specific modelling approaches applicable to both low and high frequency data.

Download paper.
Download presentation.