Devon Barrow, Nikolaos Kourentzes, Rickard Sandberg, and Jacek Niklewski, 2020. Expert Systems with Applications.

A major challenge in automating the production of a large number of forecasts, as often required in many business applications, is the need for robust and reliable predictions. Increased noise, outliers and structural changes in the series, all too common in practice, can severely affect the quality of forecasting. We investigate ways to increase the reliability of exponential smoothing forecasts, the most widely used family of forecasting models in business forecasting. We consider two alternative sets of approaches, one stemming from statistics and one from machine learning. To this end, we adapt M-estimators, boosting and inverse boosting to parameter estimation for exponential smoothing. We propose appropriate modifications that are necessary for time series forecasting while aiming to obtain scalable algorithms. We evaluate the various estimation methods using multiple real datasets and find that several approaches outperform the widely used maximum likelihood estimation. The novelty of this work lies in (1) demonstrating the usefulness of M-estimators, (2) demonstrating the usefulness of inverse boosting, which outperforms standard boosting approaches, and (3) offering a comparative look at statistics versus machine learning inspired approaches.
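To give a flavour of the M-estimation idea, here is a minimal sketch of fitting the smoothing parameter of simple exponential smoothing by minimising a pseudo-Huber loss instead of the squared error implied by Gaussian maximum likelihood. This is only an illustration of the general approach, not the paper's exact estimators; the function names and the choice of `delta` are mine.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pseudo_huber(r, delta=1.0):
    """Pseudo-Huber loss: quadratic near zero, linear in the tails,
    so large (outlying) residuals are down-weighted."""
    return delta**2 * (np.sqrt(1.0 + (r / delta)**2) - 1.0)

def ses_residuals(alpha, y):
    """One-step-ahead residuals of simple exponential smoothing."""
    level = y[0]
    res = []
    for obs in y[1:]:
        res.append(obs - level)
        level = alpha * obs + (1 - alpha) * level
    return np.array(res)

def fit_ses_robust(y, delta=1.0):
    """Estimate the smoothing parameter by minimising total pseudo-Huber loss."""
    obj = lambda a: pseudo_huber(ses_residuals(a, y), delta).sum()
    out = minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded")
    return out.x

# Toy example: a noisy level series with a single outlier
rng = np.random.default_rng(0)
y = 10 + rng.normal(0, 1, 100)
y[50] += 15  # inject an outlier
alpha = fit_ses_robust(y)
```

With squared-error loss the outlier would pull the estimated `alpha` around; the pseudo-Huber objective grows only linearly in the tails, so the single aberrant residual has limited influence on the fit.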

Download paper.

Really great research! I’m going to have to implement this from scratch in Python. There are some spelling errors in section 4.1.3: ‘psedo’ instead of ‘pseudo’.

Really curious how else pseudo-Huber could be used…

Thanks! There is a lot of work in using M-estimators in non-predictive modelling questions, but I have not worked on that myself enough. I think they are being increasingly picked up by the ML literature as well, which is great as the estimation problems there are more severe than in statistical models.

Just an idea, but have you tried using the AICc ETS weighting scheme with pseudo-Huber as the loss function?

I suppose you could look at it two ways: optimise the log-likelihood/AICc to produce weights and then re-optimise the model parameters with pseudo-Huber, or optimise with pseudo-Huber initially and then calculate the log-likelihood/AICc after the fact for weighting. Is that right?

The challenge is that, in principle, AICc-derived weights are only appropriate when you use maximum likelihood to get the parameters. Having said that, practically speaking I would expect a combination to be beneficial, even if it is sub-optimal or violates some of the assumptions. If you would like to avoid this, you can always use cross-validation based errors (have a look here). I have not had the chance to experiment with this yet though.
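For the second ordering discussed above (fit robustly, then weight after the fact), the weighting step could be sketched as below: compute a Gaussian AICc from each candidate model's one-step residuals and turn the AICc values into Akaike-style combination weights. As noted in the reply, these weights are only strictly justified under maximum likelihood estimation, so treat this as a pragmatic sketch; the example residuals are synthetic.

```python
import numpy as np

def aicc_from_residuals(res, k):
    """Gaussian AICc from one-step residuals; k = number of estimated
    parameters. Only strictly justified under maximum likelihood."""
    n = len(res)
    sigma2 = np.mean(res**2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    aic = -2 * loglik + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction

def akaike_weights(aiccs):
    """Convert AICc values of candidate models into combination weights."""
    aiccs = np.asarray(aiccs, dtype=float)
    d = aiccs - aiccs.min()           # differences to the best model
    w = np.exp(-0.5 * d)
    return w / w.sum()                # normalise to sum to one

# Hypothetical example: residuals from three candidate models,
# the last fitting noticeably worse than the first two
rng = np.random.default_rng(1)
residuals = [rng.normal(0, s, 80) for s in (1.0, 1.2, 2.0)]
weights = akaike_weights([aicc_from_residuals(r, k=2) for r in residuals])
```

The alternative mentioned in the reply, cross-validation based weighting, would simply replace `aicc_from_residuals` with an out-of-sample error measure per model and weight accordingly.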