Benchmarking Facebook’s Prophet

By | July 29, 2017

Last February Facebook open sourced its Prophet forecasting tool. Since, it had appeared in quite a few discussions online. A good thing about Prophet is that it one can use it very easily through R (and Python). This gave me the opportunity to benchmark it against some more standard – and not! – forecasting models and methods. To do this I tried it on the M3 competition dataset (available through the Mcomp package for R).

I should start by saying that the development team of Prophet suggests that its strengths are:

  • high-frequency data (hourly, daily, or weekly) with multiple seasonalities, such as day of week and time of year;
  • special events and bank holidays that are not fixed in the year;
  • in the presence of missing values or large outliers;
  • changes in the historical trends, which themselves are non-linear growth curves.

The M3 dataset has multiple series of micro/business interest and as a recent presentation by E. Spiliotis et al. at ISF2017 (slides 11-12) indicated, the characteristics of the time series overlap with typical business time series, albeit not high frequency. However, a lot of business forecasting is still not hourly or daily, so not including high frequency examples for many business forecasters is not necessarily an issue when benchmarking Prophet.

The setup of the experiment is:

  • Use Mean Absolute Scaled Error (MASE). I chose this measure as it has good statistical properties and has become quite common in forecasting research.
  • Use rolling origin evaluation, so as ensure that the reported figures are robust against particularly lucky (or unlucky) forecast origins and test sets.
  • Use the forecast horizons and test sets indicated in Table 1, for each M3 subset.
Table 1. M3 dataset
Set No. of series Horizon Test set
Yearly 645 4 8
Quarterly 756 4 8
Monthly 1428 12 18
Other 174 12 18

I used a number of benchmarks from some existing packages in R, namely:

  • forecast package, from which I used the exponential smoothing (ets) and ARIMA (auto.arima) functions. Anybody doing forecasting in R is familiar with this package! ETS and ARIMA over the years have been shown to be very strong benchmarks for business forecasting tasks and specifically for the M3 dataset.
  • smooth package. This is a less known package that offers alternative implementations of exponential smoothing (es) and ARIMA (auto.ssarima), which follow a different modelling philoshopy than the forecast package equivalents. If you are interested, head over to Ivan’s blog to read more about these (and other nice blog posts).  forecast and smooth packages used together offer a tremendous flexibiltiy in ETS and ARIMA modelling.
  • MAPA and thief packages, which both implement Multiple Temporal Aggregation (MTA) for forecasting, following to alternative approaches that I detail here (for MAPA) and here (for THieF). I included these as they have been shown to perform quite well on such tasks.

The idea here is to give Prophet a hard time, but also avoid using too exotic forecasting methods.

I provide the mean and median MASE across all forecast origins and series for each subset in tables 2 and 3 respectively. In brackets I provide the percentage difference from the ETS’ accuracy. In boldface I have highlight the best forecast for each M3 subset. Prophet results are in blue. I provide two MAPA results, the first uses the default options, whereas the second uses comb=”w.mean” that is more mindful of seasonality. For THieF I only provide the default result (using ETS), as in principle it could be applied to any forecast on the table.

Table 2. Mean MASE results
Set ETS ARIMA ES (smooth) SSARIMA (smooth) MAPA MAPA (w.mean) THieF (ETS) Prophet
Yearly 0.732 (0.00%) 0.746 (-1.91%) 0.777 (-6.15%) 0.783 (-6.97%) 0.732 (0.00%) 0.732 (0.00%) 0.732 (0.00%) 0.954 (-30.33%)
Quarterly 0.383 (0.00%) 0.389 (-1.57%) 0.385 (-0.52%) 0.412 (-7.57%) 0.386 (-0.78%) 0.384 (-0.26%) 0.400 (-4.44%) 0.553 (-44.39%)
Monthly 0.464 (0.00%) 0.472 (-1.72%) 0.465 (-0.22%) 0.490 (-5.60%) 0.459 (+1.08%) 0.458 (+1.29%) 0.462 (+0.43%) 0.586 (-26.29%)
Other 0.447 (0.00%) 0.460 (-2.91%) 0.446 (+0.22%) 0.457 (-2.24%) 0.444 (+0.67%) 0.444 (+0.67%) 0.447 (0.00%) 0.554 (-23.94%)

Table 3. Median MASE results
Set ETS ARIMA ES (smooth) SSARIMA (smooth) MAPA MAPA (w.mean) THieF (ETS) Prophet
Yearly 0.514 (0.00%) 0.519 (-0.97%) 0.511 (+0.58%) 0.524 (-1.95%) 0.520 (-1.17%) 0.520 (-1.17%) 0.514 (0.00%) 0.710 (-38.13%)
Quarterly 0.269 (0.00%) 0.266 (+1.12%) 0.256 (+4.83%) 0.278 (-3.35%) 0.254 (+5.58%) 0.254 (+5.58%) 0.262 (+2.60%) 0.388 (-44.24%)
Monthly 0.353 (0.00%) 0.348 (+1.42%) 0.351 (+0.57%) 0.373 (-5.67%) 0.352 (+0.28%) 0.351 (+0.57%) 0.351 (+0.57%) 0.473 (-33.99%)
Other 0.275 (0.00%) 0.269 (+2.18%) 0.270 (+1.82%) 0.268 (+2.55%) 0.283 (-2.91%) 0.283 (-2.91%) 0.275 (0.00%) 0.320 (-16.36%)

Some comments about the results:

  • Prophet performs very poorly. The dataset does not contain multiple seasonalities, but it does contain human-activity based seasonal patters (quarterly and monthly series), changing trends and outliers or other abrupt changes (especially the `other’ subset), where Prophet should do ok. My concern is not that it is not ranking first, but that at best it is almost 16% worse than exponential smoothing (and at worst almost 44%!);
  • ETS and ARIMA between packages perform reasonably similar, indicating that although there are implementation differences, both packages have followed sound modelling philoshopies;
  • MAPA and THieF are meant to work on the quarterly and monthly subsets, where, in line with the research, they improve upon their base model (ETS).

In all fairness, more testing is needed on high frequency data with multiple seasonalities before one should conclude about the performance of Prophet. Nonetheless. for the vast majority of business forecasting needs (such as supply chain forecasting), Prophet does not seem to perform that well. As a final note, this is an open source project, so I am expecting over time to see interesting improvements.

Finally, I want to thank Oliver Schaer for providing me with Prophet R code examples! You can also find some examples here.

Leave a Reply

Your email address will not be published. Required fields are marked *