MAPA package for R on GitHub

By | October 24, 2016

Here is the link: https://github.com/trnnick/mapa

It has been a long time I wanted to rework the MAPA package for R, but I could not find the time. Finally I got around starting it. There are three objectives in this:

  1. Clean up code and introduce S3methods. MAPA was the first package that I wrote!
  2. Incorporate the es exponential smoothing implementation from the smooth package. This is an alternative to ets that will remain as the default. It allows longer seasonal cycles and some additional customisation in the specification of the exponential smoothing models.
  3. Implement MAPAx that permits introducting exogenous variables to MAPA.

Tidying up the code is an ever-ongoing process, so (1) will never finish! I have already sorted (2) out and the next task is to work on (3). You can grab the development version on GitHub. At the time of writing you will need the development version of smooth for this to work, on which Ivan has been ironing out a few peculiar bugs that MAPA brought to surface. You will be amazed how much temporal aggregation can mess up otherwise stable code!

Here is a sneak peak of things you could not do with MAPA before:

mapafit <- mapaest(taylor,ppy=336,type="es")
plot(mapafit)
mgit_fig1

Fig. 1: Look at all these aggregation levels!

The new argument is type="es" that lets MAPA know what exponential smoothing implementation to use. Once the estimation is done we can use this to forecast as usual.

mapacalc(taylor,mapafit,comb="wght",outplot=2)
mgit_fig2

Fig. 2: MAPA for high frequency data!

So finally we can now easily use MAPA for series with seasonality more than 24 periods! You may have observed that I am using a different combination of the temporal aggregation states: wght. This is still experimental, so use it at your own risk! I would still recommend using either median or mean.

Another nice feature that the es implementation offers is more flexibility in specifying models. Before we could use the model="ZZZ" to let ets select the best model for each aggregation level. Now we can use the letters X and Y to restrict the search to only additive or  multiplicative options (including N) respectively. So for example we could set model="ZXA" to have MAPA look for exponential smoothing models with any type of error, trend that can be none, additive or damped additive and additive seasonality.

Which of ets or es is most accurate to use with MAPA? This is a difficult question to answer. In different examples I tried, I got different results, but there are a few things I can point out. ets from the forecast package by default does not allow multiplicative trends. This was done as using only additive trends seems to work better for ets. This is not the case for es that considers all possible trends. In the current MAPA implementation you can use the argument allow.multiplicative.trend=TRUE to make ets consider all trends. Similarly, with es you can use model="ZXZ" to restrict it to additive trends. On the M3 dataset, once the specification is identical (only additive trends) es based MAPA performs about 5% worse than ets based MAPA, which is fairly small difference, with no evidence of statistically significant differences. So my recommendation is as follows: MAPA with ets works fine and is the default core, but if the additional flexibility of es is needed, feel confident in using it!

16 thoughts on “MAPA package for R on GitHub

  1. BoDing

    I got exception:
    Number of xreg input variables does not match mapafit specification

    My code is:
    xreg <- merge(holiday, week)
    xreg$date <- NULL
    xreg <- as.matrix(xreg)

    mapa(train_ts, maximumAL = 28, display = 1, xreg=xreg, type="es", ppy=30)

    Could you please explain it ?

    Reply
    1. Nikos Post author

      It seems that the issue is with the xreg. It should be a matrix or array, where each column is an explanatory variable. I just checked on my computer and it worked fine. Check if the dimensions/class of xreg is as it should be – happy to look at data if you want to send me a snippet. If your data is high frequency, you may want to use “w.mean” or “w.median” for the combination argument.
      Cheers, Nikos

      Reply
  2. BoDing

    here is a snippet of xreg:
    > head(xreg, 20)
    holiday tue wed thu fri sat sun
    [1,] 1 0 0 1 0 0 0
    [2,] 2 0 0 0 1 0 0
    [3,] 3 0 0 0 0 1 0
    [4,] 0 0 0 0 0 0 1
    [5,] 0 0 0 0 0 0 0
    [6,] 0 1 0 0 0 0 0
    [7,] 0 0 1 0 0 0 0
    [8,] 0 0 0 1 0 0 0
    [9,] 0 0 0 0 1 0 0
    [10,] 0 0 0 0 0 1 0
    [11,] 1 0 0 0 0 0 1
    [12,] 0 0 0 0 0 0 0
    [13,] 0 1 0 0 0 0 0
    [14,] 0 0 1 0 0 0 0
    [15,] 0 0 0 1 0 0 0
    [16,] 0 0 0 0 1 0 0
    [17,] 0 0 0 0 0 1 0
    [18,] 1 0 0 0 0 0 1
    [19,] 0 0 0 0 0 0 0
    [20,] 0 1 0 0 0 0 0

    Reply
  3. mapa_user

    Greetings,

    Would you be kind to answer the following questions:

    (1) Parameter ppy for function mapaest is described as:
    Periods in a season of the time series at the sampled frequency.
    If insample is a ts object then this is taken from its frequency, unless overriden.

    Could you please rephrase the definition — I’m not sure what’s exactly meant by “periods in a season”.

    (2) The data exhibits seasonality at about quarter boundaries — can this be used (and how) in any of the MAPA functions?

    (2) Also, I have about three years worth of weekly observed data, so should I define the time series as ts( x, frequency = 52)?

    Many thanks,

    mapa_user

    Reply
    1. Nikos Post author

      (1) ppy is the number of periods in a seasonal cycle. Suppose that you have monthly data, where you expect 12 months in a year, then ppy = 12. Another example would be daily time series, where you could use ppy = 364 (to capture the annual cycle) or ppy = 7 (to capture the weekly seasonality). Perhaps a better definition would be the number of periods that make up a full seasonal cycle. This can be different from the frequency coming from the ts class, as was the case for the daily example.
      (2) Not sure what you mean by quarter boundaries, but let me try to address your question in the following way. Suppose you have monthly data. A seasonality of 12 would capture the annual pattern and implicitly any within quarter seasonalities. If you would set ppy = 3, then it would look only for seasonality every three months (quarter) and therefore only the quarterly seasonality would be modelled, ignoring the yearly one. So if I understand your question correctly, if there are two seasonal pattens, and one is a multiple of the other (annual cycle = 4 * quarterly cycle) then use the longest one to set ppy (for the example above thatwould be ppy = 12). If the two seasonal patterns are not aligned, i.e. the longest is not a multiple of the shortest cycle, then you have to choose which of the two will be modelled and set ppy to the appropriate one. The current implementation of MAPA does not support multiple seasonal cycles. This is something that I intend to fix when I get the time. The maths are the same, it’s only a coding question.
      (3) Yes. And that would become the default value for ppy. With one year of data the annual seasonality (ppy = 52) would not be useful and the underlying ETS models would be non-seasonal. Typically I would say you need at least three full seasons to build seasonal models with confidence. If the data is not seasonal then that is not a problem, just set the model to argument to “ZZN” to shut down seasonal models. If there is seasonality perhaps you can extract some information by setting ppy = 13, to capture any quarterly patterns.

      Hope these help! Let me know if you have more questions. I posted yesterday about the inner workings of MAPA, you may find it helpful, especially for understanding the different combination options.

      Best, Nikos

      Reply
  4. mapa_user

    Thank you for your super-prompt response!

    I understand your answers, but given that some of my formulations weren’t precise, please let me clarify in more concrete terms.

    My data set consists of 159 values observed weekly (thus about three years worth of data) along with the indication (1,2,3, or 4) of the quarter of the corresponding week when the data point was observed. I want to use MAPA to forecast the time series for one or two quarters (13 or 26 weeks) ahead.

    My R-code looks as follows:
    tx = ts( x$qty, freq=52)
    mapa( tx, ppy=13, fh=13, maximumAL=52, xreg=c( x$quarter, 3,3,3,3, rep(4,9)))

    xreg indicates the quarters for the forecasted period (frst four weeks are in the third quarter and the next nine are in the fourth).
    The following error is reported:
    number of columns of matrices must match (see arg 32)
    In addition: Warning message:
    In mapaest(y = y, ppy = ppy, minimumAL = minimumAL, maximumAL = maximumAL, :
    Only type=”es” can accept xreg inputs. Forcing appropriate type.

    This is the data:
    x = c( 14.568, 14.545, 20.285, 18.378, 17.028, 13.373, 15.870, 18.966, 16.848, 16.781, 13.916, 15.479, 16.210, 15.878, 13.954, 16.963, 16.114, 17.828, 13.748, 13.818, 17.234, 15.908, 15.439, 14.823, 15.808, 15.192, 13.775, 11.358, 12.559, 16.440, 19.021, 16.012, 14.514, 16.979, 16.571, 13.498, 13.797, 17.344, 15.736, 15.035, 13.477, 15.357, 18.161, 16.400, 13.169, 15.779, 15.645, 16.754, 14.874, 14.345, 16.756, 14.007, 15.122, 13.902, 26.493, 16.625, 14.766, 13.518, 13.359, 16.993, 17.133, 13.706, 13.534, 16.013, 16.031, 13.117, 11.891, 13.919, 14.975, 14.135, 13.741, 14.901, 13.108, 13.833, 12.474, 13.731, 11.949, 18.091, 13.969, 13.993, 10.399, 12.391, 18.476, 16.439, 21.136, 15.903, 14.850, 13.652, 16.751, 13.202, 19.121, 16.253, 16.434, 13.069, 15.267, 19.712, 16.341, 14.974, 16.443, 16.194, 15.465, 15.874, 11.423, 16.837, 17.107, 15.107, 14.742, 14.710, 15.013, 18.595, 13.733, 14.881, 15.759, 12.847, 14.206, 13.565, 13.812, 14.635, 13.438, 13.457, 16.795, 16.201, 17.549, 13.336, 12.877, 17.277, 14.679, 13.834, 16.482, 20.663, 14.254, 13.725, 9.441, 15.989, 19.711, 13.523, 11.583, 12.777, 14.719, 16.858, 12.789, 13.250, 18.772, 17.920, 13.796, 14.521, 13.178, 22.832, 14.002, 14.096, 17.233, 15.376, 16.177, 15.784, 13.764, 17.464, 20.556, 15.135, 12.390)

    quarter = c(3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)

    data = data.frame( x, quarter )

    Could you please help me resolve the error and even before that give a suggestion what should be the values of the following parameters:
    frequency (in ts)
    ppy, minimumAL, maximumAL (in mapa)

    Thanks again!

    mapa_user

    Reply
  5. mapa_user

    Oops, sorry for the typos, the correct code snippet is

    x = c( 14.568, 14.545, 20.285, 18.378, 17.028, 13.373, 15.870, 18.966, 16.848, 16.781, 13.916, 15.479, 16.210, 15.878, 13.954, 16.963, 16.114, 17.828, 13.748, 13.818, 17.234, 15.908, 15.439, 14.823, 15.808, 15.192, 13.775, 11.358, 12.559, 16.440, 19.021, 16.012, 14.514, 16.979, 16.571, 13.498, 13.797, 17.344, 15.736, 15.035, 13.477, 15.357, 18.161, 16.400, 13.169, 15.779, 15.645, 16.754, 14.874, 14.345, 16.756, 14.007, 15.122, 13.902, 26.493, 16.625, 14.766, 13.518, 13.359, 16.993, 17.133, 13.706, 13.534, 16.013, 16.031, 13.117, 11.891, 13.919, 14.975, 14.135, 13.741, 14.901, 13.108, 13.833, 12.474, 13.731, 11.949, 18.091, 13.969, 13.993, 10.399, 12.391, 18.476, 16.439, 21.136, 15.903, 14.850, 13.652, 16.751, 13.202, 19.121, 16.253, 16.434, 13.069, 15.267, 19.712, 16.341, 14.974, 16.443, 16.194, 15.465, 15.874, 11.423, 16.837, 17.107, 15.107, 14.742, 14.710, 15.013, 18.595, 13.733, 14.881, 15.759, 12.847, 14.206, 13.565, 13.812, 14.635, 13.438, 13.457, 16.795, 16.201, 17.549, 13.336, 12.877, 17.277, 14.679, 13.834, 16.482, 20.663, 14.254, 13.725, 9.441, 15.989, 19.711, 13.523, 11.583, 12.777, 14.719, 16.858, 12.789, 13.250, 18.772, 17.920, 13.796, 14.521, 13.178, 22.832, 14.002, 14.096, 17.233, 15.376, 16.177, 15.784, 13.764, 17.464, 20.556, 15.135, 12.390)

    quarter = c(3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
    data = data.frame( x, quarter )
    tx = ts( data$x, freq=52)
    mapa( tx, ppy=13, fh=13, maximumAL=52, xreg=c( data$quarter, 3,3,3,3, rep(4,9)))

    Reply
    1. Nikos Post author

      It seems that one of the last versions of the smooth package that is used when the frequency of the time series is >24 introduced some changes that were causing problems. This is now fixed in the GitHub repository of MAPA.
      As for the modelling question, see the following script with comments! I did not use the mapa function, but instead the mapaest and mapacalc that are wrapped by mapa, to make more evident what the different options are.

      # Load data
      x = … # I removed x to save some space in the comment
      data = data.frame( x, quarter )
      tx = ts( data$x, freq=52)

      # —- Fit MAPAs —-
      # With no xregs
      mapafit52 <- mapaest(tx,type="es") mapafit13a <- mapaest(tx,ppy=13,type="es") mapafit13b <- mapaest(tx,ppy=13,maximumAL=52,type="es") # with xreg X <- cbind(c(data$quarter, 3,3,3,3, rep(4,13), rep(1,13), rep(2,13), rep(3,13), rep(4,13))) mapafit52X <- mapaest(tx,type="es",xreg=X) # ---- Investigate what we constructed ---- # First, look at the 13-period MAPAs plot(mapafit13a) # No seasonality plot(mapafit13b) # NO seasonality, just more levels # Neither model picked up any seasonality. Also there seems to be limited to gain by aggregating up to level 52. mapacalc(tx,mapafit13a,fh=13,outplot=2) mapacalc(tx,mapafit13b,fh=13,outplot=2) # In terms of forecasting the level is somewhat different, but otherwise everything looks more or less the same. # Second, look at 52 period MAPAs plot(mapafit52) # No seasonality plot(mapafit52X) # No ES seasonality # I will use 52 step ahead forecasts to show any seasonality from xregs mapacalc(tx,mapafit52,fh=52,outplot=2) mapacalc(tx,mapafit52X,fh=52,outplot=2,xreg=X) # Some minor seasonality from xreg # To me it looks as if there is no seasonlity still, let us check # the way MAPA does. I will aggregate to level 13 (quarterly series) # and produce a seasonal plot tx13 <- tsaggr(tx,13,fmean=TRUE)$out[[1]] tx13 <- ts(tx13,frequency=4) seasonplot(tx13) # Looks to me more like a Q4 effect, but it was not # strong enough for ES to pick it up Xb <- c(rep(0,7), rep(c(rep(1,13),rep(0,13*3)),6)) mapafit52Xb <- mapaest(tx,type="es",xreg=Xb) mapacalc(tx,mapafit52Xb,fh=52,outplot=2,xreg=Xb) # Some effect for Q4 is modelled in the xreg # Not sure which mapafit52, mapafit52X or mapafit52b is better # some out-of-sample comparison should be helpful here

      Reply
      1. mapa_user

        Many thanks again for your response.

        I re-run your code after restarting R and downloading MAPA from the Github and installing. It all worked except the following line:

        mapafit13b <- mapaest(tx,ppy=13,maximumAL=52,type="es")
        Error in (function (…, deparse.level = 1) :
        number of columns of matrices must match (see arg 40)

        The same line works with maximumAL=26. I'm running R version 3.3.1 (2016-06-21)
        Perhaps there is a version/installation fix for this issue as well.

        Kind regards,

        mapa_user

        Reply
        1. Nikos Post author

          I did another check and made a change that adds more checks on the outputs from the smooth package. Can you please download from github both MAPA (devtools::install_github(“trnnick/mapa”)) and smooth (devtools::install_github(“config-i1/smooth”)) packages and try again?

          Thanks!

          Reply
  6. mapa_user

    Hello Nikolaos,

    Just tried devtools::install_github(“config-i1/smooth”) but there is a compilation/link problem with ‘smooth’ (whereas mapa compiled/loaded successfully). The reason seems to be that the ld command references a local directory not present at my system.

    Can this option be modified so as (perhaps) not to reference this specific path/version but to let the compiler system choose some default?

    ld: warning: directory not found for option ‘-L/usr/local/lib/gcc/x86_64-apple-darwin13.0.0/4.8.2’
    ld: library not found for -lgfortran
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    make: *** [smooth.so] Error 1
    ERROR: compilation failed for package ‘smooth’

    Reply
    1. Nikos Post author

      Great to hear that and thanks for posting the solution here as well.

      Reply
  7. Gokhan

    Hello Nikos,
    Thank you very much for information you provided. I have some questions about general understanding of MAPA and ppy value. Firstly, I would like to use MAPA method for solar energy production forecast (Solar data might be very fluctuating). My questions are as follows;

    1) Do I need only annual dataset to use MAPA? or can I use it for any dataset?
    2) I have hourly observations for 7 days, and I would like to forecast 8th day with 24 hour. Is ppy=24 to catch daily seasonality?
    3) I have daily observations for 30 days, and I would like to forecast 91st day to 97th day(next week) with 7 days. Is ppy=30 to catch monthly seasonality and ppy=7 for weekly seasonality?

    Thank you very much for your help.
    Best Regards.

    Reply
    1. Nikos Post author

      1) You do not need an annual dataset. You only need the highest avaiable frequency time series. All the aggregate series are constructed by MAPA internally. The thief package offers an alternative to MAPA that uses MTA as well.
      2) Yes, if you have only very limited data. If you have a long history, as MAPA does not handle double seasonal effects, to capture that (I presume there is a strong day of the week cycle) you may want to use 24 * 7 = 168 for ppy. 168 describes well both the hour in the day and day of the week cycles when you are stuck with single seasonal models. Althernatibely would could use the thief package with TBATS.
      3) I assume your data are daily. In that case you would use ppy = 30 if you are looking for a strong monthly seasonality, but in my expereince such cycles very rarely if ever exist. I would suggest sticking to 7 (day of the week) or 7*4 = 28 (4 weeks rather than 30 day months).

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *