Real Statistics Time Series Analysis Functions

The following is a summary of all the time series analysis functions provided in the Real Statistics Resource Pack. For many of the following functions, R1 contains a column range containing a time series.

These functions are organized into the following categories:

  • Forecast Accuracy
  • Autocorrelation and Partial Autocorrelation
  • Augmented Dickey-Fuller test
  • Other Unit Root tests
  • Mann-Kendall test and Sen’s Slope
  • Cox-Stuart test
  • Granger Causality test
  • Engle-Granger test
  • ARIMA
  • SARIMA
  • Interpolation for Missing Time Series Data
  • Hampel Outlier Detection and Filtering
  • Panel Data

Forecast Accuracy Functions

ForecastError(R1, R2, type) = the error statistic of the specified type; type takes one of the following string values: me, mse, rmse, mae, mpe, mape, smape, u1, u2

Forecast_Error(R1, R2, lab) = a column array containing the error statistics: me, mse, rmse, mae, mpe, mape, smape, u1, u2; if lab = TRUE then an extra column of labels is appended to the output (default FALSE).

DIEBOLD(R1, R2, R3, h, type) = the test statistic for the Diebold-Mariano test where h is the number of lags (default 0, which is equivalent to h = the cube-root of the number of elements in R1) and type = 0 (default) for e2 error measurement and type = 1 for the |e| error measurement.

DMTEST(R1, R2, R3, h, type) = p-value for the Diebold-Mariano test where h and type are as for the DIEBOLD function.

HLN(R1, R2, R3, h, type) = the test statistic for the HLN test where h is the number of lags (default 0, which is equivalent to h = the cube-root of the number of elements in R1) and type = 0 (default) for e2 error measurement and type = 1 for the |e| error measurement.

HLNTEST(R1, R2, R3, h, type) = p-value for the HLN test where h and type are as for the DIEBOLD function.

LossDiff(R1, R2, R3, type) = a column array containing the loss-differential time series for the Diebold-Mariano test with type as for the DIEBOLD function.

PESARAN(R1, R2) = the test statistic for the Pesaran-Timmermann test

PTTEST(R1, R2) = p-value for the Pesaran-Timmermann test

Autocorrelation and Partial Autocorrelation Functions

ACF(R1, k) = the ACF value at lag k for the time series in range R1

ACVF(R1, k) = the autcovariance at lag k for the time series in range R1

BARTEST(r, n, lag) = p-value of Bartlett’s test for correlation coefficient r based on a time series of size n

BARTEST(R1, type, lag) = BARTEST(r, n, lag) where n = the number of elements in range R1 and r = ACF(R1,lag)

PIERCE(R1, typelag) = Box-Pierce statistic Q for range R1

BPTEST(R1, typelag) = p-value for the Box-Pierce test for range R1

LJUNG(R1, typelag) = Ljung-Box statistic Q for range R1

LBTEST(R1,typelag) = p-value for the Ljung-Box test for range R1

ACOV(R1, k) – the autcovariance matrix at lag k

ACORR(R1, k) – the autcorrelation matrix at lag k

PACF(R1, k) – the PACF value at lag k

In the above functions where the type argument is missing, the test is performed using the autocorrelation coefficient (ACF). If the value assigned instead is 1 or “pacf” then the test is performed using the partial autocorrelation coefficient (PACF). Actually, if the second argument takes any value except 1 or “pacf”, then the ACF value is used. lag = the lag value used for the test.

Augmented Dickey-Fuller Test

ADFTest(R1, lab, lag, criteria, type, alpha): returns an 8 × 1 range which contains the following values:  tau-statistic, tau-critical, yes/no (stationary or not), AIC value, BIC value, # of lags (p), the first-order autoregression coefficient and estimated p-value.

where

  • lab : if lab = TRUE (default is FALSE), the output consists of a 6 × 2 range whose first column contains labels.
  • lag = the maximum number of lags to use in the test (default 0); if lag < 0 then lag is automatically set to =ROUND(12*(n/100)^.25,0), as proposed by Schwert, where n = the # of elements in the time series.
  • criteria
    • = “none” : no criteria are used, and so p is set to the value of lag
    • = “aic” : the AIC is used to determine the number of lags p (where p lag)
    • = “bic” : the BIC is used to determine the number of lags p (where plag)
  • type = the test type : 0 = no constant, no trend, 1 = constant, no trend (default), 2 = constant, trend
  • alpha = significance level (default .05)

ADFCRIT(n, alpha, type) = critical value, tau-crit, for the stated type of ADF test at the stated alpha value, when the time series has n elements

ADFPROB(x, n, type) = estimated p-value (based on linear interpolation) for the ADF test at x for a time series of length n

Other Unit Root Test Functions

PPTEST(R1, lab, lags, type, alpha)  – an array function that returns a column range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags, autocorrelation coefficient, and p-value.

KPSSTEST(R1, lab, lags, type, alpha)  – an array function that returns a column range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags, and p-value.

Mann-Kendall Test and Sen’s Slope

MK_TEST(R1, lab, tails, alpha): returns a column array with the values S, s.e., z-stat, p-value, and trend for the Mann-Kendall test

SK_TEST(R1, per, lab, tails, alpha): returns a column array with the values S, s.e., z-stat, p-value, and trend for the Seasonal Kendall test where per = the period of seasonality

SEN_SLOPE(R1, lab, alpha): returns a column array with the values: Sen’s slope along with the lower and upper limits of the 1–alpha confidence interval.

R1 is a column array containing the time series values, if lab = TRUE then an extra column of labels is appended to the output (default FALSE), tails = 1 or 2 (default), and alpha is the significance level (default .05). trend takes the values “yes” or “no” in the two-tailed test, and “upward” or “no” in the one-tailed case where S > 0 and “downward” or “no” in the one-tailed case where S < 0.

Cox-Stuart Test

COX_STUART(R1, tails): returns a column array with the p-value of the Cox-Stuart test for the data in the column array R1 along with the ratio between positive differences over total differences

Granger Causality Test

GRANGER(Rx, Ry, lags) = the F statistic of the test

GRANGER_TEST(Rx, Ry, lags) = p-value of the test

Engle-Granger Test

EGCRIT(n, alpha, tr) = critical value of the Engle-Granger test for two time series of length n for the significance level alpha (between .01 and .10, default .05).

EGPROB(t, n, tr) = estimated p-value of the Engle-Granger test for two time series of length n when the test statistic is t

EGTEST(Rx, Ry, lab, lag, criteria, trend, alpha): outputs a column array with the values tau-stat, tau-critical, cointegrated (yes/no), lags, p-value.

Rx and Ry are column arrays containing the two time series, while lab, lag, criteria, and alpha are as for the ADFTEST function, and tr = TRUE if there is a trend (default FALSE).

ARIMA Functions

ARIMA_Coeff(R1, p, q, d, con, lab) = returns a p+q+1 × 4 array, each row of which contains the coefficient, standard error, t-stat, and p-value (in order: constant, phi 1, phi 2, …, theta 1, theta 2, …) of the ARIMA(p,q,d) model for the time series data in column range R1; if lab = TRUE (default FALSE), then an extra row and column are appended with labels; if con = TRUE (default) then a constant term is used, otherwise it is not (i.e. it is set to zero).

ARIMA_Stats(R1, R2, p, q, d, con, lab) = 7 × 1 column array containing the values LL, SSE, MSE, AIC, BIC, AIC augmented and BIC augmented for the ARIMA(p,q,d) model for the time series data in column range R1 based on the coefficients in the p+q+1 × 1 column range R2; if lab = TRUE (default FALSE), then an extra column of labels is appended to the output; if con = TRUE (default) then a constant term is used, otherwise it is not.

ARMap(R1,p) – takes the time series in the n × 1 range R1 and outputs the n–p × p+1 range where the first p columns represent the X values in the linear regression and the last column represents the Y values.

ADIFF(R1, d) – takes the time series in the n × 1 range R1 and outputs an n–d × 1 range containing the data in R1 differenced d times

ARRoots(R1, prec, iter, r, s) – takes the p × 1 range R1 containing the phi coefficients of an ARIMA model and outputs a p × 3 range, each of whose rows consists of the real and imaginary parts of the roots of the characteristic polynomial along with that root’s absolute value.

MARoots(R1, prec, iter, r, s) – takes the q × 1 range R1 containing the theta coefficients of an ARIMA model and outputs a q × 3 range, each of whose rows consists of the real and imaginary parts of the roots of the characteristic polynomial along with that root’s absolute value.

See the ROOTS function below for information about the prec, iter, r, and s arguments.

PSICoeff(R1, R2, k, rev): based on the column ranges R1 consisting of phi coefficients and R2 consisting of theta coefficients, outputs a k × 1 range containing the first k psi coefficients (starting with ψ0 = 1) for the ARMA model with the coefficients in R1 and R2.

If k is omitted (default) then k is set equal to the number of rows in the highlighted range. If rev = TRUE (default), then the phi and theta coefficients are listed in reverse order.

ARMA_SSE(R0, R1, R2, avg, p1, q1) = SSE value for the time series data in R0 based on the p phi coefficients in R1 and q theta coefficients in R2; also the p1th phi coefficient is zeroed out and/or the q1th theta coefficient is zeroed out; finally the mean is zeroed out if avg is set to a non-zero value.

ARIMA_SE(R1, p, q, d, R2): returns a column array with the standard errors that correspond to the ARIMA(p, q, d) model coefficients in the column array R2 based on the time series data in R1

SARIMA Functions

ADIFF(R1, d, D, per): returns a column array that corresponds to the time series data in R1 after ordinary differencing d times and seasonal differencing D times based on a seasonal period of per.

SARMA_RES(R1, Rar, Rma, Rsa, Rsm, per, cons): returns a column array with the residuals that correspond to the time series data in the column array R1 based on a SARMA model with AR coefficients in Rar, MA coefficients in Rma, seasonal AR coefficients in Rsa, seasonal MA coefficients in Rsm, the constant coefficient in cons and the seasonal period per.

SARMA_PRED(R1, Rar, Rma, Rsa, Rsm, per, cons, f): returns a column array with the predicted values that correspond to the time series data in the column array R1 plus the next f forecast values based on a SARMA model with AR coefficients in Rar, MA coefficients in Rma, seasonal AR coefficients in Rsa, seasonal MA coefficients in Rsm, the constant coefficient in cons and the seasonal period per. If f is omitted then the highlighted (output) range is filled with forecasted values (i.e. f is set equal to the number of rows in the highlighted range minus the number of rows in R1).

SARIMA_PRED(R0, R1, d, D, per): returns a column array with the forecasted values for the SARIMA(p, d, q) ⨯(P, D, Q)per model of the time series data in R1 that correspond to the forecast values in R0 for the SARMA(p, q) ⨯(P, Q)per model.

Interpolation for Missing Time Series Data

TSImputed(R1, itype, k, stype, per): returns a column array of the same size as R1 where each missing element in R1 is imputed based on the imputation type itype (linear (default), spline, prior, next, sma, wma, ema) and span k (default 2), stype (default none) and per is the seasonal period (default 4 for quarterly), which is only used when stype is not “none”.

Hampel Outlier Detection and Filtering

HAMPEL(R1, nlen, xsigma): returns a column array of the same size as the column array or cell range R1. Each element is either equal to the associated element in R1 or is the appropriate median if the associate element is an outlier.

nlen = # of adjacent data elements on either side of the data element (default 3). xsigma = # of standard deviations (i.e. # of MAD’s times 1.4826).

Panel Data

PANEL_MEANS(R1, periodshead): returns an array of the unit means (across the time periods) of the panel data in R1.

PANEL_DEMEAN(R1, periods, head, param): returns an array of the same size and shape as R1 consisting of the demeaned version of the panel data in R1. Each element in the output consists of the corresponding data element in R1 minus the mean for that element’s unit. param takes values between 0 and 1 (default 1).

PANEL_DIFF(R1, periods, head): returns an array consisting of the first-differenced version of the panel data in R1 based on the time periods for each unit.

PANEL_DVM(R1, periods, head, ttype): returns an array consisting of the data in R1 augmented by additional columns corresponding to dummy variables for each of the units. If ttype = 0 (default) then the dummy variables correspond to units 1 through k (without an intercept) where R1 contains k units. When ttype = 1 then the dummy variables correspond to units 2 through k and when ttype = -1 the dummy variables correspond to units 1 through k-1.

PANEL_REM(R1, periods, head): returns an array consisting of the random effects model (REM) for the panel data in R1.

REM_PARAM(Rx, Ry, periods) = the transformation parameter for the random-effects model based on the X data in Rx and Y data in Ry.

periods = the number of time periods in the panel data; it is assumed that the rows of panel data in R1 are sorted first by unit number and then by time period, although the time and units values are not included in R1. R1 is assumed not to have any missing time periods (balanced panels). 

When head = TRUE, the first row of both R1 and the output contains column headings (default FALSE).

References

Greene, W. H. (2002) Econometric analysis. 5th Ed. Prentice-Hall
https://www.scirp.org/(S(351jmbntvnsjt1aadkposzje))/reference/referencespapers.aspx?referenceid=1243286

Gujarati, D. & Porter, D. (2009) Basic econometrics. 5th Ed. McGraw Hill
http://www.uop.edu.pk/ocontents/gujarati_book.pdf

12 thoughts on “Real Statistics Time Series Analysis Functions”

  1. First, a big thanks to Charles for this immense project. It’s a true public service to all self-learners.
    I was about to do a time series analysis in R and by chance I found your resource.
    I would like to perform an ARIMA(p,q,d) of predicted value Y on variables X_1, X_2… (with Ys and Xs being continuous). In the examples give, the range R1 is a single column and it looks like we’re only analysing Y on its lagged values and residuals. How can I regress Y on Xs using your ARIMA function ?

    Reply
  2. I have added Realstats-2003.xla to the Add-Ins, but I can pull out the add-on functions. For instance, it did not work when I was trying to follow your Example 1 of the Autocorrelation exercise and typed in cell E4 “=covariance.s(b4:b21,b6:b23)”. I can get a value of 177 when I used the usual “=variance(b4:b21,b6:b23)”. I have installed RealStats-2003.xla to my 2003 Excel and when I type “+VER()” on the spreadsheet, I got “3.0”. Thank you for your help so that I can start using your wonderful add-ons. Jay

    Reply
    • Hello Jay,
      Since Excel 2003 is no longer supported by Microsoft, I stopped adding new capabilities quite a while ago. To get all the Real Statistics capabilities you need to use Excel 2010 or later.
      Charles

      Reply
  3. Hi! I am trying to use ARIMA_Coeff function but is just returns blank. It works while generating a full ARIMA model from the tool, but as a standalone function it is not returning anything. For example:

    “=ARIMA_Coeff(B7:B19;1;0;0;TRUE;TRUE)” is just blank. Both entered as an array formula or regular formula.

    Reply
  4. I am trying to use ADF Test but its not returning results.
    =ADFTEST(M5:M1442,TRUE,5,”aic”,1,0.05)

    where M5: M1442 contains the time series.
    Can you suggest if my usage is correct?

    Reply
    • Hello Sahil,
      If you send me an Excel file with your data and the =ADFTEST(M5:M1442,TRUE,5,”aic”,1,0.05) formula, I will try to figure out what is going wrong.
      Charles

      Reply

Leave a Comment