Real Statistics Regression/ANOVA Functions

The following is a summary of the Regression and ANOVA worksheet functions provided in the Real Statistics Resource Pack.

These functions are organized into the following categories:

  • Linear Regression
  • Heteroskedasticity
  • Autocorrelation
  • Regression Power and Sample Size
  • Stepwise Regression
  • Exponential Regression
  • Polynomial Regression
  • Least Absolution Deviation (LAD) Regression
  • Deming Regression
  • Total Least Squares Regression
  • Passing-Bablok Regression
  • Logistic Regression
  • Probit Regression
  • Multinomial Logistic Regression
  • Ordinal Logistic Regression
  • Poisson Regression
  • Ridge and LASSO Regression
  • Tobit Regression
  • Non-parametric Regression
  • Survival Analysis
  • ANOVA
  • Trend Analysis and Orthogonal Polynomial Contrasts
  • ART ANOVA
  • ANOVA-related
  • Randomized Complete Block Design (RCBD)
  • Sphericity
  • Non-parametric tests
  • 2k Factorial Design
  • Taguchi DOE
  • ANOVA Sample Size
  • Interclass Correlation
  • Distribution Functions
  • Matching Functions
  • Categorical Coding
  • Iterative Proportional Fitting Procedure

Linear Regression

The following are ordinary, non-array functions where R1 contains the X data and R2 contains the Y data:

Picture65

A second R2 parameter can be used with each of the df functions above, although this parameter is not used. Similarly, you can use SSRegTot(R1, R2) and its value will be equivalent to SSRegTot(R2). All these functions can optionally take a third argument con, where con = TRUE (default) means that the regression model takes a constant term, and con = FALSE means that the regression model doesn’t have a constant term.

Click here for more details about these functions.

There is also a second form of the RSquare function in which RSquare(R1, k) = R2 where the X data consists of all the columns in R1 except the kth column, and the Y data consist of the kth column of R1.

The following are array functions where R1 contains the X data and R2 contains the Y data.

DESIGN(R1) design matrix for the data in R1
HAT(R1, con) hat matrix for the data in R1
DIAGHAT(R1, con) diagonal of the hat matrix for the data in R1
CORE(R1) core of the hat matrix for the data in R1
LEVERAGE(R1, con) leverage vector = diagonal of hat matrix for the data in R1
RegCov(R1, R2) covariance matrix for the regression coefficients of the regression line
RegCoeff(R1, R2, con) two-column array with the regression coefficients for the regression line in the first column and the corresponding standard errors in the second column
RegPred(R0, R1, R2, lab, alpha) 7 × 1 column array containing the predicted y value for the data in R0, the standard error for the confidence interval, the lower and upper ends of the 1–alpha confidence interval, the standard error for the prediction interval, the lower and upper ends of the 1–alpha prediction interval (alpha defaults to .05)
RegPredCC(R0, Rc) predicted y values for x values in range R0 based on the regression coefficients in column range Rc; handles case with and w/o an intercept
StdRegCoeff(R1, R2, Rc, ystd) column array with the standardized regression coefficients for R1 and R2 based on the regression coefficients in the column range Rc (including an intercept coefficient). If ystd = TRUE (default), then standardized regression coefficients are based on the y data being standardized. If Rc is omitted, its values are calculated from R1 and R2 using ordinary least squares. 
UnStdRegCoeff(R1, R2, Rc, ystd) column array with the unstandardized regression coefficients for R1 and R2 based on the standardized regression coefficients in the column range Rc (including an intercept coefficient). If ystd = TRUE (default), then the standardized regression coefficients are based on the y data being standardized
RRegCoeff(R1, R2, hc, con) two-column array with the regression coefficients for the regression line in the first column and the corresponding robust standard errors in the second column, where hc = 0 through 4 corresponding to HC0 through HC4
WRegCoeff(R1, R2, R3) two-column array with the regression coefficients for the weighted regression line in the first column and the corresponding standard errors in the second column, where R3 contains the weights
RegCoeffSE(R1, R2) vector with the standard errors of the coefficients for the regression line
RegY(R1, R2, con) vector of predicted values for Y based on the regression line = TREND(R2,R1)
RegE(R1, R2, con) vector of residuals based on the regression line
RegStudE(R1, R2) vector of studentized residuals based on the regression line
RSquare_Test(R1, R2, Ry, lab) column array with values R1-square, R2-square, F, df1, df2 and the p-value of the test of the significance of X data in R2 (reduced model) vs. X data in R1 (full model) where Ry contains the Y data.
SHAPLEY(R1, R2) vector with the Shapley-Owen decomposition of R2
SlopesTest(R1, R2, R3, R4, b, lab) vector containing s.e. of differences between slopes, t, df and p-value where R3 and R4 are the X and Y values for a second regression line; if b = TRUE (default) the pooled s.e. is used.

If lab = TRUE then a column of labels is added to the output (default = FALSE)

The following are non-array functions where R1 contains the X data and R2 contains the Y data.

RegPredC(R0, Rc) predicted y values for x values in R0 based on the regression coefficients in Rc. R0 and Rc can be column or rows arrays or ranges.
RSquareTest(R1, R2, Ry) p-value of the test of the significance of X data in R2 (reduced model) vs. X data in R1 (full model); Ry contains the Y data.
RegAIC(R1, R2, con) Akaike’s Information Criterion (AIC) for the regression model
RegAICc(R1, R2, con) corrected AICc for the regression model
RegSBC(R1, R2, con) Schwarz Baysean Criterion (SBC) for the regression model
TOLERANCE(R1, j) Tolerance of the jth variable for the data in range R1
VIF(R1, j) VIF of the jth variable for the data in range R1
WVIF(R1, R2, j) VIF of the jth variable for the data in R1 using the weights in R2

Note that if the last parameter in VIF or WVIF is omitted then a column array with the VIF values for all the independent variables is returned.

Heteroskedasticity

In the following R1 is an array of X values, R2 is a column array of Y values. If chi = TRUE, (default) the chi-square version of the test is used; otherwise, the F-test is used.

BPagStat(R1, R2, chi) Breusch-Pagan test statistic
BPagTest(R1, R2, chi) p-value for Breusch-Pagan test
WhiteStat(R1, R2, chi) White test statistic
WhiteTest(R1, R2, chi) p-value for the White test statistic

Autocorrelation

CO_RHO(R1, R2, iter, prec) the value of rho calculated using the Cochrane-Orcutt method based on iter (default 1000) iterations unless the change in the rho value is less than prec (default .0001), at which point the process stops.
BGSTAT(R1, R2, p, chi) Breusch-Godfrey statistic for the x data in R1 and y data in R2 based on p-order autocorrelation. If chi = TRUE (default) use chi-square test; otherwise use F test
BGTEST(R1, R2, p, chi) p-value for the Breusch-Godfrey test for p-order autocorrelation based on the x data in R1 and y data in R2. If chi = TRUE (default) use chi-square test; otherwise use F test
DURBIN(R1) Durbin-Watson statistic d where R1 is a column vector containing residuals
DURBIN(R1, R2) Durbin-Watson statistic d where R1 contains X data and R2 contains Y data

The following array functions are also supported where, except for the first function, R1 contains X data and R2 contains Y data.

DURBIN(R1, lab, alpha) returns the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 is a column vector containing residuals
DURBIN(R1, R2, lab, alpha) returns the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance.
COCoeff(R1, R2, rho) returns an array where column 1 contains the regression coefficients based on the rho and column 2 contains the corresponding standard errors. If rho is omitted then rho defaults to 1 – d/2 where d = the Durbin-Watson statistic
CO_Coeff(R1, R2, iter, prec) returns COCoeff(R1, R2, CO_RHO(R1, R2, iter, prec))
HACCoeff(R1, R2, lags) returns an array whose first column are the OLS regression coefficients and whose second column are the corresponding Newey-West HAC standard errors, where lags is the # of lags.

If lab = FALSE (default) then the output is a 4 × 1 column array, while if lab = TRUE then the output is a 4 × 2 array with an extra column of labels.

Regression Power and Sample Size

REG_POWER(e, n, k, type, α, m, prec) power of multiple regression where k = # of predictors, e = Cohen’s effect size f2 if type = 1 (default), e = R2 effect size if type = 2 and e = noncentrality parameter if type = 0
REG_SIZE(e, k, 1−β, type, α, m, prec) minimum sample size required to obtain power of at least 1−β for multiple regression where k = # of predictors, e = Cohen’s effect size f2 if type = 1 (default) and e = R2 effect size if type = 2

n = the sample size, tails = # of tails: 1 or 2 (default), α = significance level (default = .05) and m and prec as for the noncentral distribution functions.

Stepwise Regression

The following array functions are used to create a stepwise regression model. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values and R3 is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise. If cons = TRUE (default) then regression with a constant term is used; otherwise regression through the origin is employed

RegRank(R1, R2, R3, cons) returns a 1 × k array containing the p-value of eachcoefficient that can be added to the regression model defined by R1, R2, R3
RegCoeffP(R1, R2, R3, cons) returns a 1 × k array containing the p-value of eachcoefficient in the regression model defined by R1, R2, R3
RegStepwise(R1, R2, alpha, cons) returns a 1 × k array, each of whose non-blank element corresponds to an x variable that should be retained in the stepwise regression model. Actually, the output is a 1 × k+1 array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model. alpha = significance level (default .15)

Exponential Regression

The following are array functions that support a nonlinear exponential regression model:

ExpCoeff(R1, R2, iter, lab) outputs a 2 × 4 array whose first column contains the coefficients α and β for the regression, whose 2nd column contains the corresponding standard errors for these coefficients, whose 3rd column contains SSE and MSE, and whose 4th column contains MSReg and dfT. If lab = TRUE then an extra row is added with labels (default = FALSE).
ExpPred(R, R1, R2, iter) outputs an m × 1 column array with the values predicted by the exponential model for R1 and R2 based on the data in the m × 1 column vector of x values.
ExpPredC(R, α, β) outputs an m × 1 column array with the values predicted by the exponential model with coefficients α and β based on the data in the m × 1 column vector R of x values.

Here iter = the number of iterations (default 20). The last two functions can also be used as non-array functions of the following form:

ExpPred(x, R1, R2, iter) value predicted by the exponential model for x based on the data in R1 and R2
ExpPredC(x, α, β) value predicted by an exponential model with coefficients α and β for x

Polynomial Regression

The following are functions that support polynomial regression. The first two functions are array functions. R1 and R2 are column arrays containing x and y data values respectively and deg is the degree/order of the polynomial.

PolyDesign(R1, deg, ones) returns an array consisting of x, x2, …, xdeg columns. If ones = TRUE, then the output is 1, x, x2, …, xdeg
PolyCoeff(R1, R2, deg) returns a two-column array consisting of the polynomial regression coefficients and their standard errors
PolyRSquare(R1, R2, deg) R-square value for the polynomial regression
PolyDegree(R1, R2, deg) the highest degree polynomial ≤ deg which produces a significantly different R-square value

Least Absolute Deviation (LAD) Regression

The following are array functions that support LAD regression. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values, con takes the value TRUE if the regression includes a constant term and iter is the number of iterations used in the iteratively reweighted least-squares algorithm (default = 25).

LADRegCoeff(R1, R2, coniter) column array containing the LAD regression coefficients, k+1 × 1 array if con = TRUE and × 1 array if con = FALSE
LADRegWeights(R1, R2, coniter) × 1 column array consisting of the weights calculated from the iteratively reweighted least squares algorithm
LADRegCoeffSE(R1, R2, coniter, nboots) column array consisting of the standard errors of the LAD regression coefficients based on bootstrapping nboots times; k+1 × 1 array if con = TRUE and × 1 array if con = FALSE

Deming Regression

The following are functions that support Deming regression, all but the last function are array functions. R1 is an array containing x data values, R2 is an array containing y data values and λ is the lambda value (in which case R1 and R2 contain one column) or omitted (in which case lambda is calculated from R1 and R2). If lab = TRUE (default is FALSE) then the output contains an extra column of labels, except for DRegResiduals where lab = TRUE means that the output contains an extra row of labels.

DRegCoeff(R1, R2, λ, lab) 2 × 2 array containing the Deming  regression intercept and slope coefficients along with the standard errors of these coefficients
DRegResiduals(R1, R2, λ, lab) n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual, and optimized residual for each pair of data elements in R1 and R2 where n = the number of rows in R1 (or R2).
DRegIdentity(R1, R2, λ, lab) 2 × 1 array consisting of x̄–ȳ and se(x̄–ȳ) for Deming regression
DRegPred(xo, R1, R2, λ, alphalab) 4 × 1 array consisting of the predicted value of y for x0, the standard error of the prediction, and the confidence interval for this prediction based on Deming regression
DRegLambda(R1, R2) lambda value for Deming regression calculated from R1 and R2.

Total Least Squares (TLS) Regression

The following are array functions that support TLS regression. 

TRegCoeff0(R1, R2, lab) 2 × 1 column array consisting of the intercept and slope TLS regression coefficients; if lab = TRUE then an extra column of labels is appended (default FALSE)
TRegCoeff(R1, R2, iter) k+1 × 1 column array consisting of the TLS regression coefficients, where k = # of columns in R1 and iter = # of iterations in the SVD algorithm

Passing-Bablok (PB) Regression

The following are array functions that support PB regression where R1 and R2 are column arrays or ranges containing x and y values respectively. If lab = TRUE then an extra column of labels is appended (default FALSE)

PBRegCoeff(R1, R2, lab, alpha) 6 × 1 column array consisting of the slope and intercept PB regression coefficients plus their 1–alpha confidence intervals
PBTEST(R1, R2, lab, alpha) 3 × 1 column array consisting of the h-stat, h-crit, and p-value for the test of the PB regression linearity assumption

Logistic Regression

LogitCoeff(R1, lab, raw, head, alpha, iter) array function that returns logistic regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings.
LogitCoeff2(R1, R2, lab, head, alpha, iter) array function like LogitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis.
LogitCoeffs(R1, iter) array function like LogitCoeff that outputs the coefficients plus the number of iterations made
LogitTest(R1, lab, raw, iter) array function that returns LL, LL0, chi-square, p-value and actual # of iterations used for data in R1
LogitRSquare(R1, lab, raw, iter) array function that returns LL, LL0, pseudo R-square, AIC, BIC  and actual # of iterations for data in R1
LogitPred(R0, R1, raw, iter) outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression model calculated from the data in R1 (without headings)
LogitPredC(R0, R2) outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression coefficients contained in R2 (in the form of a column vector)
LogitSummary(R1, head) array function that returns a summary of the raw data in range R1.
LogitMatches(R1, R2) array function which outputs the total number of successes and failures for each row in R2 based on the raw data in R1.
LogitSelect(R1, s, head) array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output.
LogitReduce(R1, s) array function that takes summary data in R1 including column headings and fills the highlighted range with the data in R1 omitting the columns described by the string s, where s is a comma delimited list of column headings in R1
HOSMER(R1, lab, raw, iter) array function that returns a Hosmer-Lemeshow table with 10 equal-sized data ranges based on the data in R1 (w/o headings)
HLTEST(R1, lab, raw, iter) array function that returns the Hosmer-Lemeshow statistic and p-value

If lab = TRUE then the output includes headings and if lab = FALSE (the default) only the data is outputted. Except as indicated above, if head = TRUE (default) then R1 and the output include column headings, while if head = FALSE then R1 and the output contain only data.

If raw = TRUE then the data in R1 is in raw form and if raw = FALSE (the default) then the data in R1 is in summary form. The parameter alpha is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter iter determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of head is FALSE.

Logistic Regression Power and Sample Size

LOGIT_POWER(p0, p1, OR, size, Rsq, α) power for binary logistic regression with normally distributed independent variables for the given sample size
LOGIT_SIZE(p0,p1, OR, 1−β, Rsq, α) minimum sample size required to obtain power of at least 1−β for binary logistic regression with normally distributed independent variable

α = significance level (defaults to .05), 1−β = power (defaults to .80) and Rsq = R2 when the other independent variables are regressed on the independent variable of interest (defaults to 0, indicating that there are no other independent variables). p0 = the probability that y = 1 when x = μx, p1 = the probability that y = 1 when x = μx,+σx and OR = the odds ratio; you need to provide a value for either p1 or OR, but not both; if both are provided then only OR will be used.

LOGIT_POWER0(p0, p1, OR, size, π, Rsq, α) power for binary logistic regression with binomially distributed independent variables for the given sample size
LOGIT_SIZE0(p0,p1, OR, 1−β, π, Rsq, α) minimum sample size required to obtain power of at least 1−β for binary logistic regression with binomially distributed independent variable

alpha, pow, Rsq, OR are as for LOGIT_POWER and LOGIT_SIZE. p0 = the probability that y = 1 when x = 0 and p1 = the probability that y = 1 when x = 1; you need to provide a value for either p1 or OR, but not both; if both are provided then only p1 will be used. π = the portion of the sample where x = 1 (defaults to .5).

Probit Regression

ProbitCoeff(R1, lab, raw, head, alpha, iter) array function that returns probit regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings.
ProbitCoeff2(R1, R2, lab, head, alpha, iter) array function like ProbitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis.
ProbitCoeffs(R1, iter) array function like ProbitCoeff that outputs the coefficients plus the number of iterations made
ProbitTest(R1, lab, raw, iter) array function that returns LL, LL0, chi-square and p-value for data in R1
ProbitRSquare(R1, lab, raw, iter) array function that returns LL, LL0, pseudo R-square, AIC, BIC  for data in R1
ProbitPred(R0, R1, raw, iter) outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression model calculated from the data in R1 (without headings)
ProbitPredC(R0, R2) outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression coefficients contained in R2 (in the form of a column vector)

The arguments lab, raw, iter, head, and alpha are as for the corresponding logistic regression function.

Multinomial Logistic Regression

The following are array functions where R1 is the data used to create the multinomial logistic regression model. When r = 0 (default) then the data in R1 is in raw form, whereas if r ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, r.

MLogitCoeff(R1, r, lab, head, iter) returns the coefficients for data in range R1. 
MLogitParam(R1, r, lab, head, alpha, iter) returns the coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings.
MLogitTest(R1, r, lab, iter) returns LL, LL0, chi-square, p-value, and the actual number of iterations for data in R1
MLogitRSquare(R1, r, lab, iter) returns LL, LL0, pseudo-R-square, AIC, BIC, and the actual number of iterations  for data in R1
MLogitPred(R0, R1,r, iter) returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression model calculated from the data in R1 (without headings)
MLogitPredC(R0, R2) returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression coefficients in R2
MLogitSummary(R1, r, head) returns a summary of the raw data in range R1. 
MLogitSelect(R1, s, head) array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma-delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output.
MLogitExtract(R1, r, s, head) fills the highlighted range with the columns defined by string s from the data from R1. The string s takes the form of a comma-delimited list of numbers 0, …, r. 
MLogit_Accuracy(R1, r, lab, head, iter) fills the highlighted range with a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. If R1 contains k independent variables, then the output is a k+1 × 1 column array (or a k+1 × 2 array if lab = TRUE).

Here lab, headalpha, and iter are as for the logistic regression functions described above.

Ordinal Logistic Regression

The following functions are used with ordinal logistic regression models. R1 contains the data. For each row in R1, the initial n columns specify values for the independent variables. If r = 0, then R1 is in raw data format with n+1 columns where the last column of R1 contains category numbers (integer values starting with 1); otherwise, R1 is in summary format and contains n+r columns where the last r columns specify the counts for each category 1, 2, …, r. Rc is a column array that contains the ordinal regression coefficients for the data in R1. If no Rc argument is present, then R1 doesn’t contain headings.

OLogitCoeff(R1, r, lab, head, alpha, iter, guess) returns the regression coefficients using Newton’s method with iter iterations for the data in R1. For each coefficient, the output also includes the standard error, Wald statistic, p-value, and 1 – α confidence interval. If head = TRUE then R1 includes column headings.
OLogitCov(R1, Rc, r) returns the covariance matrix corresponding to the regression coefficients in Rc based on the data in R1.
OLogitConverge(R1, Rc, r) returns a column array all of whose values should be close to zero provided Rc provides an adequate estimate of the regression coefficients for the data in R1.
OLogitLL(R1, Rc, r, lab) returns a column array with the values LLLL0, chi-square test results (chi-square stat, df, and p-value), R-square (McFadden, Cox and Snell, Nagelkerke versions), AIC, and BIC. If lab = TRUE (default FALSE) then an extra column is appended to the output which contains labels.
OLogitCorrect(R1, Rc, r) the fraction of observations in R1 for which the ordinal regression model correctly predicts the observed outcome (i.e. where the observed category is the one with the highest probability based on the model).
OLogitPredX(R0, Rc) returns a 1 × r row array listing the probabilities of outcomes 1, …, r, where r = 1 + the number of columns in Rc, for the values of the independent variables contained in the range R0 (in the form of either a row or column vector). If R0 is a 1 × k row vector or k × 1 column vector, then Rc is a (k+1) × (r-1) range.
OLogitPredC(R0, Rc) returns an m × r array corresponding to the m × n array R0. Each row in the output contains the probability for each of the r categories based on values of the n independent variables in the corresponding row of R0 based on the regression coefficients in Rc.
OLogitSummary(R1, head) returns an equivalent array in summary form for the raw data in R1. If head = TRUE then R1 includes column headings as well as the output.

Poisson Regression

PoissonCoeff(Rx, Ry, lab, phi, Rt, head, alpha, iter, guess) array function that returns Poisson regression coefficients and other parameters (s.e., Wald stat, p-value, and confidence interval) for data in range Rx, Ry, and Rt. If head = TRUE, then Rx, Ry, and Rt contain column headings (default FALSE). If phi = TRUE (default FALSE) then the phi correction is applied to the standard errors.
PoissonCov(Rx, Ry, Rt, iter, guess) array function that contains the coefficient covariance matrix
PoissonPred(Rx0, Rx, Ry, lab, Rt0, Rt, alpha, iter, guess) array function that returns a column array containing the predictions for Rx0 and Rt0 based on the Poisson model based on the data in Rx, Ry and Rt
PoissonPredC(Rx0, Rc, Rt) array function that returns a column array with the predictions for Rx0 and Rt0 based on the Poisson coefficients in Rc.
PoissonPredCC(Rx0, Rc, Rv, lab, Rt0, alpha) array function that returns an array with 4 columns containing the predictions, standard errors, and confidence intervals for Rx0 and Rt0 based on the Poisson coefficients in Rc and coefficient covariance array Rv.

Rx contains the X data, Ry contains the Y data and Rt contains the frequency data. If Rt or Rt0 is omitted it defaults to a column array of ones. Rt or Rt0 can also be a numeric value, in which case it is treated as a column array containing this numeric value.

If lab = TRUE, then an extra column is appended to the output containing labels, except for PoissonPredCC where an extra row of labels is appended. alpha is the significance level (default .05). iter is the number of iterations used in calculating the coefficients using Newton’s method (default 20). guess is the initial guess of these coefficients (if missing then all the coefficients are initially set to one).

Ridge and LASSO Regression

The following are array functions where Rx contains x values, Ry is a column range containing y values and Rc is a column range containing coefficients.

RidgeRegCoeff(Rx, Ry, lambda, std array with standard Ridge regression coefficients and their standard errors for the Ridge regression model based on Rx, Ry, and lambda If std = TRUE, then the values in Rx and Ry have already been standardized; if std = FALSE (default) then the values have not been standardized. 
RidgeCoeff(Rx, Ry, lambda) array with unstandardized Ridge regression coefficients and their standard errors for the Ridge regression model based on Rx, Ry, and lambda; the values in Rx and Ry are not standardized.
RidgePred(Rx0, Rx, Ry, lambda) column array of predicted y values for the data in range Rx0 based on the Ridge regression model defined by Rx, Ry, and lambda
RidgeVIF(Rx, lambda) column array with the VIF values using a Ridge regression model based on Rx, Ry, and lambda
RidgeCVError(Rx, Ry, lambda, map column array whose first element contains the k-fold cross-validation error for lambda based on the Ridge regression for the standardized data in Rx and Ry, where the partition is as defined by map, a column array with the same number of rows as Ry containing the values 1, 2, …, k; the other elements in the output are the CV errors for each of the k partition elements.
LASSOCoeff(Rx, Ry, lambda, iter, guess) column array with standardized LASSO regression coefficients based on Rx, Ry, and lambda using the cyclical coordinate descent algorithm with iter iterations (default 10000) and with the initial guesses for each coefficient specified in the column array guess; alternatively guess can specify a single initial value for all the coefficients (default .2)

There are also the following non-array functions:

RidgeRSQ(Rx, Rc, std) R-square value for Ridge regression model based on Rx and Rc; if std= TRUE, then the values in Rx have already been standardized; if std = FALSE (default) then the values have not.
RidgeLambda(Rx, vif, iter) the lowest lambda value for Ridge regression on the x values in Rx that generates a maximum VIF value less than vif; iter = the number of iterations in the search (default 25)
RidgeMSE(Rx, Ry, lambda) MSE of the Ridge regression defined by Rx, Ry, and lambda

Finally, there are the following array functions that define partitions used by RidgeCVError.

RandPart(n,k) column array with n rows with the values 1, 2, …, randomly distributed where the number of times each integer appears is approximately equal.
OrderedPart(n,k) column array with n rows with the values 1, 2, …, repeated as many times as necessary in that order.
SortedPart(Rx, k) column array with the same number of rows as R1 and containing the values 1, 2, …, k where the number of times each integer appears is approximately equal. The order of the values 1, 2, …, k is determined by the sort order in R1. 

Tobit Regression

TobitCoeff(Rx, Ry, iter, prec, Rg) array function that returns an array whose initial rows contain the estimated Tobit regression coefficients and the standard errors. The last two rows contain the estimate for σ and its standard error, LL and the # of iterations.
TobitLL(Rx, Ry, Rc) LL for the Tobit regression coefficients in the column array Rc and (X, y) values in Rx and Ry.
TobitPred(Rx0, Rc) array function that returns an array containing the forecasted censored and uncensored values for the X vectors in Rx0 based on the Tobit regression coefficients in the column array Rc. 

Non-parametric Regression

LOESS(Rx, Ry, Rx0, npts, deg) array function that returns a column array with the fitted values of the LOESS regression defined by Rx, Ry, and npts for the x values in Rx0.
KTSReg(R1, lab, mopt, bopt, alpha) array function that returns a column array with the values: slope, s.e. for the slope, the intercept, and confidence intervals of the slope and intercept for Kendall-Theil-Sens regression

Rx is a column array or range containing x data values and Ry is a column array or range with the corresponding y values.

Rx0 is a column array with any number of x values, and npts = the number of points in the span. If npts is omitted (or set to zero) then npts is reset to (n+deg)/3 or the next larger integer where n = the # of elements in Rx (or Ry). If Rx0 is omitted it defaults to the array values in Rx. Rx0 can also take a scalar x value, in which case the output is the fitted y value corresponding to this x value. deg = 1 (default) for weighted linear regression and deg = 2 for weighted quadratic regression.

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is output. alpha is the significance level (default .05). If mopt = TRUE (default), then SMALL is used to estimate the confidence interval for the slope; otherwise, SMALLExact is used. If bopt = TRUE (default), then the intercept is estimated using Approach 1; otherwise, Approach 2 is used.

Survival Analysis

LOGRANK(R1, R1, lab) array function which returns the following statistics along with their p-value: log-rank 1, log-rank 2, Wilcoxon, Tarone-Ware
COXEST(R1, approx, iter) array function which returns Cox regression coefficients, their standard errors, convergence values, LL1 and LL0 values and covariance matrix
COXPRED(R1, R2, R0, lab, approx, iter, alpha) array function which predicts the hazard ratio between the two subject profiles in R1 and R2 (plus standard error and 1−alpha confidence interval) based on a Cox regression model derived from the input data in  R0.

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. The approx parameter takes the value: 0 if the continuum approximation is used and 1 (default) if the Breslow approximation is used. The output for COXEST and COXPRED is calculated using Newton’s Method with iter iterations (default = 20).

Analysis of Variance (ANOVA)

The following functions are used for one-factor ANOVA with replication where R1 = the input data in Excel format

One-way ANOVA functions

Here b is an optional argument. When b = TRUE (default) then the columns denote the groups, while when b = FALSE, the rows denote the groups.

If R1 is in standard (i.e. stacked) format, then you can use the following formulas, where the colth column contains the data for one-way ANOVA (default col = 2):

SSWStd(R1, col) = SSW         SSBetStd(R1, col) = SSBet        SSTotStd(R1, col) = SSTot

The following functions are used for two-factor ANOVA where R1 = the input data in Excel format and r = the number of rows in R1 that make up an A factor level.

Two factor ANOVA functions

The second argument for the column and interaction terms is optional and can be dropped. Note that the column and total terms are identical to the between groups and total terms, respectively, for one-factor ANOVA.

ANOVARow(R1,r) = MSA/MSW ATESTRow(R1,r) = p-value of A factor
ANOVACol(R1,r) = MSB/MSW ATESTCol(R1,r) = p-value of B factor
ANOVAInt(R1,r) = MSAB/MSW ATESTInt(R1,r) = p-value of AB factor

If R1 is in standard (i.e. stacked) format, then you can use the following formulas for two-way ANOVA:

SSRowStd(R1) = SSA SSColStd(R1) = SSB SSIntStd(R1) = SSAB
SSWFStd(R1) = SSW SSTotStd(R1, 3) = SST dfWFStd(R1) = dfW
dfRowStd(R1) = dfA dfColStd(R1) = dfB dfIntStd(R1) = dfAB

The following array functions are used to convert data between Excel’s formatting for Anova and standard format:

StdAnova1(R1) converts data in R1 in standard format into Excel Single Factor Anova format
Anova1Std(R1) converts data in R1 in Excel Single Factor Anova format into standard (stacked) format
StdAnova2(R1) converts data in R1 in standard format into Excel Two Factor Anova format
Anova2Std(R1, r) converts data in R1 in Excel Single Factor Anova format with r rows per group into standard format
Anova3Rows(R1) converts data in R1 in standard Three Factor Anova columns format into Three Factor Anova rows format
Anova3Cols(R1) converts data in R1 in standard Three Factor Anova rows format into Three Factor Anova columns format
StdNested(R1) converts data in R1 in standard format into Excel Nested Anova format

The following array functions are used to return a column array with ANOVA residuals:

Anova1ResX(R1) returns residuals for one-way Anova data in R1 in Excel format
Anova1Res(R1) returns residuals for one-way Anova data in R1 in standard (stacked) format
Anova2Res(R1) returns residuals for two-way Anova data in R1 in standard (stacked) format
Anova3Res(R1) returns residuals for three-way Anova data in R1 in standard (stacked) format

The following array functions are used to perform ANOVA via regression:

SSAnova2(R1, r) returns a column array with SSRow, SSCol, SSInt, and SSW for a two-factor ANOVA for the data in R1 using a regression model; if r > 0 then R1 is assumed to be in Excel Anova format (with headings) with r rows per sample, while if r = 0 or is omitted then R1 is assumed to be in standard format (w/o headings)
SSAnova3(R1) returns a column array with SSA, SSB, SSC, SSAB, SSAC, SSBC, SSABC, and SSW for a three-factor ANOVA for the data in R1 using a regression model where the data in R1 is assumed to be in standard format by columns w/o column headings

Trend Analysis and orthogonal polynomial contrasts

TREND_COEFF(R1, lambda) returns the coefficient matrix for a treatment with k equally spaced groups for k ≥ 2. If lambda = TRUE (default FALSE), then a column of lambda values is appended to the output.
TREND_ANOVA(R1, R2, sse, dfe, sst, lab) returns the ANOVA analysis using polynomial coefficients based on the group means in R1 and group counts in R2. sse is the error SS, dfe is the error df, and optionally sst, which is the treatment SS. If lab = TRUE (default FALSE), then row/column headings are appended to the output.

ART ANOVA Functions

Std2Art(R1, head) converts data in R1 in standard two-factor ANOVA format to an array with five columns, the first two columns are identical to the first two columns of R1 and the other columns are the ART ranks for the Row, Col, and Int factors
Std3Art(R1, head) converts data in R1 in standard three-factor ANOVA format to an array with ten columns, the first three columns are identical to the first three columns of R1 and the other columns are the ART ranks for the A, B, C, AB, AC, BC and ABC factors

If head = TRUE (default FALSE) then both R1 and the output contain column headings.

ANOVA-related Functions

LEVENE(R1, type) p-value of Levene’s test for the data in R1 (organized by columns) where type = 0 for deviations from group means, type = 1 for deviations from group medians and type = -1 for deviations from 10% trimmed group means
OBrienTest(R1) p-value of O’Brien’s test for data in R1 (organized by columns)
FKTEST(R1) p-value of Fligner-Killeen test for data in R1 (organized by columns)
BARTLETT(R1) p-value of Bartlett’s test for data in R1 (organized by columns)
CONTEST(R1) p-value of Conover’s squared ranks test for data in R1 (organized by columns)
DunnSidak(α, k) 1−(1−α)1/k
TUKEY(R1) array with 3 columns and as many rows as there are pairwise comparisons. The first two columns contain the column numbers in R1 (from 1 to n) that are being compared and the third column contains the p-values for each of the pairwise comparisons.
ADJK(nrows, ncols) adjusted k value for Tukey HSD after two-factor ANOVA
ADDITIVITY_ ANOVA(R1) p-value of Turkey’s Additivity Test for the data in R1 in Excel format

Randomized Complete Block Design (RCBD)

RCBDMissing(R1) returns an array identical to R1 except that when one cell is non-numeric (representing a missing data value), then that cell is replaced by an imputed value
RCBDAdjSS(R1, b) adjusted SS value for rows (i.e. Blocks) if b = TRUE or for columns (i.e. Groups) if b = FALSE
SS_RCBD(R1, std) returns a column array with the values SSBlock, SSGroups, and SSError for the data in range R1 based on regression; if std = TRUE then R1 is assumed to be in stacked format and when std = FALSE (default) R1 is assumed to be in Excel format

Sphericity

GGEpsilon(R1, ngroups, raw) Greenhouse and Geisser epsilon value for the data in range R1 where ngroups = the number of groups; if raw = TRUE then R1 contains raw data, otherwise it contains a covariance matrix
HFEpsilon(R1, ngroups, nsubj) Huynh and Feldt epsilon value for the data in range R1 where ngroups = the number of groups; if nsubj = 0 then R1 contains raw data, otherwise it contains a covariance matrix which is derived from raw data with nsubj subjects (corresponding to rows).
GG_Epsilon(R1) Greenhouse and Geisser epsilon value for the data in range R1 which includes a column of labels
HF_Epsilon(R1) Huynh and Feldt epsilon value for the data in range R1 which includes a column of labels
MauchlyTest(R1) p-value of Mauchly’s test for sphericity on the data in range R1
JNSTest(R1) p-value of the John-Nagao-Sugiura test for sphericity on the data in range R1

Non-parametric Tests

KRUSKAL(R1, ties) Kruskal-Wallis test statistic for the data in R1
KTEST(R1, ties) p-value for Kruskal-Wallis Test for the data in R1
FRIEDMAN(R1, ties, chi) Friedman test statistic for the data in R1
FrTEST(R1, ties, chi) p-value for Friedman’s Test for the data in R1
MOODS_STAT(R1) chi-square statistic for Mood’s Median Test for the data in R1
MOODS_TEST(R1) p-value for Mood’s Median Test for the data in R1
COCHRAN(R1, raw, cont) Cochran’s Q statistic for the data in R1
QTEST(R1, raw, cont) p-value for Cochran’s Q Test for the data in R1
FSTAR(R1) Brown-Forsythe’s test statistic F* on the data in range R1
DFSTAR(R1) df* for Brown-Forsythe’s test on the data in range R1

If ties = TRUE (default), then a ties correction is applied. If chi = TRUE (default), the chi-square version of the test is used; otherwise, an F test is used.

There are also the following array functions about non-parametric ANOVA-like tests:

WELCH_TEST(R1, lab) returns a column array: F, df1, df2, and p-value for Welch’s test for the data in range R1
FSTAR_TEST(R1, lab) returns a column array: F, df1, df2, and p-value for Brown-Forsythe’s test for the data in range R1
KW_TEST(R1, lab, ties) returns a column array: H, H-ties, df, and p-value for  the Kruskal-Wallis test for the data in range R1, if ties = TRUE (default) a ties correction is applied
JT_TEST(R1, lab) returns a column array: JT statistic, z-score, and p-value for the Jonckheere–Terpstre test for the data in range R1.
PAGE_TEST(R1, lab, tails, lookup) returns a column array: Page’s L-stat, z, p-value, and rho.

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. If lookup = TRUE (default) use table lookup instead of z-statistic for p-value.

2k Factorial Design

DESIGN2k(klab, d) returns the design matrix for a 2^k factorial design
ExpandDesign2k(R1, d) returns the design matrix for a 2^k factorial design augmented by R1
Effect2k(R1, R2, lab, d) returns a column array consisting of the effect size values for the design described by R1 and R2, where R1 contains the +1 and +1 values and R2 contains the data values
SS2k(R1, R2, lab, d) returns a column array consisting of SS values for the design described by R1 and R2, where R1 contains the +1 and +1 values and R2 contains the data values

If lab = TRUE then the output includes a column of labels (although for DESIGN2k this is a row of labels), while if lab = FALSE (the default) only the data is outputted.  Output includes d-way interactions (d = 0, 2, 3; default = 2).

Taguchi DOE

TInteract(i, j, levels) returns a row array with the interaction column(s) for factors in columns i and j
TOptimize(R1, R2, optcode, sn, opt_only) returns an array with the optimal combination of factors/levels if opt_only = TRUE (default) and all combinations otherwise

ANOVA Power and Sample Size

ANOVA1_POWER(f, n, k, type, α, m, prec) power of a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), f = RMSSE effect size if type = 2 and f = noncentrality parameter if type = 0
ANOVA1_SIZE(f, k, 1−β, type, α, m, prec) minimum sample size required to obtain power of at least 1−β for a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), and f = RMSSE effect size if type = 2
NCP_ANOVA(R1, R2, v) non-centrality parameter of a one-way ANOVA for group means specified in the column array R1 where the number of replications for each group is specified in column array R2 and the variance for any of the groups is v.
TUKEY_POWER(R1, n, iter, alpha, clock) an array with 3 columns whose first row corresponds to the ANOVA test and the other rows correspond to all possible pairwise group comparisons. Except for the first row, the first two columns contain the group numbers, from 1 to # of columns in R1, that are being compared and the third column contains the power for each of the pairwise comparisons. The first row contains the label “anova” and the power of the ANOVA test.

For TUKEY_POWER, R1 is a column array with the effect sizes of each of the treatment groups in a one-way ANOVA. n = the number of elements in each group; iter = the # of simulations (default 100); alpha is the significance level (default .05). If clock = TRUE (default FALSE) then the first element in the output represents the time elapsed to create the output in seconds.

Intraclass Correlation

ICC(R1, class, type, lab, α) array function which outputs the intraclass correlation coefficient ICC(class, type) plus the lower and upper bound of the 1−α confidence interval for the data in R1; default values are class = 2, type = 1, α = .05. If lab = TRUE, then an extra column of labels
ICC_POWER(ρ0, ρ1, n, k, α) power of ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items
ICC_SIZE(ρ0, ρ1, k, 1−β, α) minimum sample size required to obtain power of at least 1−β for ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items

Distribution Functions

QDIST(q, k, df) studentized q cumulative distribution value for q with k independent variables and df degrees of freedom
QINV(p, k, df, tails) inverse of the studentized q distribution, i.e. the critical value for the studentized q range; tails = 1 or 2 (default)

Table Lookup

QCRIT(k, df, α, tails, h) critical value in the Studentized Range Q table
DCRIT(k, df, α, tails, h) critical value in Dunnett’s test table
DLowerCRIT(n, k, α, h) lower critical value in the Durbin-Watson Table 
DUpperCRIT(n, k, α, h) upper critical value in the Durbin-Watson Table 

 If h = TRUE (default), then harmonic interpolation is used; otherwise, linear interpolation is used.

Matching Functions

PSM_Matching(R1, tindex, cutoff)

  • R1 is an array of probabilities (i.e. the output from logistic regression). tindex is the number of the first row in R0 that came from data in the treatment group.
  • All elements in the treatment part of R1 (i.e. rows starting with tindex) are matched with the nearest element in the control part of R1 (i.e. rows before tindex), provided that distance is not greater than cutoff. It then returns a column array of zeros and ones with the same number of rows as R1 where 1 = matched and 0 = no match.

CEM_Coding(R1, R2)

  • R1 is an array whose first column contains zeros  and ones where 1 corresponds to the treatment group and 0 corresponds to the control group. The other columns contains sample data for one confounding variable. The data has been sorted so that the rows corresponding to the control group occur before the rows containing data from the treatment group. R2 is an array that contains codes that fix the bins.
  • The output contains the coding for all the values in R1, excluding the first column, based on the coarsening codes in R2.

RowMatch(R1, R2)

  • R1 is a numeric array (for CEM, this is the array of bin signatures). R2 is a numeric row array (for CEM this is a potential bin signature).
  • The output is the number of the row in R1 that matches R2; or zero if there is no match.

CEM_Pairing(R1, R2)

  • R1 is a column array containing the bin signature numbers for the rows of coarsened data and R2 is an array with two columns, where the first column contains the # of treatment data samples for the corresponding bin signature and the second column contains the # of control data samples for the corresponding bin signature.
  • The output is a column array with zeros and ones where 0 corresponds to dropping this sample from the matched samples and 1 corresponds to keeping this sample.

Pruning(R1, R2, b)

  • R1 is an array containing data (for confounding variables and the outcome variable) and R2 is a column array consisting of either zeros (prune) and ones (retain), or zeros (prune) and weights.
  • The output consists of the data in R1 where any row with a corresponding value of zero in R2 is deleted. If b = TRUE (default FALSE) then the non-zero values (presumably weights) in R2 are appended to the output.

MatchQuality(R1, b)

  • R1 is an array containing data for the treatment variable (with values 0 or 1), all confounding variables and the outcome variable. The first column of R1 contains the data for the treatment variable.
  • The output consists of statistics for the treatment and control groups, one column for each of the variables in R1 except for the treatment variable. The statistics reported are the mean, variance, min, 25th percentile, median, 75th percentile, and max.
  • In addition, if b = TRUE (default), the following four statistics are reported comparing the treatment and control values for each of the variables in R1 (except for the treatment variable): p-value of Shapiro-Wilk test for normality, p-value of a two independent sample t-test, p-value of a Mann-Whitney test, and p-value of a Brunner-Munzel test. If b = FALSE, then the last two rows are not output.

Categorical Coding

CATCODE(R1) array function which fills the highlighted array with simple coding of values in range R1; if R1 is an m × n range, highlight an m × n range
TAGCODE (R1, code, last) array function that returns an m × (k–1) array with dummy coding of values in column array R1 where R1 has m rows and k unique values; if code = TRUE (default) 0,1 coding is used; otherwise, 0,1,-1 is used. If last = TRUE (default) then the last new element in R1 is the reference category; otherwise, the first element in R1 is the reference category.

Iterative Proportional Fitting Procedure

IPFP2(R1) array function which fills the highlighted array with the output from the IPF procedure on the two-contingency table with targeted marginal totals in R1; if R1 is an m × n range, highlight an m-1 × n-1 range
IPFP3 (R1, R2) array function which fills the highlighted array with the output from the IPF procedure on the three-contingency table in R1 with targets in R2. If R1 is an m × n range, highlight an m × n range.

Leave a Comment