The following is a summary of the Regression and ANOVA worksheet functions provided in the Real Statistics Resource Pack.
These functions are organized into the following categories:
- Linear Regression
- Heteroskedasticity
- Autocorrelation
- Regression Power and Sample Size
- Stepwise Regression
- Exponential Regression
- Polynomial Regression
- Least Absolution Deviation (LAD) Regression
- Deming Regression
- Total Least Squares Regression
- Passing-Bablok Regression
- Logistic Regression
- Probit Regression
- Multinomial Logistic Regression
- Ordinal Logistic Regression
- Poisson Regression
- Ridge and LASSO Regression
- Tobit Regression
- Non-parametric Regression
- Survival Analysis
- ANOVA
- Trend Analysis and Orthogonal Polynomial Contrasts
- ART ANOVA
- ANOVA-related
- Randomized Complete Block Design (RCBD)
- Sphericity
- Non-parametric tests
- 2k Factorial Design
- Taguchi DOE
- ANOVA Sample Size
- Interclass Correlation
- Distribution Functions
- Categorical Coding
- Iterative Proportional Fitting Procedure
Linear Regression
The following are ordinary, non-array functions where R1 contains the X data and R2 contains the Y data:
A second R2 parameter can be used with each of the df functions above, although this parameter is not used. Similarly, you can use SSRegTot(R1, R2) and its value will be equivalent to SSRegTot(R2). All these functions can optionally take a third argument con, where con = TRUE (default) means that the regression model takes a constant term, and con = FALSE means that the regression model doesn’t have a constant term.
Click here for more details about these functions.
There is also a second form of the RSquare function in which RSquare(R1, k) = R2 where the X data consists of all the columns in R1 except the kth column, and the Y data consist of the kth column of R1.
The following are array functions where R1 contains the X data and R2 contains the Y data.
DESIGN(R1) | design matrix for the data in R1 |
HAT(R1, con) | hat matrix for the data in R1 |
DIAGHAT(R1, con) | diagonal of the hat matrix for the data in R1 |
CORE(R1) | core of the hat matrix for the data in R1 |
LEVERAGE(R1, con) | leverage vector = diagonal of hat matrix for the data in R1 |
RegCov(R1, R2) | covariance matrix for the regression coefficients of the regression line |
RegCoeff(R1, R2, con) | two-column array with the regression coefficients for the regression line in the first column and the corresponding standard errors in the second column |
RegPred(R0, R1, R2, lab, alpha) | 7 × 1 column array containing the predicted y value for the data in R0, the standard error for the confidence interval, the lower and upper ends of the 1–alpha confidence interval, the standard error for the prediction interval, the lower and upper ends of the 1–alpha prediction interval (alpha defaults to .05) |
RegPredCC(R0, Rc) | predicted y values for x values in range R0 based on the regression coefficients in column range Rc; handles case with and w/o an intercept |
StdRegCoeff(R1, R2, Rc, ystd) | column array with the standardized regression coefficients for R1 and R2 based on the regression coefficients in the column range Rc (including an intercept coefficient). If ystd = TRUE (default), then standardized regression coefficients are based on the y data being standardized. If Rc is omitted, its values are calculated from R1 and R2 using ordinary least squares. |
UnStdRegCoeff(R1, R2, Rc, ystd) | column array with the unstandardized regression coefficients for R1 and R2 based on the standardized regression coefficients in the column range Rc (including an intercept coefficient). If ystd = TRUE (default), then the standardized regression coefficients are based on the y data being standardized |
RRegCoeff(R1, R2, hc, con) | two-column array with the regression coefficients for the regression line in the first column and the corresponding robust standard errors in the second column, where hc = 0 through 4 corresponding to HC0 through HC4 |
WRegCoeff(R1, R2, R3) | two-column array with the regression coefficients for the weighted regression line in the first column and the corresponding standard errors in the second column, where R3 contains the weights |
RegCoeffSE(R1, R2) | vector with the standard errors of the coefficients for the regression line |
RegY(R1, R2, con) | vector of predicted values for Y based on the regression line = TREND(R2,R1) |
RegE(R1, R2, con) | vector of residuals based on the regression line |
RegStudE(R1, R2) | vector of studentized residuals based on the regression line |
RSquare_Test(R1, R2, Ry, lab) | column array with values R1-square, R2-square, F, df1, df2 and the p-value of the test of the significance of X data in R2 (reduced model) vs. X data in R1 (full model) where Ry contains the Y data. |
SHAPLEY(R1, R2) | vector with the Shapley-Owen decomposition of R2 |
SlopesTest(R1, R2, R3, R4, b, lab) | vector containing s.e. of differences between slopes, t, df and p-value where R3 and R4 are the X and Y values for a second regression line; if b = TRUE (default) the pooled s.e. is used. |
If lab = TRUE then a column of labels is added to the output (default = FALSE)
The following are non-array functions where R1 contains the X data and R2 contains the Y data.
RegPredC(R0, Rc) | predicted y values for x values in R0 based on the regression coefficients in Rc. R0 and Rc can be column or rows arrays or ranges. |
RSquareTest(R1, R2, Ry) | p-value of the test of the significance of X data in R2 (reduced model) vs. X data in R1 (full model); Ry contains the Y data. |
RegAIC(R1, R2, con) | Akaike’s Information Criterion (AIC) for the regression model |
RegAICc(R1, R2, con) | corrected AICc for the regression model |
RegSBC(R1, R2, con) | Schwarz Baysean Criterion (SBC) for the regression model |
TOLERANCE(R1, j) | Tolerance of the jth variable for the data in range R1 |
VIF(R1, j) | VIF of the jth variable for the data in range R1 |
WVIF(R1, R2, j) | VIF of the jth variable for the data in R1 using the weights in R2 |
Note that if the last parameter in VIF or WVIF is omitted then a column array with the VIF values for all the independent variables is returned.
Heteroskedasticity
In the following R1 is an array of X values, R2 is a column array of Y values. If chi = TRUE, (default) the chi-square version of the test is used; otherwise, the F-test is used.
BPagStat(R1, R2, chi) | Breusch-Pagan test statistic |
BPagTest(R1, R2, chi) | p-value for Breusch-Pagan test |
WhiteStat(R1, R2, chi) | White test statistic |
WhiteTest(R1, R2, chi) | p-value for the White test statistic |
Autocorrelation
CO_RHO(R1, R2, iter, prec) | the value of rho calculated using the Cochrane-Orcutt method based on iter (default 1000) iterations unless the change in the rho value is less than prec (default .0001), at which point the process stops. |
BGSTAT(R1, R2, p, chi) | Breusch-Godfrey statistic for the x data in R1 and y data in R2 based on p-order autocorrelation. If chi = TRUE (default) use chi-square test; otherwise use F test |
BGTEST(R1, R2, p, chi) | p-value for the Breusch-Godfrey test for p-order autocorrelation based on the x data in R1 and y data in R2. If chi = TRUE (default) use chi-square test; otherwise use F test |
DURBIN(R1) | Durbin-Watson statistic d where R1 is a column vector containing residuals |
DURBIN(R1, R2) | Durbin-Watson statistic d where R1 contains X data and R2 contains Y data |
The following array functions are also supported where, except for the first function, R1 contains X data and R2 contains Y data.
DURBIN(R1, lab, alpha) | returns the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 is a column vector containing residuals |
DURBIN(R1, R2, lab, alpha) | returns the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance. |
COCoeff(R1, R2, rho) | returns an array where column 1 contains the regression coefficients based on the rho and column 2 contains the corresponding standard errors. If rho is omitted then rho defaults to 1 – d/2 where d = the Durbin-Watson statistic |
CO_Coeff(R1, R2, iter, prec) | returns COCoeff(R1, R2, CO_RHO(R1, R2, iter, prec)) |
HACCoeff(R1, R2, lags) | returns an array whose first column are the OLS regression coefficients and whose second column are the corresponding Newey-West HAC standard errors, where lags is the # of lags. |
If lab = FALSE (default) then the output is a 4 × 1 column array, while if lab = TRUE then the output is a 4 × 2 array with an extra column of labels.
Regression Power and Sample Size
REG_POWER(e, n, k, type, α, m, prec) | power of multiple regression where k = # of predictors, e = Cohen’s effect size f2 if type = 1 (default), e = R2 effect size if type = 2 and e = noncentrality parameter if type = 0 |
REG_SIZE(e, k, 1−β, type, α, m, prec) | minimum sample size required to obtain power of at least 1−β for multiple regression where k = # of predictors, e = Cohen’s effect size f2 if type = 1 (default) and e = R2 effect size if type = 2 |
n = the sample size, tails = # of tails: 1 or 2 (default), α = significance level (default = .05) and m and prec as for the noncentral distribution functions.
Stepwise Regression
The following array functions are used to create a stepwise regression model. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values and R3 is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise. If cons = TRUE (default) then regression with a constant term is used; otherwise regression through the origin is employed
RegRank(R1, R2, R3, cons) | returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by R1, R2, R3 |
RegCoeffP(R1, R2, R3, cons) | returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by R1, R2, R3 |
RegStepwise(R1, R2, alpha, cons) | returns a 1 × k array, each of whose non-blank element corresponds to an x variable that should be retained in the stepwise regression model. Actually, the output is a 1 × k+1 array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model. alpha = significance level (default .15) |
Exponential Regression
The following are array functions that support a nonlinear exponential regression model:
ExpCoeff(R1, R2, iter, lab) | outputs a 2 × 4 array whose first column contains the coefficients α and β for the regression, whose 2nd column contains the corresponding standard errors for these coefficients, whose 3rd column contains SSE and MSE, and whose 4th column contains MSReg and dfT. If lab = TRUE then an extra row is added with labels (default = FALSE). |
ExpPred(R, R1, R2, iter) | outputs an m × 1 column array with the values predicted by the exponential model for R1 and R2 based on the data in the m × 1 column vector of x values. |
ExpPredC(R, α, β) | outputs an m × 1 column array with the values predicted by the exponential model with coefficients α and β based on the data in the m × 1 column vector R of x values. |
Here iter = the number of iterations (default 20). The last two functions can also be used as non-array functions of the following form:
ExpPred(x, R1, R2, iter) | value predicted by the exponential model for x based on the data in R1 and R2 |
ExpPredC(x, α, β) | value predicted by an exponential model with coefficients α and β for x |
Polynomial Regression
The following are functions that support polynomial regression. The first two functions are array functions. R1 and R2 are column arrays containing x and y data values respectively and deg is the degree/order of the polynomial.
PolyDesign(R1, deg, ones) | returns an array consisting of x, x2, …, xdeg columns. If ones = TRUE, then the output is 1, x, x2, …, xdeg |
PolyCoeff(R1, R2, deg) | returns a two-column array consisting of the polynomial regression coefficients and their standard errors |
PolyRSquare(R1, R2, deg) | R-square value for the polynomial regression |
PolyDegree(R1, R2, deg) | the highest degree polynomial ≤ deg which produces a significantly different R-square value |
Least Absolute Deviation (LAD) Regression
The following are array functions that support LAD regression. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values, con takes the value TRUE if the regression includes a constant term and iter is the number of iterations used in the iteratively reweighted least-squares algorithm (default = 25).
LADRegCoeff(R1, R2, con, iter) | column array containing the LAD regression coefficients, k+1 × 1 array if con = TRUE and k × 1 array if con = FALSE |
LADRegWeights(R1, R2, con, iter) | n × 1 column array consisting of the weights calculated from the iteratively reweighted least squares algorithm |
LADRegCoeffSE(R1, R2, con, iter, nboots) | column array consisting of the standard errors of the LAD regression coefficients based on bootstrapping nboots times; k+1 × 1 array if con = TRUE and k × 1 array if con = FALSE |
Deming Regression
The following are functions that support Deming regression, all but the last function are array functions. R1 is an array containing x data values, R2 is an array containing y data values and λ is the lambda value (in which case R1 and R2 contain one column) or omitted (in which case lambda is calculated from R1 and R2). If lab = TRUE (default is FALSE) then the output contains an extra column of labels, except for DRegResiduals where lab = TRUE means that the output contains an extra row of labels.
DRegCoeff(R1, R2, λ, lab) | 2 × 2 array containing the Deming regression intercept and slope coefficients along with the standard errors of these coefficients |
DRegResiduals(R1, R2, λ, lab) | n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual, and optimized residual for each pair of data elements in R1 and R2 where n = the number of rows in R1 (or R2). |
DRegIdentity(R1, R2, λ, lab) | 2 × 1 array consisting of x̄–ȳ and se(x̄–ȳ) for Deming regression |
DRegPred(xo, R1, R2, λ, alpha, lab) | 4 × 1 array consisting of the predicted value of y for x0, the standard error of the prediction, and the confidence interval for this prediction based on Deming regression |
DRegLambda(R1, R2) | lambda value for Deming regression calculated from R1 and R2. |
Total Least Squares (TLS) Regression
The following are array functions that support TLS regression.
TRegCoeff0(R1, R2, lab) | 2 × 1 column array consisting of the intercept and slope TLS regression coefficients; if lab = TRUE then an extra column of labels is appended (default FALSE) |
TRegCoeff(R1, R2, iter) | k+1 × 1 column array consisting of the TLS regression coefficients, where k = # of columns in R1 and iter = # of iterations in the SVD algorithm |
Passing-Bablok (PB) Regression
The following are array functions that support PB regression where R1 and R2 are column arrays or ranges containing x and y values respectively. If lab = TRUE then an extra column of labels is appended (default FALSE)
PBRegCoeff(R1, R2, lab, alpha) | 6 × 1 column array consisting of the slope and intercept PB regression coefficients plus their 1–alpha confidence intervals |
PBTEST(R1, R2, lab, alpha) | 3 × 1 column array consisting of the h-stat, h-crit, and p-value for the test of the PB regression linearity assumption |
Logistic Regression
LogitCoeff(R1, lab, raw, head, alpha, iter) | array function that returns logistic regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |
LogitCoeff2(R1, R2, lab, head, alpha, iter) | array function like LogitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis. |
LogitCoeffs(R1, iter) | array function like LogitCoeff that outputs the coefficients plus the number of iterations made |
LogitTest(R1, lab, raw, iter) | array function that returns LL, LL0, chi-square, p-value and actual # of iterations used for data in R1 |
LogitRSquare(R1, lab, raw, iter) | array function that returns LL, LL0, pseudo R-square, AIC, BIC and actual # of iterations for data in R1 |
LogitPred(R0, R1, raw, iter) | outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression model calculated from the data in R1 (without headings) |
LogitPredC(R0, R2) | outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression coefficients contained in R2 (in the form of a column vector) |
LogitSummary(R1, head) | array function that returns a summary of the raw data in range R1. |
LogitMatches(R1, R2) | array function which outputs the total number of successes and failures for each row in R2 based on the raw data in R1. |
LogitSelect(R1, s, head) | array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output. |
LogitReduce(R1, s) | array function that takes summary data in R1 including column headings and fills the highlighted range with the data in R1 omitting the columns described by the string s, where s is a comma delimited list of column headings in R1 |
HOSMER(R1, lab, raw, iter) | array function that returns a Hosmer-Lemeshow table with 10 equal-sized data ranges based on the data in R1 (w/o headings) |
HLTEST(R1, lab, raw, iter) | array function that returns the Hosmer-Lemeshow statistic and p-value |
If lab = TRUE then the output includes headings and if lab = FALSE (the default) only the data is outputted. Except as indicated above, if head = TRUE (default) then R1 and the output include column headings, while if head = FALSE then R1 and the output contain only data.
If raw = TRUE then the data in R1 is in raw form and if raw = FALSE (the default) then the data in R1 is in summary form. The parameter alpha is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter iter determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of head is FALSE.
Logistic Regression Power and Sample Size
LOGIT_POWER(p0, p1, OR, size, Rsq, α) | power for binary logistic regression with normally distributed independent variables for the given sample size |
LOGIT_SIZE(p0,p1, OR, 1−β, Rsq, α) | minimum sample size required to obtain power of at least 1−β for binary logistic regression with normally distributed independent variable |
α = significance level (defaults to .05), 1−β = power (defaults to .80) and Rsq = R2 when the other independent variables are regressed on the independent variable of interest (defaults to 0, indicating that there are no other independent variables). p0 = the probability that y = 1 when x = μx, p1 = the probability that y = 1 when x = μx,+σx and OR = the odds ratio; you need to provide a value for either p1 or OR, but not both; if both are provided then only OR will be used.
LOGIT_POWER0(p0, p1, OR, size, π, Rsq, α) | power for binary logistic regression with binomially distributed independent variables for the given sample size |
LOGIT_SIZE0(p0,p1, OR, 1−β, π, Rsq, α) | minimum sample size required to obtain power of at least 1−β for binary logistic regression with binomially distributed independent variable |
alpha, pow, Rsq, OR are as for LOGIT_POWER and LOGIT_SIZE. p0 = the probability that y = 1 when x = 0 and p1 = the probability that y = 1 when x = 1; you need to provide a value for either p1 or OR, but not both; if both are provided then only p1 will be used. π = the portion of the sample where x = 1 (defaults to .5).
Probit Regression
ProbitCoeff(R1, lab, raw, head, alpha, iter) | array function that returns probit regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |
ProbitCoeff2(R1, R2, lab, head, alpha, iter) | array function like ProbitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis. |
ProbitCoeffs(R1, iter) | array function like ProbitCoeff that outputs the coefficients plus the number of iterations made |
ProbitTest(R1, lab, raw, iter) | array function that returns LL, LL0, chi-square and p-value for data in R1 |
ProbitRSquare(R1, lab, raw, iter) | array function that returns LL, LL0, pseudo R-square, AIC, BIC for data in R1 |
ProbitPred(R0, R1, raw, iter) | outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression model calculated from the data in R1 (without headings) |
ProbitPredC(R0, R2) | outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression coefficients contained in R2 (in the form of a column vector) |
The arguments lab, raw, iter, head, and alpha are as for the corresponding logistic regression function.
Multinomial Logistic Regression
The following are array functions where R1 is the data used to create the multinomial logistic regression model. When r = 0 (default) then the data in R1 is in raw form, whereas if r ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, r.
MLogitCoeff(R1, r, lab, head, iter) | returns the coefficients for data in range R1. |
MLogitParam(R1, r, lab, head, alpha, iter) | returns the coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |
MLogitTest(R1, r, lab, iter) | returns LL, LL0, chi-square, p-value, and the actual number of iterations for data in R1 |
MLogitRSquare(R1, r, lab, iter) | returns LL, LL0, pseudo-R-square, AIC, BIC, and the actual number of iterations for data in R1 |
MLogitPred(R0, R1,r, iter) | returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression model calculated from the data in R1 (without headings) |
MLogitPredC(R0, R2) | returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression coefficients in R2 |
MLogitSummary(R1, r, head) | returns a summary of the raw data in range R1. |
MLogitSelect(R1, s, head) | array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma-delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output. |
MLogitExtract(R1, r, s, head) | fills the highlighted range with the columns defined by string s from the data from R1. The string s takes the form of a comma-delimited list of numbers 0, …, r. |
MLogit_Accuracy(R1, r, lab, head, iter) | fills the highlighted range with a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. If R1 contains k independent variables, then the output is a k+1 × 1 column array (or a k+1 × 2 array if lab = TRUE). |
Here lab, head, alpha, and iter are as for the logistic regression functions described above.
Ordinal Logistic Regression
The following functions are used with ordinal logistic regression models. R1 contains the data. For each row in R1, the initial n columns specify values for the independent variables. If r = 0, then R1 is in raw data format with n+1 columns where the last column of R1 contains category numbers (integer values starting with 1); otherwise, R1 is in summary format and contains n+r columns where the last r columns specify the counts for each category 1, 2, …, r. Rc is a column array that contains the ordinal regression coefficients for the data in R1. If no Rc argument is present, then R1 doesn’t contain headings.
OLogitCoeff(R1, r, lab, head, alpha, iter, guess) | returns the regression coefficients using Newton’s method with iter iterations for the data in R1. For each coefficient, the output also includes the standard error, Wald statistic, p-value, and 1 – α confidence interval. If head = TRUE then R1 includes column headings. |
OLogitCov(R1, Rc, r) | returns the covariance matrix corresponding to the regression coefficients in Rc based on the data in R1. |
OLogitConverge(R1, Rc, r) | returns a column array all of whose values should be close to zero provided Rc provides an adequate estimate of the regression coefficients for the data in R1. |
OLogitLL(R1, Rc, r, lab) | returns a column array with the values LL, LL0, chi-square test results (chi-square stat, df, and p-value), R-square (McFadden, Cox and Snell, Nagelkerke versions), AIC, and BIC. If lab = TRUE (default FALSE) then an extra column is appended to the output which contains labels. |
OLogitCorrect(R1, Rc, r) | the fraction of observations in R1 for which the ordinal regression model correctly predicts the observed outcome (i.e. where the observed category is the one with the highest probability based on the model). |
OLogitPredX(R0, Rc) | returns a 1 × r row array listing the probabilities of outcomes 1, …, r, where r = 1 + the number of columns in Rc, for the values of the independent variables contained in the range R0 (in the form of either a row or column vector). If R0 is a 1 × k row vector or k × 1 column vector, then Rc is a (k+1) × (r-1) range. |
OLogitPredC(R0, Rc) | returns an m × r array corresponding to the m × n array R0. Each row in the output contains the probability for each of the r categories based on values of the n independent variables in the corresponding row of R0 based on the regression coefficients in Rc. |
OLogitSummary(R1, head) | returns an equivalent array in summary form for the raw data in R1. If head = TRUE then R1 includes column headings as well as the output. |
Poisson Regression
PoissonCoeff(Rx, Ry, lab, phi, Rt, head, alpha, iter, guess) | array function that returns Poisson regression coefficients and other parameters (s.e., Wald stat, p-value, and confidence interval) for data in range Rx, Ry, and Rt. If head = TRUE, then Rx, Ry, and Rt contain column headings (default FALSE). If phi = TRUE (default FALSE) then the phi correction is applied to the standard errors. |
PoissonCov(Rx, Ry, Rt, iter, guess) | array function that contains the coefficient covariance matrix |
PoissonPred(Rx0, Rx, Ry, lab, Rt0, Rt, alpha, iter, guess) | array function that returns a column array containing the predictions for Rx0 and Rt0 based on the Poisson model based on the data in Rx, Ry and Rt |
PoissonPredC(Rx0, Rc, Rt) | array function that returns a column array with the predictions for Rx0 and Rt0 based on the Poisson coefficients in Rc. |
PoissonPredCC(Rx0, Rc, Rv, lab, Rt0, alpha) | array function that returns an array with 4 columns containing the predictions, standard errors, and confidence intervals for Rx0 and Rt0 based on the Poisson coefficients in Rc and coefficient covariance array Rv. |
Rx contains the X data, Ry contains the Y data and Rt contains the frequency data. If Rt or Rt0 is omitted it defaults to a column array of ones. Rt or Rt0 can also be a numeric value, in which case it is treated as a column array containing this numeric value.
If lab = TRUE, then an extra column is appended to the output containing labels, except for PoissonPredCC where an extra row of labels is appended. alpha is the significance level (default .05). iter is the number of iterations used in calculating the coefficients using Newton’s method (default 20). guess is the initial guess of these coefficients (if missing then all the coefficients are initially set to one).
Ridge and LASSO Regression
The following are array functions where Rx contains x values, Ry is a column range containing y values and Rc is a column range containing coefficients.
RidgeRegCoeff(Rx, Ry, lambda, std) | array with standard Ridge regression coefficients and their standard errors for the Ridge regression model based on Rx, Ry, and lambda If std = TRUE, then the values in Rx and Ry have already been standardized; if std = FALSE (default) then the values have not been standardized. |
RidgeCoeff(Rx, Ry, lambda) | array with unstandardized Ridge regression coefficients and their standard errors for the Ridge regression model based on Rx, Ry, and lambda; the values in Rx and Ry are not standardized. |
RidgePred(Rx0, Rx, Ry, lambda) | column array of predicted y values for the x data in range Rx0 based on the Ridge regression model defined by Rx, Ry, and lambda |
RidgeVIF(Rx, lambda) | column array with the VIF values using a Ridge regression model based on Rx, Ry, and lambda |
RidgeCVError(Rx, Ry, lambda, map) | column array whose first element contains the k-fold cross-validation error for lambda based on the Ridge regression for the standardized data in Rx and Ry, where the partition is as defined by map, a column array with the same number of rows as Ry containing the values 1, 2, …, k; the other elements in the output are the CV errors for each of the k partition elements. |
LASSOCoeff(Rx, Ry, lambda, iter, guess) | column array with standardized LASSO regression coefficients based on Rx, Ry, and lambda using the cyclical coordinate descent algorithm with iter iterations (default 10000) and with the initial guesses for each coefficient specified in the column array guess; alternatively guess can specify a single initial value for all the coefficients (default .2) |
There are also the following non-array functions:
RidgeRSQ(Rx, Rc, std) | R-square value for Ridge regression model based on Rx and Rc; if std= TRUE, then the values in Rx have already been standardized; if std = FALSE (default) then the values have not. |
RidgeLambda(Rx, vif, iter) | the lowest lambda value for Ridge regression on the x values in Rx that generates a maximum VIF value less than vif; iter = the number of iterations in the search (default 25) |
RidgeMSE(Rx, Ry, lambda) | MSE of the Ridge regression defined by Rx, Ry, and lambda |
Finally, there are the following array functions that define partitions used by RidgeCVError.
RandPart(n,k) | column array with n rows with the values 1, 2, …, k randomly distributed where the number of times each integer appears is approximately equal. |
OrderedPart(n,k) | column array with n rows with the values 1, 2, …, k repeated as many times as necessary in that order. |
SortedPart(Rx, k) | column array with the same number of rows as R1 and containing the values 1, 2, …, k where the number of times each integer appears is approximately equal. The order of the values 1, 2, …, k is determined by the sort order in R1. |
Tobit Regression
TobitCoeff(Rx, Ry, iter, prec, Rg) | array function that returns an array whose initial rows contain the estimated Tobit regression coefficients and the standard errors. The last two rows contain the estimate for σ and its standard error, LL and the # of iterations. |
TobitLL(Rx, Ry, Rc) | LL for the Tobit regression coefficients in the column array Rc and (X, y) values in Rx and Ry. |
TobitPred(Rx0, Rc) | array function that returns an array containing the forecasted censored and uncensored values for the X vectors in Rx0 based on the Tobit regression coefficients in the column array Rc. |
Non-parametric Regression
LOESS(Rx, Ry, Rx0, npts, deg) | array function that returns a column array with the fitted values of the LOESS regression defined by Rx, Ry, and npts for the x values in Rx0. |
KTSReg(R1, lab, mopt, bopt, alpha) | array function that returns a column array with the values: slope, s.e. for the slope, the intercept, and confidence intervals of the slope and intercept for Kendall-Theil-Sens regression |
Rx is a column array or range containing x data values and Ry is a column array or range with the corresponding y values.
Rx0 is a column array with any number of x values, and npts = the number of points in the span. If npts is omitted (or set to zero) then npts is reset to (n+deg)/3 or the next larger integer where n = the # of elements in Rx (or Ry). If Rx0 is omitted it defaults to the array values in Rx. Rx0 can also take a scalar x value, in which case the output is the fitted y value corresponding to this x value. deg = 1 (default) for weighted linear regression and deg = 2 for weighted quadratic regression.
If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is output. alpha is the significance level (default .05). If mopt = TRUE (default), then SMALL is used to estimate the confidence interval for the slope; otherwise, SMALLExact is used. If bopt = TRUE (default), then the intercept is estimated using Approach 1; otherwise, Approach 2 is used.
Survival Analysis
LOGRANK(R1, R1, lab) | array function which returns the following statistics along with their p-value: log-rank 1, log-rank 2, Wilcoxon, Tarone-Ware |
COXEST(R1, approx, iter) | array function which returns Cox regression coefficients, their standard errors, convergence values, LL1 and LL0 values and covariance matrix |
COXPRED(R1, R2, R0, lab, approx, iter, alpha) | array function which predicts the hazard ratio between the two subject profiles in R1 and R2 (plus standard error and 1−alpha confidence interval) based on a Cox regression model derived from the input data in R0. |
If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. The approx parameter takes the value: 0 if the continuum approximation is used and 1 (default) if the Breslow approximation is used. The output for COXEST and COXPRED is calculated using Newton’s Method with iter iterations (default = 20).
Analysis of Variance (ANOVA)
The following functions are used for one-factor ANOVA with replication where R1 = the input data in Excel format
Here b is an optional argument. When b = TRUE (default) then the columns denote the groups, while when b = FALSE, the rows denote the groups.
If R1 is in standard (i.e. stacked) format, then you can use the following formulas, where the colth column contains the data for one-way ANOVA (default col = 2):
SSWStd(R1, col) = SSW SSBetStd(R1, col) = SSBet SSTotStd(R1, col) = SSTot
The following functions are used for two-factor ANOVA where R1 = the input data in Excel format and r = the number of rows in R1 that make up an A factor level.
The second argument for the column and interaction terms is optional and can be dropped. Note that the column and total terms are identical to the between groups and total terms, respectively, for one-factor ANOVA.
ANOVARow(R1,r) = MSA/MSW | ATESTRow(R1,r) = p-value of A factor |
ANOVACol(R1,r) = MSB/MSW | ATESTCol(R1,r) = p-value of B factor |
ANOVAInt(R1,r) = MSAB/MSW | ATESTInt(R1,r) = p-value of AB factor |
If R1 is in standard (i.e. stacked) format, then you can use the following formulas for two-way ANOVA:
SSRowStd(R1) = SSA | SSColStd(R1) = SSB | SSIntStd(R1) = SSAB |
SSWFStd(R1) = SSW | SSTotStd(R1, 3) = SST | dfWFStd(R1) = dfW |
dfRowStd(R1) = dfA | dfColStd(R1) = dfB | dfIntStd(R1) = dfAB |
The following array functions are used to convert data between Excel’s formatting for Anova and standard format:
StdAnova1(R1) | converts data in R1 in standard format into Excel Single Factor Anova format |
Anova1Std(R1) | converts data in R1 in Excel Single Factor Anova format into standard (stacked) format |
StdAnova2(R1) | converts data in R1 in standard format into Excel Two Factor Anova format |
Anova2Std(R1, r) | converts data in R1 in Excel Single Factor Anova format with r rows per group into standard format |
Anova3Rows(R1) | converts data in R1 in standard Three Factor Anova columns format into Three Factor Anova rows format |
Anova3Cols(R1) | converts data in R1 in standard Three Factor Anova rows format into Three Factor Anova columns format |
StdNested(R1) | converts data in R1 in standard format into Excel Nested Anova format |
The following array functions are used to return a column array with ANOVA residuals:
Anova1ResX(R1) | returns residuals for one-way Anova data in R1 in Excel format |
Anova1Res(R1) | returns residuals for one-way Anova data in R1 in standard (stacked) format |
Anova2Res(R1) | returns residuals for two-way Anova data in R1 in standard (stacked) format |
Anova3Res(R1) | returns residuals for three-way Anova data in R1 in standard (stacked) format |
The following array functions are used to perform ANOVA via regression:
SSAnova2(R1, r) | returns a column array with SSRow, SSCol, SSInt, and SSW for a two-factor ANOVA for the data in R1 using a regression model; if r > 0 then R1 is assumed to be in Excel Anova format (with headings) with r rows per sample, while if r = 0 or is omitted then R1 is assumed to be in standard format (w/o headings) |
SSAnova3(R1) | returns a column array with SSA, SSB, SSC, SSAB, SSAC, SSBC, SSABC, and SSW for a three-factor ANOVA for the data in R1 using a regression model where the data in R1 is assumed to be in standard format by columns w/o column headings |
Trend Analysis and orthogonal polynomial contrasts
TREND_COEFF(R1, lambda) | returns the coefficient matrix for a treatment with k equally spaced groups for k ≥ 2. If lambda = TRUE (default FALSE), then a column of lambda values is appended to the output. |
TREND_ANOVA(R1, R2, sse, dfe, sst, lab) | returns the ANOVA analysis using polynomial coefficients based on the group means in R1 and group counts in R2. sse is the error SS, dfe is the error df, and optionally sst, which is the treatment SS. If lab = TRUE (default FALSE), then row/column headings are appended to the output. |
ART ANOVA Functions
Std2Art(R1, head) | converts data in R1 in standard two-factor ANOVA format to an array with five columns, the first two columns are identical to the first two columns of R1 and the other columns are the ART ranks for the Row, Col, and Int factors |
Std3Art(R1, head) | converts data in R1 in standard three-factor ANOVA format to an array with ten columns, the first three columns are identical to the first three columns of R1 and the other columns are the ART ranks for the A, B, C, AB, AC, BC and ABC factors |
If head = TRUE (default FALSE) then both R1 and the output contain column headings.
ANOVA-related Functions
LEVENE(R1, type) | p-value of Levene’s test for the data in R1 (organized by columns) where type = 0 for deviations from group means, type = 1 for deviations from group medians and type = -1 for deviations from 10% trimmed group means |
OBrienTest(R1) | p-value of O’Brien’s test for data in R1 (organized by columns) |
FKTEST(R1) | p-value of Fligner-Killeen test for data in R1 (organized by columns) |
BARTLETT(R1) | p-value of Bartlett’s test for data in R1 (organized by columns) |
CONTEST(R1) | p-value of Conover’s squared ranks test for data in R1 (organized by columns) |
DunnSidak(α, k) | 1−(1−α)1/k |
TUKEY(R1) | array with 3 columns and as many rows as there are pairwise comparisons. The first two columns contain the column numbers in R1 (from 1 to n) that are being compared and the third column contains the p-values for each of the pairwise comparisons. |
ADJK(nrows, ncols) | adjusted k value for Tukey HSD after two-factor ANOVA |
ADDITIVITY_ ANOVA(R1) | p-value of Turkey’s Additivity Test for the data in R1 in Excel format |
Randomized Complete Block Design (RCBD)
RCBDMissing(R1) | returns an array identical to R1 except that when one cell is non-numeric (representing a missing data value), then that cell is replaced by an imputed value |
RCBDAdjSS(R1, b) | adjusted SS value for rows (i.e. Blocks) if b = TRUE or for columns (i.e. Groups) if b = FALSE |
SS_RCBD(R1, std) | returns a column array with the values SSBlock, SSGroups, and SSError for the data in range R1 based on regression; if std = TRUE then R1 is assumed to be in stacked format and when std = FALSE (default) R1 is assumed to be in Excel format |
Sphericity
GGEpsilon(R1, ngroups, raw) | Greenhouse and Geisser epsilon value for the data in range R1 where ngroups = the number of groups; if raw = TRUE then R1 contains raw data, otherwise it contains a covariance matrix |
HFEpsilon(R1, ngroups, nsubj) | Huynh and Feldt epsilon value for the data in range R1 where ngroups = the number of groups; if nsubj = 0 then R1 contains raw data, otherwise it contains a covariance matrix which is derived from raw data with nsubj subjects (corresponding to rows). |
GG_Epsilon(R1) | Greenhouse and Geisser epsilon value for the data in range R1 which includes a column of labels |
HF_Epsilon(R1) | Huynh and Feldt epsilon value for the data in range R1 which includes a column of labels |
MauchlyTest(R1) | p-value of Mauchly’s test for sphericity on the data in range R1 |
JNSTest(R1) | p-value of the John-Nagao-Sugiura test for sphericity on the data in range R1 |
Non-parametric Tests
KRUSKAL(R1, ties) | Kruskal-Wallis test statistic for the data in R1 |
KTEST(R1, ties) | p-value for Kruskal-Wallis Test for the data in R1 |
FRIEDMAN(R1, ties, chi) | Friedman test statistic for the data in R1 |
FrTEST(R1, ties, chi) | p-value for Friedman’s Test for the data in R1 |
MOODS_STAT(R1) | chi-square statistic for Mood’s Median Test for the data in R1 |
MOODS_TEST(R1) | p-value for Mood’s Median Test for the data in R1 |
COCHRAN(R1, raw, cont) | Cochran’s Q statistic for the data in R1 |
QTEST(R1, raw, cont) | p-value for Cochran’s Q Test for the data in R1 |
FSTAR(R1) | Brown-Forsythe’s test statistic F* on the data in range R1 |
DFSTAR(R1) | df* for Brown-Forsythe’s test on the data in range R1 |
If ties = TRUE (default), then a ties correction is applied. If chi = TRUE (default), the chi-square version of the test is used; otherwise, an F test is used.
There are also the following array functions about non-parametric ANOVA-like tests:
WELCH_TEST(R1, lab) | returns a column array: F, df1, df2, and p-value for Welch’s test for the data in range R1 |
FSTAR_TEST(R1, lab) | returns a column array: F, df1, df2, and p-value for Brown-Forsythe’s test for the data in range R1 |
KW_TEST(R1, lab, ties) | returns a column array: H, H-ties, df, and p-value for the Kruskal-Wallis test for the data in range R1, if ties = TRUE (default) a ties correction is applied |
JT_TEST(R1, lab) | returns a column array: JT statistic, z-score, and p-value for the Jonckheere–Terpstre test for the data in range R1. |
PAGE_TEST(R1, lab, tails, lookup) | returns a column array: Page’s L-stat, z, p-value, and rho. |
If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. If lookup = TRUE (default) use table lookup instead of z-statistic for p-value.
2k Factorial Design
DESIGN2k(k, lab, d) | returns the design matrix for a 2^k factorial design |
ExpandDesign2k(R1, d) | returns the design matrix for a 2^k factorial design augmented by R1 |
Effect2k(R1, R2, lab, d) | returns a column array consisting of the effect size values for the design described by R1 and R2, where R1 contains the +1 and +1 values and R2 contains the data values |
SS2k(R1, R2, lab, d) | returns a column array consisting of SS values for the design described by R1 and R2, where R1 contains the +1 and +1 values and R2 contains the data values |
If lab = TRUE then the output includes a column of labels (although for DESIGN2k this is a row of labels), while if lab = FALSE (the default) only the data is outputted. Output includes d-way interactions (d = 0, 2, 3; default = 2).
Taguchi DOE
TInteract(i, j, levels) | returns a row array with the interaction column(s) for factors in columns i and j |
TOptimize(R1, R2, optcode, sn, opt_only) | returns an array with the optimal combination of factors/levels if opt_only = TRUE (default) and all combinations otherwise |
ANOVA Power and Sample Size
ANOVA1_POWER(f, n, k, type, α, m, prec) | power of a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), f = RMSSE effect size if type = 2 and f = noncentrality parameter if type = 0 |
ANOVA1_SIZE(f, k, 1−β, type, α, m, prec) | minimum sample size required to obtain power of at least 1−β for a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), and f = RMSSE effect size if type = 2 |
NCP_ANOVA(R1, R2, v) | non-centrality parameter of a one-way ANOVA for group means specified in the column array R1 where the number of replications for each group is specified in column array R2 and the variance for any of the groups is v. |
TUKEY_POWER(R1, n, iter, alpha, clock) | an array with 3 columns whose first row corresponds to the ANOVA test and the other rows correspond to all possible pairwise group comparisons. Except for the first row, the first two columns contain the group numbers, from 1 to # of columns in R1, that are being compared and the third column contains the power for each of the pairwise comparisons. The first row contains the label “anova” and the power of the ANOVA test. |
For TUKEY_POWER, R1 is a column array with the effect sizes of each of the treatment groups in a one-way ANOVA. n = the number of elements in each group; iter = the # of simulations (default 100); alpha is the significance level (default .05). If clock = TRUE (default FALSE) then the first element in the output represents the time elapsed to create the output in seconds.
Intraclass Correlation
ICC(R1, class, type, lab, α) | array function which outputs the intraclass correlation coefficient ICC(class, type) plus the lower and upper bound of the 1−α confidence interval for the data in R1; default values are class = 2, type = 1, α = .05. If lab = TRUE, then an extra column of labels |
ICC_POWER(ρ0, ρ1, n, k, α) | power of ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items |
ICC_SIZE(ρ0, ρ1, k, 1−β, α) | minimum sample size required to obtain power of at least 1−β for ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items |
Distribution Functions
QDIST(q, k, df) | studentized q cumulative distribution value for q with k independent variables and df degrees of freedom |
QINV(p, k, df, tails) | inverse of the studentized q distribution, i.e. the critical value for the studentized q range; tails = 1 or 2 (default) |
Table Lookup
QCRIT(k, df, α, tails, h) | critical value in the Studentized Range Q table |
DCRIT(k, df, α, tails, h) | critical value in Dunnett’s test table |
DLowerCRIT(n, k, α, h) | lower critical value in the Durbin-Watson Table |
DUpperCRIT(n, k, α, h) | upper critical value in the Durbin-Watson Table |
If h = TRUE (default), then harmonic interpolation is used; otherwise, linear interpolation is used.
Categorical Coding
CATCODE(R1) | array function which fills the highlighted array with simple coding of values in range R1; if R1 is an m × n range, highlight an m × n range |
TAGCODE (R1, code, last) | array function that returns an m × (k–1) array with dummy coding of values in column array R1 where R1 has m rows and k unique values; if code = TRUE (default) 0,1 coding is used; otherwise, 0,1,-1 is used. If last = TRUE (default) then the last new element in R1 is the reference category; otherwise, the first element in R1 is the reference category. |
Iterative Proportional Fitting Procedure
IPFP2(R1) | array function which fills the highlighted array with the output from the IPF procedure on the two-contingency table with targeted marginal totals in R1; if R1 is an m × n range, highlight an m-1 × n-1 range |
IPFP3 (R1, R2) | array function which fills the highlighted array with the output from the IPF procedure on the three-contingency table in R1 with targets in R2. If R1 is an m × n range, highlight an m × n range. |