Real Statistics Release 5.3 | Real Statistics Using Excel

I am pleased to announce Release 5.3 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2013 and 2016 (Windows version) environments. The resource pack for Excel 2007 and 2010 environments will be available later today or tomorrow. The resource pack for Excel 2011 and 2016 Mac environments will be available in a few days.

The various examples workbooks have also been updated for capability with the new release. The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities.

Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.3.

Ridge Regression

A new Ridge Regression data analysis tool has been added that performs Ridge regression, which is especially useful to handle multicollinearity.

Supporting this new tool are the following new functions:

RidgeRegCoeff: calculates the Ridge regression coefficients and standard errors.

RidgeRSQ: calculates the R-square value for Ridge regression.

RidgeVIF: calculates the VIF values for the independent variables.

RidgeCVError: calculates the Ridge regression k-fold cross-validation error for a particular value of lambda; used to estimate a desirable lambda value.

LASSO Regression

A new LASSORegCoeff function has been added to estimate the LASSO (least absolute selection and shrinkage operator) regression coefficients using a cyclical coordinate descent algorithm.

Standardized Regression coefficients

The following functions have been added:

STDCOL: takes an array or cell range and outputs an array that has the same dimensions but with a standardization of the values in each column.

StdRegCoeff: outputs the regression coefficients that corresponds to the standardization of the x and y input data.

UnstdRegCoeff: does the reverse of StdRegCoeff by outputting the unstandardized regression coefficients when the standardized regression coefficients are known.

These functions are used in performing Ridge regression.

Multiple Regression Solver option

The algorithm that performs multiple linear regression calculates (X^TX)^-1 where X is the design matrix. Ridge and LASSO regression are used when X^TX is not invertible or when it is close to not being invertible (such as when there is multicollinearity or when there are more independent variables than data elements.

Sometimes (X^TX)^-1 can’t be calculated accurately in Excel because of an overflow error. This can occur when there are a large number of independent variables. In such cases, the results from the Multiple Linear Regression data analysis tool will be strange (e.g. R-square value larger than one or a negative value for SSE).

A new Use Solver option has now been added to the Multiple Linear Regression data analysis tool to handle such situations.

Cochran-Mantel-Haenszel Test

A Cochran-Mantel-Haenszel Test data analysis tool has been added. This test determines whether the odds ratios of a series of 2 × 2 contingency tables are significantly different from one. The data analysis tool also includes Woolf’s Heterogeneity test which determines whether the odds ratios are significantly different.

The analysis tool uses the following new array functions: CMHTest and WoolfTest.

Sphericity Tests

The following two tests for sphericity have been added:

MauchlyTest(R1) = p-value of Mauchly’s test for sphericity on the data in range R1

JNSTest(R1) = p-value of the John-Nagao-Sugiura test for sphericity on the data in range R1

Partitions

The following functions partitions the numbers 1 through n into k approximately equal-sized groups.

RandPart(n, k): random partition

OrderedPart(n, k): ordered partition

SortedPart(n, k, R1): ordered partition based on the sort order in the column range R1 with n

E.g. OrderedPart(10,3) outputs a column array with 10 rows containing the values 1, 2, 3, 1, 2, 3, 1, 2, 3, 1 (in that order). RandPart(10,3) outputs a column array with values such as 2, 1, 1, 3, 2, 1, 2, 3, 1, 3 (the values 2 and 3 are repeated 3 times and 1 is repeated 4 times). If R1 is a column range with the values 1.1, -1.4, 2.5, 3.6, 0.5, then SortedPart(R1,3) outputs a column range with the values 3, 1, 1, 2, 2 (in that order).

Chi-square Independence Test enhancement

The Chi-square Independence Test data analysis tool supports two input formats: Excel format (in the form of a contingency table) and Standard format. The Standard format is a two column range specifying pairs of headings for the contingency table. Thus if this range contains say 10 rows then the sum of all the cells in the contingency table would be 10.

A new version of the standard format is now also supported. It consists of three columns, the first two columns are as in the previous version, while the third column contains non-negative integer values, specifying how many times the pairs in the first two columns are to be repeated. The total cell count in the contingency table now equals the sum of the values in the third column. Also, the two column version of the standard format is equivalent to the three column version where the third column contains all ones.

Descriptive Statistics and Normality enhancement

When the Shapiro-Wilk option is chosen from the Descriptive Statistics and Normality data analysis tool, in addition to the Shapiro-Wilk test, the results of the d’Agostino-Pearson test for normality are also displayed.

Sort rows enhancements

Changed the QSORTRows and QSORT2Rows functions so that they properly sort rows with an empty cell in the column with the sort key(s).
These functions as well as QSORT2RowsMixed now retain the original order in case of ties.
All three functions now take an optional last argument that ensures that a header row is not sorted but remains in the first row.

Other enhancements

BOXCOX(R1) and BOXCOXLambda(R1) now work properly even when range R1 contains non-positive values (solves an issue for Luciano).
Additional error checking has been added to some of the data analysis tools.
The Correlation and Multivariate function categories have been added and many more functions are now supported by the Paste Function (f_x) button on the Formula bar (in the Excel 2010, 2013 and 2016 (Windows) versions of the software.
If the Input Range Y field in the dialog box for the Multiple Linear Regression data analysis tool is not filled in, then the last column in the Input Range X field is used as the y values.
The Regression option of the Two Factor ANOVA data analysis tool now supports models without replications.

Bug Fixes

Fixed TiesCorrection in the one-sample case (thanks to Uwe for identifying this error)
Fixed KSCRIT and KSPROB (thanks to Daniel for identifying this error)
Fixed F_DIST in the pdf case (thanks to Antonio for identifying this error)
Fix error in the Basic Forecasting data analysis tool when output is displayed on a new worksheet
Fixed a bug in the CorrTest function
The Reformat for Linear Regression option of the ARIMA Model and Forecast data analysis tool now uses the correct input data