Real Statistics Rel 4.12 | Real Statistics Using Excel

I am pleased to announce Release 4.12 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 and the Multivariate Examples files have been updated for compatibility with the new release.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.12.

The new release contains the following new capabilities:

Deming Regression

A new Deming Regression data analysis tool has been added, along with the following functions:

DRegCoeff(R1, R2, λ, lab) = 2 × 2 array consisting of the intercept and slope coefficients and standard errors for Deming regression on the data in R1 and R2 where lambda = λ.

DRegResiduals(R1, R2, λ, lab) = n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 aand R2 based on the Deming regression on the data in R1 and R2 where lambda = λ and n = the number of elements in R1 (or R2).

DRegIdentity(R1, R2, λ, lab) = 2 × 1 array consisting of and for Deming regression on the data in R1 and R2 where lambda = λ.

DRegPred(x0, R1, R2, λ, lab) = 4 × 1 array consisting of the predicted value of y corresponding to x0, the standard error for this prediction and the lower and upper ends of a confidence interval for this prediction based on the Deming regression on the data in R1 and R2 where lambda = λ.

If lab = TRUE (default FALSE), then a column with labels is appended to the output for readability.

If an explicit lambda argument is omitted, then it is calculated from the data in R1 and R2. If an explicit lambda argument is given, then it is assumed that R1 and R2 are column arrays. We also have the following function, which explicitly calculates lambda from the data in R1 and R2:

DRegLambda(R1, R2) = the lambda value calculated from R1 and R2

The standard errors are calculated using jackknifing as described on the website.

Split-Half and Guttman Reliability

A new Split-Half/Guttman data analysis tool has been added which calculates split-half reliability statistics using Spearman-Brown as well as Guttman’s measurement. In addition, the following functions are now supported.

COV_SPLIT(R1, s) = sample covariance for the data in range R1 based on the split described by string s.

CORR_SPLIT(R1, s) = correlation for the data in range R1 based on the split described by string s.

GUTTMAN_SPLIT(R1, s) = Guttman’s lambda for the data in range R1 based on the split described by string s.

GUTTMAN(R1) = the Guttman’s reliability measure for the data in range R1, i.e. the maximum Guttman’s lambda based on all possible splits; when the number of splits is too large, a second argument iter can be used to find an approximate maximum Guttman’s lambda based on a randomly generated iter number of splits.

String s consists of 0’s and 1’s where each character in the string corresponds to one column in R1 (thus the length of s must be equal to the number of columns in R1). E.g. the string s = “101010” represents the split half where the odd numbered questions are in one half and the even numbered questions are in the other half.

The following functions relate to the Spearman-Brown correction:

SB_SPLIT(R1, s) = split half coefficient (after Spearman-Brown correction) for data in R1 based on the split described by the string s.

SB_CORRECTION(r, n, m) = Spearman-Brown correction when the split-half correlation based on an m vs. n–m split is r. If n is omitted, then it is assumed that there is a 50-50 split. If n is present but m is omitted, then it is assumed that m = n/2.

Partitioning Functions

To support the Guttman functions, described above, the following functions have been added:

INIT_SPLIT(n, m) = returns a string of length n consisting of m 0’s followed by n–m 1’s. If omitted m defaults to n/2.

NEXT_SPLIT(s) = returns the string representing the next split after the split represented by s.

E.g. use of these functions will generate the following list of strings of length four: 0011, 0101, 0110, 1001, 1010, 1100.

The following new functions have also been added:

INIT_PARTITION(n) = returns a string consisting of n 0’s

NEXT_PARTITION(s) = returns the string representing the next partition after the partition represented by s.

E.g. use of these functions will generate the following list of strings of length three: 000, 001, 010, 011, 100, 101, 110, 111.

Finally, we also have also added the following functions:

RAND_SPLIT(n, m) = returns a random string of length n consisting of m 0’s and n–m 1’s. If omitted m defaults to n/2.

RAND_PARTITION(n) = returns a random string of length n consisting of 0’s and 1’s.

Spearman-Brown Predicted Reliability

The following functions have been added about Spearman-Brown predicted reliability. These functions can also be used with Cronbach’s Alpha.

SB_PRED(m, rho, n) = Spearman-Brown predicted reliability based on m items when Spearman-Brown for n items is rho.

SB_SIZE(rho1, rho, n) = the number of items necessary to bring the Spearman-Brown predicted reliability up (or down) to rho1 from n items with Spearman-Brown of rho.

LAD Regression Enhancements

The following enhancements have been made to the LAD Regression data analysis tool and the functions that support LAD regression:

Support for regression without an intercept has been added. Note the revised forms of the following functions where con = TRUE (default) if regression is with an intercept and con = FALSE if no constant regression term is used.

LADRegCoeff(R1, R2, con, iter)

LADRegWeights(R1, R2, con, iter)

An option has been added to the LAD Regression data analysis tool to report the standard error for the regression coefficients, as well as confidence intervals, based on bootstrapping. In addition, there is the following new function which outputs the standard errors based on nboot bootstraps:

LADRegCoeffSE(R1, R2, con, iter, nboot)

Note too that a bug has been fixed where the variable names were not shown correctly when the output from the data analysis tool was displayed on a new page.

Total Least Squares Regression

The following functions have been added to calculate the regression coefficients when total least squares is used instead of ordinary least squares.

If there is only one independent variable, the results will be the same as Deming Regression where lambda = 1. With more than one independent variable, the singular value decomposition (SVD) of the matrix containing the data from both the independent and dependent variables is used.

TRegCoeff0(R1, R2, lab) = column array consisting of the intercept and slope coefficients based on total least squares linear regression using the data in R1 and R2, where both R1 (containing x data) and R2 (containing y data) are column arrays. If lab = TRUE (default FALSE) then a column is appended with the labels “intercept” and “slope”.

TRegCoeff(R1, R2, iter) = column array consisting of the regression coefficients based on total least squares linear regression using the data in R1 and R2. iter (default 100) is the number of iteration used in calculating the SVD. iter is ignored if R1 contains only one column.

Three Factor ANOVA without Replication

The Three Factor ANOVA data analysis tool has been enhanced to support three factor ANOVA without replication.

Enhancement to Average Rank Function

The following function has been enhanced to solve problems with the standard Excel RANK.AVG function:

RANK_AVG(x, R1, order, num_digits)

This function is equivalent to the standard Excel function RANK.AVG, except that the values in range R1 are rounded off to num_digits decimal places (default 8) before the ranking is done.

This function addresses the following shortcomings in the RANK.AVG function:

RANK.AVG doesn’t handle decimal precision very well. E.g. you can have cases where 12.1 appears in the range A1:A10, yet =RANK.AVG(12.1, A1:A10) returns the error value #N/A.
RANK.AVG only accepts an explicit cell range as its second argument. E.g. even though A1:A10-B1:B10 evaluates to a cell range, =RANK.AVG(A1, A1:A10-B1:B10) will produce an error
RANK.AVG is not available in versions of Excel prior to Excel 2010.

Singular Value Decomposition Enhancement

The Singular Value Decomposition (SVD) option on the Matrix Operations data analysis tool has been enhanced to support non-square matrices. This is also the case for the following functions: SVD_U, SVD_D and SVD_V.

Donation Webpage

The website and Real Statistics software are free and will remain free of charge.

In order to offset some of my costs (especially for hosting services, software and materials), I am asking for a small donation if you are happy with the services that you have been receiving. This will be appreciated, but is completely optional. To donate, please click on Please Donate.

Bug Fixes

A bug in calculating the confidence interval in the Multiple Anova option of the Manova data analysis tool has been fixed.
When clicking on the Mann-Whitney, Wilcoxon Signed Ranks or Kruskal-Wallis options of the Non-parametric Tests data analysis tool, the Descriptive Statistics data analysis tool was incorrectly displayed. This has now been corrected.
Fixed an error in the display of Cook’s D output from the Linear Regression data analysis tool when the Column headings included in the data is unchecked.
Minor changes to some of the dialog boxes have been made, especially to correct some errors in the tool tips.
Various broken links on the website have been repaired, which should make the user experience better.

Improved Web Hosting Performance

Last month I contracted to upgrade the web hosting capabilities to improve the availability and response time of the Real Statistics website. After a few weeks of problems, I believe we are now seeing the benefits of this upgrade.

Ongoing Activities

I continue to upgrade the Real Statistics Resource Pack to accomplish the following two objectives. This will take time and will happen over the next several software releases:

Add documentation for each Real Statistics function to Excel’s standard Insert Function facility
Allow any function arguments that call for a cell range to use the output from another function as long as that output is equivalent to a cell range. E.g. you can also use the formula =GUTTMAN(TRANSPOSE(A1:J5)) or =GUTTMAN(A1:E10-F1:J10).

Charles