Release 2.12 of the Real Statistics Resource Pack

I am pleased to announce Release 2.12 of the Real Statistics Resource Pack. The new release is available for free download (Download Resource Pack) and is compatible with all Windows versions of Excel. The Excel 2010/2013 versions are available now. The other Windows versions will be available for download later today.

The Real Statistics Examples Workbook and the Multivariate Examples Workbook have been updated with some new examples. You can also download these files for free (Download Examples). The website is in the process of being updated for compatibility with the new release. This should be completed in a day or two.

This release provides the following new functionality:

Missing data: Improvements are being made to the functions and data analysis tools to better handle missing data. In this release the following changes have been made:

The MAD function which calculates the median absolute deviation ignores all empty and non-numeric cells.

The SHAPIRO and SWTEST functions used to perform the Shapiro-Wilk test ignore all empty and non-numeric cells.

The SignTest, SRank, SRTest, SRankPair and SRTestPairs functions, which perform the sign test and signed rank tests, ignore all empty and non-numeric cells.

The LEVENE function, which performs the Levene test for homogeneity of variances, ignores all empty and non-numeric cells.

A second argument b has been added to the CORR(R1,b) function which creates the correlation matrix for an input data range R1. If b = True (default) then missing data in R1 is dropped listwise, i.e. any row in R1 with one or more empty or non-numeric cells is not used in calculating the correlation matrix. If b = False then missing data in R1 is dropped pairwise, i.e in computing the correlation between two columns in R1 any data pair in which one or both cells in the pair are empty or non-numeric are not used for the calculation of that correlation value.

Covariance matrices: COVP(R1, b) and COV(R1, b): As for CORR(R1, b), if b = True missing data is dropped listwise, while if b = False missing data is dropped pairwise.

The Descriptive Statistics and Normality and T Test and Non-parametric Equivalents data analysis tools ignore all empty and non-numeric cells.

New count functions: also used for handling missing data

CountFullRows(R1,b) = the number of rows in range R1 which don’t have any empty cells if b = True (default) and the number of rows in range R1 which don’t have any non-numeric cells if b = False.

We have also added a related function CountPairs(R1, R2, b). Here we look at pairs of cells from R1 and R2: the ith cell in R1 is paired with the ith cell in R2.

CountPairs(R1, R2, b) = the number of pairs for which neither cell in the pair is empty if b = True (default) and the number of pairs for which neither cell in the pair is empty or non-numeric if b = False

New correlation test functions:

CorrelTest(r, size, rho, alpha, lab): array function which tests whether the population correlation coefficient is significantly different from a sample correlation coefficient. In particular, this function outputs the test statistic z, the p-value and the lower and upper bound of the 1–alpha confidence interval, where rho is the hypothesized population correlation coefficient, r is the sample correlation coefficient and size is the sample size. If lab = True then the output takes the form a 2 × 4 range with the first column consisting of labels, while if lab = False (default) then the output takes the form a 1 × 4 range without labels.

CorrelTest(R1, R2, rho, alpha, lab) = CorrelTest(r, size, rho, alpha, lab) where r = CORREL(R1, R2) and size = the common sample size (if there are non-numeric cells in R1 and/or R2, this is the number of numeric pairs from R1 and R2).

In addition to the CorrelTest function which performs one sample correlation coefficient hypothesis testing, we have also added the new function Correl2Test which determines whether two pairs of samples come from populations with the same correlation coefficients (i.e. two sample correlation coefficient hypothesis testing).

Changes to the frequency table function:

FREQTABLE(R1, bsize. bmax): an array function which produces the frequency table for the data in range R1, assuming equally sized bins of size bsize where bmax is the maximum bin size value. The bmax argument has now been added to provide similar functionality to that found in the Frequency Table data analysis tool.

Change to the Chi-square data analysis tool: In addition to supporting a summarized table of input data, the tool will now also accept data in “standard format”, similar to the format used by other statistical tools such as SPSS.

Coding of Categorical Variables: As described in the website, especially regarding regression, it is common to create dummy (or tag) coding for categorical variables. We have employed both the usual coding (using 1 and 0) as well as the alternative coding (using 1, 0, -1). We have now added a new array function which carries out these codings. In addition we now support simple coding, where categories such as Democrat, Republican, Independent are coded as 0, 1, 2.

CATCODE(R1): If R1 is an m × n array then this function returns an m × n array with the simple coding of the corresponding cell in R1.

TAGCODE (R1, b): if R1 is an m × 1 column range with k unique values, then this function returns an m × (k–1) range with the dummy coding of the values in R1 using the 1/0 coding when b = True (the default) and the 1/0/-1 coding when b = False.

Eigenvalues: In cases where the eVALUES and eVECTORS functions produce non-real eigenvalues the following changes have been made

eVALUES(R1, iter, order): As before this function produces a row vector containing the eigenvalues of the matrix A in range R1. In addition if you highlight a second row, below each eigenvalue λ is the value det(A─λI). If some of these values are not close to zero this is an indication that the eigenvalue is bogus (because it is not a real number).

eVECTORS(R1, iter, order, prec): As before this function produces a row with the eigenvalues as for eVALUES(R1). Below each eigenvalue is a unit eigenvector corresponding to this eigenvalue. Now when a value of prec > 0 is specified (where prec is a small number), a test is made to see whether each eigenvalue is a real number. Specifically when abs(det(A─λI)) > prec it follows that det(A─λI) ≠ 0 and so λ is not a real eigenvalue. In these cases the value for the eigenvalue is displayed as #N/A. If prec is set to 0 (or is omitted) then this test is not made  (e.g. for symmetric matrices it is not necessary since all the calculated eigenvalues will be real).