I am pleased to announce Release 7.3 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2010, 2013, 2016, 2019 and 365 in both Windows and Mac environments.
The Basics, ANOVA 1, ANOVA 2, Correlation/Reliability and Distributions examples workbooks have also been updated for compatibility with the new release. Over the course of the next several days, the website will also be updated for compatibility with the new release.
If you are getting value from the Real Statistics website or software, I would appreciate your donation to help offset the costs of the website by going to Please Donate.
The following is an overview of the new features in Release 7.3:
Simulation version of Fisher Exact Test
Quite a few people have asked about how to conduct the Fisher Exact Test for larger contingency tables where the ordinary Fisher Exact Test would take too long to run. A simulated version of the exact test has now been added which can be used with such contingency tables. It is especially useful when too many of the cells contain values less than 5.
Two versions of this test are provided, one based on the usual chi-square test and the other based on the maximum likelihood version of the chi-square test. These tests are provided as an option on the Chi-square Test for Independence data analysis tool as well as via the following array function:
CHISQ_SIM(R1, lab, iter, chi, alpha): returns a column array with the values: p-value, standard error and lower/upper ends of a 1–alpha confidence interval (alpha defaults to .01) for a simulated quasi-exact chi-square test of independence with iter simulations (default 10,000) based on the contingency table (with headings) in R1 where if chi = TRUE (default) Pearson’s chi-square test is used, while if chi = FALSE the maximum likelihood version of the test is used; if lab = TRUE (default FALSE) a column of labels is appended to the output.
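The idea behind a simulated test of independence can be sketched in Python. This is a minimal illustration and not the Resource Pack's code: it assumes the simulation permutes the column labels of the individual observations (which keeps both margins fixed) and uses Pearson's statistic; the names pearson_chi2 and simulated_chi2_test are hypothetical.

```python
import math
import random


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:  # skip cells in empty rows/columns
                stat += (obs - exp) ** 2 / exp
    return stat


def simulated_chi2_test(table, iterations=10000, seed=None):
    """Return (p-value, standard error) from a permutation simulation."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row, column) label pair per observation.
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    observed = pearson_chi2(table)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(cols)  # permuting labels keeps both margins fixed
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        # Small tolerance so ties with the observed value are counted.
        if pearson_chi2(sim) >= observed - 1e-9:
            hits += 1
    p = hits / iterations
    se = math.sqrt(p * (1 - p) / iterations)
    return p, se
```

A confidence interval for the simulated p-value can then be formed from p and its standard error using the normal approximation.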
The following two new functions have also been added that conduct the chi-square test for independence when the raw data is formatted in two one-column arrays (i.e. standard format) instead of as a contingency table:
CHISQ_STAT(R1, R2, chi) = chi-square statistic for the data in column arrays R1 and R2; if chi = TRUE (default), then Pearson’s chi-square statistic is returned; otherwise the maximum likelihood statistic is returned
CHISQ_TEST(R1, R2, chi) = p-value of the chi-square test for independence where R1, R2 and chi are as for the CHISQ_STAT function.
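To make the two statistics concrete, here is a short Python sketch that computes them directly from two columns of raw labels. It assumes the "maximum likelihood" statistic is the usual likelihood-ratio statistic G = 2 Σ obs · ln(obs/exp); the function name chi2_stat_from_columns is hypothetical and this is not the Resource Pack implementation.

```python
import math
from collections import Counter


def chi2_stat_from_columns(x, y, pearson=True):
    """Chi-square statistic of independence for two equal-length label columns."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    cell = Counter(zip(x, y))  # observed count for each (x, y) pair
    row_tot = Counter(x)
    col_tot = Counter(y)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            obs = cell.get((r, c), 0)
            exp = row_tot[r] * col_tot[c] / n
            if pearson:
                stat += (obs - exp) ** 2 / exp
            elif obs > 0:  # a zero cell contributes nothing to G
                stat += 2 * obs * math.log(obs / exp)
    return stat
```

The p-value is then obtained from the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom.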
UCON Rasch analysis for polytomous scores
In addition to the PROX method and dichotomous version of the UCON method for Rasch analysis, available previously, we have added the polytomous version of the UCON method. This can be used for Likert scoring and for tests with partial credit.
The polytomous version of the UCON method is available as an option on the Rasch data analysis tool as well as from the following array function:
UCON(R1, head, iter, prec): returns an array with the expected values of the scores for subjects/items, as well as the ability and difficulty estimates along with their standard errors based on the scores in R1 and iter iterations (default 100); if head = TRUE (default) then R1 contains subject and item headings; if the change from one iteration to the next is less than prec (default .001), then convergence is deemed to have been reached even before iter iterations.
In addition, the following new array functions are provided where R1, head, iter and prec are as for the UCON function.
UCONFIT(R1, head, iter, prec): returns an array with fit values, including infit and outfit statistics for ability and difficulty
UCON_SUBJ(R1, head, iter, prec): returns a three-column array with the ability estimates and their standard errors for the subjects in R1
UCON_ITEM(R1, head, iter, prec): returns a three-column array with the difficulty estimates and their standard errors for the items in R1
UCON_THRESH(R1, head, iter, prec): returns a two-column array with the category threshold estimates for the items in R1
Aligned Rank Transform ANOVA
Aligned Rank Transform (ART) ANOVA is a non-parametric approach to factorial ANOVA that enables you to analyze the interaction as well as the main effects. As usual, ranked data is used, but first, the data for each effect (main or interaction) must be aligned before ranks are calculated. This approach is useful when data are not normally distributed.
Two-Factor ART Anova is available as an option on the Two Factor ANOVA data analysis tool. In addition, the following two array functions are available to convert data in standard (stacked) format into the format used for ART Anova.
Std2Art(R1, head): takes the data in R1 in standard two-factor ANOVA format (i.e. a three-column array whose first two columns consist of row and column factor labels, and whose third column consists of the corresponding values of the dependent variable) and returns an array with five columns: the first two columns are identical to the first two columns of R1 and the other three columns are the ART ranks for rows, columns and interactions; if head = TRUE (default FALSE) then both R1 and the output contain column headings
Std3Art(R1, head): takes the data in R1 in standard three-factor ANOVA format (i.e. a four-column array whose first three columns consist of A, B and C factor labels and whose last column consists of the corresponding values of the dependent variable) and returns an array with ten columns: the first three columns are identical to the first three columns of R1 and the other seven columns are the ART ranks for the A, B, C, AB, AC, BC and ABC factors; if head = TRUE (default FALSE) then both R1 and the output contain column headings
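The align-then-rank step for the two-factor case can be sketched in Python as follows. This follows the commonly described ART procedure (strip all effects by taking the residual from the cell mean, add back the estimate of the one effect of interest, then rank with ties averaged); whether Std2Art matches it in every detail is an assumption, and the names average_ranks and art_ranks_two_factor are hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def average_ranks(values):
    """Ranks starting at 1, with ties given the average of their ranks."""
    vals = [round(v, 9) for v in values]  # guard against float noise in ties
    order = sorted(range(len(vals)), key=lambda k: vals[k])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def art_ranks_two_factor(data):
    """data: list of (row_label, col_label, value) tuples.

    Returns a list of (row_label, col_label, rank_row, rank_col, rank_int).
    """
    by_row, by_col, by_cell = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, y in data:
        by_row[a].append(y)
        by_col[b].append(y)
        by_cell[(a, b)].append(y)
    grand = mean([y for _, _, y in data])
    row_m = {a: mean(v) for a, v in by_row.items()}
    col_m = {b: mean(v) for b, v in by_col.items()}
    cell_m = {k: mean(v) for k, v in by_cell.items()}

    al_row, al_col, al_int = [], [], []
    for a, b, y in data:
        resid = y - cell_m[(a, b)]  # removes all effects
        al_row.append(resid + row_m[a] - grand)
        al_col.append(resid + col_m[b] - grand)
        al_int.append(resid + cell_m[(a, b)] - row_m[a] - col_m[b] + grand)
    ranks = zip(average_ranks(al_row), average_ranks(al_col), average_ranks(al_int))
    return [(a, b, rr, rc, ri) for (a, b, _), (rr, rc, ri) in zip(data, ranks)]
```

Each rank column is then analyzed with an ordinary factorial ANOVA, and only the effect for which that column was aligned is interpreted.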
Two-factor ANOVA functions for data in standard format
The following functions have been added for balanced data in two-factor ANOVA standard (stacked) format; i.e. a three-column array whose first two columns contain the labels for the row and column factors and whose third column contains the corresponding response values.
SSRowStd(R1) = SSRow
SSColStd(R1) = SSCol
SSIntStd(R1) = SSInt
SSWFStd(R1) = SSW
SSTotStd(R1,3) = SST
dfRowStd(R1) = dfRow
dfColStd(R1) = dfCol
dfIntStd(R1) = dfInt
dfWFStd(R1) = dfW
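For reference, the quantities named above can be computed from stacked data as in the following Python sketch. It uses the textbook formulas for a balanced two-factor design with replication; the function name two_factor_anova_ss and the dictionary keys are hypothetical, chosen to mirror the names in the list.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def two_factor_anova_ss(data):
    """data: list of (row_label, col_label, value); balanced design assumed."""
    by_row, by_col, by_cell = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, y in data:
        by_row[a].append(y)
        by_col[b].append(y)
        by_cell[(a, b)].append(y)
    r, c = len(by_row), len(by_col)
    sizes = {len(v) for v in by_cell.values()}
    if len(sizes) != 1 or len(by_cell) != r * c:
        raise ValueError("design must be balanced with no empty cells")
    m = sizes.pop()  # replications per cell
    n = len(data)
    grand = mean([y for _, _, y in data])

    ss_row = sum(len(v) * (mean(v) - grand) ** 2 for v in by_row.values())
    ss_col = sum(len(v) * (mean(v) - grand) ** 2 for v in by_col.values())
    ss_w = sum((y - mean(v)) ** 2 for v in by_cell.values() for y in v)
    ss_tot = sum((y - grand) ** 2 for _, _, y in data)
    ss_int = ss_tot - ss_row - ss_col - ss_w  # valid because the design is balanced
    return {
        "SSRow": ss_row, "SSCol": ss_col, "SSInt": ss_int, "SSW": ss_w, "SST": ss_tot,
        "dfRow": r - 1, "dfCol": c - 1, "dfInt": (r - 1) * (c - 1),
        "dfW": r * c * (m - 1), "dfT": n - 1,
    }
```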
Enhancement to the BIWEIGHT and HUBER Functions
A new cutoff argument c has been added to the existing BIWEIGHT function so that the function takes the form
BIWEIGHT(R1, iter, prec, c, pure) = Tukey’s biweight estimator of the data in R1 based on a maximum of iter iterations (default 50) and the selected value for c (default 4.685); if the change in the biweight estimate from one iteration to the next is less than prec (default 0.00000001), then convergence is deemed to have been reached even before iter iterations; if the biweight estimate is undefined for the given value of c then when pure = FALSE (default) the value of c is increased until a valid biweight value is found, and when pure = TRUE then an error value is returned.
The Huber function has also been enhanced to take a similar form, namely HUBER(R1, iter, prec, c, pure) where c defaults to 1.339. Unlike the biweight, the Huber function is always defined except when MAD(R1) = 0. In this case, when pure = TRUE, the function yields an error value, while if pure = FALSE then the function takes the value of MODE(R1). This is based on the fact that MAD(R1) = 0 only occurs when more than half the elements in R1 have the same value.
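The roles of iter, prec and c in the biweight estimator can be illustrated with a small Python sketch. It assumes one common formulation: start from the median, fix the scale at MAD/0.6745, and repeatedly take a weighted mean with weights (1 − u²)² for |u| < 1 and 0 otherwise, where u = (x − estimate)/(c · scale). The Resource Pack may differ in details, and the sketch simply raises an error where the estimate is undefined (it does not enlarge c as pure = FALSE does). The name biweight_location is hypothetical.

```python
import statistics


def biweight_location(data, max_iter=50, prec=1e-8, c=4.685):
    """Tukey biweight location estimate by iteratively reweighted means."""
    center = statistics.median(data)
    mad = statistics.median([abs(x - center) for x in data])
    scale = mad / 0.6745  # MAD rescaled to be comparable to a standard deviation
    if scale == 0:
        raise ValueError("MAD is zero; estimate undefined in this sketch")
    for _ in range(max_iter):
        weights = []
        for x in data:
            u = (x - center) / (c * scale)
            # Points further than c * scale from the center get zero weight.
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        total = sum(weights)
        if total == 0:
            raise ValueError("all weights are zero; estimate undefined")
        new_center = sum(w * x for w, x in zip(weights, data)) / total
        if abs(new_center - center) < prec:  # converged before max_iter
            return new_center
        center = new_center
    return center
```

A smaller c rejects more points as outliers, while a larger c moves the estimate toward the ordinary mean.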
Mean of Successive Squared Differences (MSSD)
The new MSSD(R1) function computes the mean of successive squared differences (MSSD) of the data sequence contained in the column array R1. As an array function, MSSD can be used to test the randomness of the data sequence in R1.
MSSD(R1, lab): returns a column array with the values: MSSD of R1, the variance of R1, z-statistic and p-value for the MSSD test of the randomness of the sequence in the column array R1; if lab = TRUE (default FALSE) a column of labels is appended to the output
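As an illustration of the four output values, here is a Python sketch of the calculation. It assumes the convention MSSD = Σ(x[i+1] − x[i])² / (2(n − 1)), so that MSSD estimates the variance when the sequence is random, and the von Neumann style statistic z = (1 − MSSD/s²) / sqrt((n − 2)/(n² − 1)); other conventions omit the factor 2. The p-value shown is one-sided under the normal approximation, which is also an assumption, and the name mssd_test is hypothetical.

```python
import math
import statistics


def mssd_test(data):
    """Return (MSSD, sample variance, z-statistic, one-sided p-value)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # Sum of squared differences between consecutive observations.
    ssd = sum((b - a) ** 2 for a, b in zip(data, data[1:]))
    mssd = ssd / (2 * (n - 1))
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    z = (1 - mssd / var) / math.sqrt((n - 2) / (n * n - 1))
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return mssd, var, z, p
```

A trending or positively autocorrelated sequence gives an MSSD well below the variance, hence a large positive z.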
The following functions can be used for small samples to obtain the critical values and p-values associated with the MSSD test of randomness:
MSSD_CRIT(n, alpha, interp) = critical value of the z-statistic of the MSSD test for sample size n (an integer value between 8 and 150) and significance level alpha between .0005 and .25 (default .05); if interp = TRUE (default) then the recommended interpolation approach is used, while if interp = FALSE then linear interpolation is used.
MSSD_PROB(z, n, iter, interp, txt) = the MSSD p-value based on the z-statistic z for sample size n; interp is as for MSSD_CRIT; iter = the number of iterations used to calculate the p-value from the table of critical z-statistic values (default 40); if txt = FALSE (default) and p-value < .0005 then the value zero is returned and if p-value > .25 then the value one is returned, while if txt = TRUE then the values “< .0005” and “> .25” are returned.
Kronecker Product
The following array function is now supported:
KMULT(R1, R2): returns an array with the Kronecker product of the R1 and R2 arrays
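For readers unfamiliar with the operation, the Kronecker product replaces each element a of the first matrix with the block a · B. A minimal Python sketch using lists of rows (the name kronecker is hypothetical):

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]  # one output row per (row_a, row_b) pair
        for row_a in a
        for row_b in b
    ]


result = kronecker([[1, 2], [3, 4]], [[0, 1], [1, 0]])
# result == [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]
```

An m × n matrix combined with a p × q matrix yields an mp × nq matrix.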
Spearman’s rho correlation enhancement
The SCORREL array function has now been enhanced to also output a 1–α confidence interval.
ANOVA Non-centrality Parameter Estimation
A new function is now available that helps estimate the non-centrality parameter for ANOVA and effect size measures (Cohen’s f, partial eta-square, etc.). This is helpful in calculating statistical power and sample size.
NCP_ANOVA(R1, R2, v) = the non-centrality parameter of a one-way ANOVA for group means specified in the column array R1 where the number of replications for each group is specified in column array R2 and the variance for any of the groups is v.
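The calculation can be sketched in Python using the textbook definition λ = Σ n_i (μ_i − μ̄)² / v, where μ̄ is the sample-size-weighted grand mean, together with Cohen's f = sqrt(λ/N). That NCP_ANOVA uses exactly this formula is an assumption, and the names anova_ncp and cohens_f are hypothetical.

```python
import math


def anova_ncp(means, sizes, variance):
    """Non-centrality parameter of a one-way ANOVA with common variance."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    if variance <= 0:
        raise ValueError("variance must be positive")
    total = sum(sizes)
    # Grand mean weighted by the group sizes.
    grand = sum(n * m for n, m in zip(sizes, means)) / total
    return sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / variance


def cohens_f(ncp, total_n):
    """Cohen's f effect size implied by the non-centrality parameter."""
    return math.sqrt(ncp / total_n)
```

For example, group means 10, 12 and 14 with 5 replications each and variance 4 give λ = 5 · (4 + 0 + 4) / 4 = 10.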
Partial Eta-Squared Effect Size for ANOVA
The partial eta-squared effect size has been added to various ANOVA data analysis tools. (Note that for one-way ANOVA, partial eta-squared is the same as eta-squared.)
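For reference, the standard definitions of the two effect sizes are:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}},
\qquad
\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}
```

In one-way ANOVA the total sum of squares is the sum of the between-groups and within-groups sums of squares, so the two denominators are equal and the measures coincide.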
Eta-squared has also been added as an effect size option for the ANOVA_POWER and ANOVA_SIZE functions (by setting ttype = 3) as well as to the One-way ANOVA option of the Statistical Power and Sample Size data analysis tool.
Bug Fixes
- Fixed some bugs in the Two-Sample Anderson-Darling option of the Goodness Of Fit data analysis tool
- Fixed a bug in the FREQ_REFORMAT function that caused the histogram in the Histogram with Normal Curve Overlay data analysis tool to be shifted one bin length to the right.
- Corrected some errors in calculating the QR Factorization of a non-square matrix or one that doesn’t have full rank. This impacts the QRFactorQ, QRFactorR, QRFactor, QRFullQ, QRFullR and QRFull functions as well as the QR Factorization option of the Matrix Operations data analysis tool.