I am pleased to announce Release 5.7 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2007, 2010, 2013 and 2016 (Windows) environments. Release 5.7 will be available for Excel 2011 and Excel 2016 for Mac in about a week.
The Real Statistics 1B and 2A examples workbooks have also been updated for compatibility with the new release. The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities.
If you are getting value from the Real Statistics website or software, I would appreciate your donations to help offset the costs of the website by going to Please Donate.
The following is a summary of the new features in Release 5.7. The first few new features provide additional support for missing data and unbalanced models.
Mann-Whitney Exact Test
Support for the Mann-Whitney Exact Test has been completely revised. This test now supports much larger sample sizes. The recommended limits are now 300 for the size of the smaller sample and 1,000 for the larger sample (instead of only about 30 between the two samples in the previous versions of Real Statistics).
This means that when the Two samples and Non-parametric options of the T Tests and Nonparametric Equivalents data analysis tool have been chosen, the output has now been revised to show the results of the exact test most of the time.
It also means that the following Real Statistics functions have been revised to better support the Mann-Whitney Exact test.
PERM2DIST(x, n1, n2, cum) = value of the Mann-Whitney version of the two-sample permutation distribution at x based on n1 and n2 elements; returns the pdf value if cum = FALSE and the cdf value if cum = TRUE (default).
PERM2INV(p, n1, n2) = inverse of the Mann-Whitney version of the two-sample permutation distribution at p; i.e. the least value of x such that PERM2DIST(x, n1, n2, TRUE) ≥ p.
Note that the slow argument has been eliminated since it is no longer necessary. You will need to modify any existing spreadsheets that use the slow argument by dropping the value of this argument from any formula that uses PERM2DIST(x, n1, n2, cum, slow) or PERM2INV(p, n1, n2, slow).
Note that PERM2DIST(x, n1, n2, cum, FALSE) and PERM2INV(p, n1, n2, FALSE) return the Wilcoxon Ranked Sum versions of the two-sample permutation distribution and its inverse. The previous version of the PERM2DIST and PERM2INV functions was based on the Wilcoxon Ranked Sum version of the test, while the new version is based on the Mann-Whitney test unless the FALSE argument is added.
In addition, the MANNDIST, MANNINV and MANN_EXACT functions have been replaced by the following functions:
MWDIST(x, n1, n2, tails) = p-value of the Mann-Whitney exact distribution at x based on n1 and n2 elements, where tails = 1 or 2 (default)
MWINV(p, n1, n2, tails) = inverse of the Mann-Whitney exact distribution at p; i.e. the least value of x such that MWDIST(x, n1, n2, cum, tails) ≥ p, where tails = 1 or 2 (default)
MW_EXACT(R1, R2, tails) = p-value of the Mann-Whitney exact test on the data in ranges R1 and R2, where tails = 1 or 2 (default)
Note that MWDIST(x, n1, n2, tails, FALSE) and MWINV(p, n1, n2, FALSE) return the Wilcoxon Ranked Sum versions of the Mann-Whitney distribution and its inverse.
The following new array function has also been added:
PERM2_DIST(n1, n2, cum) returns a column array with the p-values of the Mann-Whitney exact test for values of U from 0 to n1*n2 when cum = TRUE (default) and the frequency values when cum = FALSE.
In addition, the Mann-Whitney Table of critical values has been expanded on the website, although this table is really no longer needed since the MWINV function returns a wide range of critical values, including all the values from the table. Also SRDIST and SRINV should now be used instead of the SRankPROB and SRankCRIT functions.
Note that the SRankPROB and SRankCRIT functions are still available for backwards compatibility, but they will no longer be supported or included in the reference materials on the website. The values output by these functions will now be calculated using SRDIST and SRINV.
Mann-Whitney Simulation
The following new function creates a simulated version of the exact test. This function can be useful for very large samples where the exact test can’t be used or where there are many ties.
MW_SIMUL(R1, R2, lab, iter): returns a column array with the U value for R1 and R2 (i.e. U = MANN(R1, R2)), the p-value for the left tailed test, the p-value for the right tailed test and the p-value for the two-tailed test; if lab = TRUE (default FALSE) then an extra column of labels is appended to the output; iter = # of iterations in the simulation (default 10,000).
Other Mann-Whitney Test Improvements
The following functions replace the MANNTEST and MANN_TEST functions:
MWTEST(R1, R2, tails, ties, cont) = p-value of the Mann-Whitney U test for the samples contained in ranges R1 and R2 using the normal approximation. tails = # of tails = 1 or 2 (default). If ties = TRUE (default) the ties correction is applied. If cont = TRUE (default) a continuity correction is applied.
MW_TEST(R1, R2, lab, tails, tails, ties, cont, exact, iter): returns a column array with the values U-stat, z-stat, r effect size and three types of p-values for the samples contained in ranges R1 and R2 using the normal approximation, exact test and simulation. tails = 1 or 2 (default). For the normal approximation, if ties = TRUE (default) the ties correction factor is applied, while if cont = TRUE (default) a continuity correction is applied. If exact = TRUE (default FALSE) then the p-value of the exact test is output and if iter ≠ 0 then the p-value of the simulation version of the test is output where the simulation consists of iter samples (default 10,000). If lab = TRUE (default FALSE) then an extra column of labels is appended to the output.
Note that the MANNTEST and MANN_TEST functions are still available for backwards compatibility, but they will no longer be supported or included in the reference materials on the website.
The T Tests and Nonparametric Equivalents data analysis tool has also been revised so that when the Two independent samples and Non-parametric options are chosen (representing the Mann-Whitney test), the output is similar to that obtained from the MW_TEST function. In particular, the p-values of the normal approximation, exact test and simulation test are output.
Wilcoxon Signed-Ranks Exact Test
Support for the Wilcoxon Signed-Ranks Exact Test. has been completely revised. This test now supports much larger sample sizes. The recommended limits are now 1,000 for the size of the sample (instead of only about 30 in the previous versions of Real Statistics).
This means that when the One sample or Paired samples and Non-parametric options of the T Tests and Nonparametric Equivalents data analysis tool are chosen, the output has now been revised to show the results of the exact test most of the time.
It also means that the following Real Statistics functions have been revised to better support the Wilcoxon Signed-Ranks Exact test.
PERMDIST(x, n, cum) = value of the one-sample permutation distribution at x based on a sample with n elements; returns the pdf value at x if cum = FALSE and the cdf value if cum = TRUE (default)
PERMINV(p, n) = inverse of the permutation distribution at x; i.e. the least value of x such that PERMDIST(x, n, TRUE) ≥ p
Note that the slow argument has been eliminated since it is no longer necessary. You will need to modify any existing spreadsheets that use the slow argument by dropping the value of this argument from any formula that uses PERMDIST(x, n, cum, slow) or PERMINV(p, n, slow).
The following functions have also been added:
SRDIST(x, n, tails) = p-value of the signed-ranks exact distribution at x based a sample with n elements where tails = 1 or 2 (default)
SRINV(p, n, tails) = inverse of the signed-ranks exact distribution at p; i.e. the least value of x such that SRDIST(x, n, tails) ≥ p, where tails = 1 or 2 (default)
In addition, the SRANKPair_EXACT and SRANK_EXACT functions have been replaced by the following function.
SR_EXACT(R1, R2, tails) = p-value of the paired signed-ranks exact test on the data in ranges R1 and R2 where tails = 1 or 2 (default)
SR_EXACT(R1, med, tails) = p-value of the one-sample signed-ranks exact test on the data in range R1 for the hypothetical median med where tails = 1 or 2 (default)
In addition, the Signed-Ranks Table of critical values has been revised and expanded on the website, although this table is really no longer needed since the SRINV function returns a wide range of critical values, including all the values from the table. Also SRDIST and SRINV should now be used instead of SRankPROB and SRankCRIT.
Note that the SRANKPair_EXACT, SRANK_EXACT, SRankCRIT and SRankPROB functions are still available for backwards compatibility, but they will no longer be supported or included in the reference materials on the website. The values output by SRANKPair_EXACT and SRANK_EXACT will now be calculated using SR_EXACT. Similarly, the values output by SRankCRIT and SRankPROB will now be calculated using SRDIST and SRINV.
Signed-Ranks Simulation
The following new function creates a simulated version of the exact test. This function can be useful for very large samples where the exact test can’t be used or where there are many ties.
SR_SIMUL(R1, R2, lab, iter): returns a column array with the T-stat for R1 and R2 (i.e. T = SRANK(R1, R2)), the p-value for the left tailed test, the p-value for the right tailed test and the p-value for the two-tailed test; if lab = TRUE (default FALSE) then an extra column of labels is appended to the output; iter = # of iterations in the simulation (default 10,000).
SR_SIMUL(R1, med, lab, iter): returns a column array with the T-stat for R1 and med (i.e. T = SRANK(R1, med)), the p-value for the left tailed test, the p-value for the right tailed test and the p-value for the two-tailed test; if lab = TRUE (default FALSE) then an extra column of labels is appended to the output; iter = # of iterations in the simulation (default 10,000).
Other Signed-Ranks Test Changes and Improvements
The SRANK function has been enhanced to support both the paired samples signed-ranks test (when R1 contains two columns or when R2 is present) and the one sample signed-ranks test (when R1 contains one column and the second argument has a numeric value).
SRANK(R1, R2) = T-stat for the paired sample Signed-Ranks test for the samples in the one-column ranges R1 and R2
SRANK(R1) = T-stat for the paired sample Signed-Ranks test for the samples in the two-column range R1
SRANK(R1, med) = T-stat for the one sample Signed-Ranks test for the sample in the one-column range R1 based on a hypothetical median med (default 0)
The SRTEST function has been enhanced to support both the one sample and paired sample versions of the signed-ranks test, using the same approach as for SRANK. In addition, SRTEST now supports ties and a continuity correction.
SRTEST(R1, R2, tails, ties, cont) = p-value of the paired sample Signed-Ranks test for the samples in ranges R1 and R2 using the normal approximation. tails = 1 or 2 (default). If ties = TRUE (default) the ties correction is applied. If cont = TRUE (default) a continuity correction is applied. When R1 contains two columns, this function should take the format SRTEST(R1,,tails, ties, cont).
SRTEST(R1, med, tails, ties, cont) = p-value of the one sample Signed-Ranks test for the sample in ranges R1 based on the hypothetical median med (default 0) using the normal approximation. tails = 1 or 2 (default). If ties = TRUE (default) the ties correction is applied. If cont = TRUE (default) a continuity correction is applied.
Note that SRANK can now be used in place of SRankPair. Similarly, SRTEST can now be used in place of the SRTestPair function. SRANKPair and SRTestPair are still available for backwards compatibility, although these functions will no longer be supported or included in the reference materials on the website.
The SR_TEST array function, which already supported both the one sample and paired signed-ranks test, has been revised as follows:
SR_TEST(R1, R2, lab, tails, tails, ties, cont, exact, iter): returns a column array with the values T-stat, z-stat, r effect size and three types of p-values for the paired samples contained in ranges R1 and R2 using the normal approximation, exact test and simulation. tails = 1 or 2 (default). For the normal approximation, if ties = TRUE (default) the ties correction factor is applied, while if cont = TRUE (default) a continuity correction is applied. If exact = TRUE (default FALSE) then the p-value of the exact test is output and if iter ≠ 0 then the p-value of the simulation version of the test is output where the simulation consists of iter samples (default 10,000). If lab = TRUE (default FALSE) then an extra column of labels is appended to the output.
SR_TEST(R1,, lab, tails, tails, ties, cont, exact, iter): the paired samples version of the SR_TEST when R2 contains two columns.
SR_TEST(R1, med, lab, tails, tails, ties, cont, exact, iter): the one-sample version of the SR_TEST function
The T Tests and Nonparametric Equivalents data analysis tool has also been revised so that when the One sample or Two-paired samples and Non-parametric options are chosen (representing the Signed-Ranks test), the output is similar to that obtained from the SR_TEST function. In particular, the p-values of the normal approximation, exact test and simulation test are output.
Studentized Range Q
The table of critical values for the studentized range q distribution has been expanded to support up to 100 groups. The QCRIT(k, df, alpha, tails, interp) function has also been enhanced to accept values of k from 2 to 100. Finally, the following new function has been added:
QPROB(q, k, df, tails, iter, interp) = estimated p-value for the studentized range q distribution at q for the distribution with k groups, degrees of freedom df, tails = 1 or 2 (default), iter = the number of iterations used in calculating the p-value from the table of critical values and interp = TRUE (default) for recommended interpolation and FALSE (linear interpolation)
ARIMA Enhancement
When the Use Solver option is chosen from the ARIMA Model and Forecast data analysis tool, standard errors are now calculated for the coefficients along with a t test to determine whether these coefficients are statistically significant. This approach uses the following new array function:
ARIMA_SE(R1, p, q, d, R2): returns a column array with the standard errors that correspond to the ARIMA(p, q, d) model coefficients in the column array R2 based on the time series data in R1
Note that the function won’t always converge to a solution for the standard errors.
Color Support
The following functions allow you to test whether cells in your spreadsheet have a specified font or background (i.e. fill) color or are bold.
- FillRGB(R1) returns the RGB background color value of the cells in range R1 (if the background colors of the cells differ then zero is returned).
- FontRGB(R1) returns the RGB font color value of the cells in range R1 (if the font colors of the cells differ then zero is returned).
- RedRGB(n) returns the value (0-255) of the red component of the RGB value n
- GreenRGB(n) returns the value (0-255) of the green component of the RGB value n
- BlueRGB(n) returns the value (0-255) of the blue component of the RGB value n
- RGBCode(text) returns the RGB value of the color described by text
- FillColor(R1) returns the fill color as text
- FontColor(R1) returns the font color as text
- IsBold(R1) returns the value TRUE if all the cells in range R1 are bold and FALSE otherwise.
If FillRGB(“A1”) = 16734822 = 102 + 90*256 + 255*256^2, then RedRGB(FillRGB(“A1”)) = 102, GreenRGB(FillRGB(“A1”)) = 90 and BlueRGB(FillRGB(“A1”)) = 255.
Currently, the text argument in the RGBCode function (as well as the output from the FillColor and FontColor functions) can take any of the following 24 values: “white”, “black”, “red”, “green”, “blue”, “yellow”, “orange”, “brown”, “purple”, “cyan”, “magenta”, “salmon”, “beige”, “teal”, “gray”, “silver”, “gold”, “maroon”, “tan”, “turquoise”, “lime”, “navy”, “olive” and “pink”.
Note that RGBCode(“black”) = 0, RGBCode(“red”) = 255, RGBCode(“yellow”) = 255 + 256*255 = 65535, etc.
These functions enable you to test whether cells have a particular color (or are bold). For example, FillColor(A1) = “red” returns TRUE if cell A1 is filled with red color. FillRGB(A1) = FillRGB(A2) returns TRUE if cells A1 and A2 contain the same color (even if not one of the 24 named colors).
In addition, there is a new Color Assignment data analysis tool that allows you to fill a range with any of the 24 designated colors.
Minor Enhancements
- The MERGE(R1, R2) function has now been enhanced to allow R1 and/or R2 to be row arrays or ranges.
- A new RANKS(R1, order, num_digits) array function has been added that calculates RANK_AVG(x, order, num_digits) for each element x in R1
Bug Fixes
- Fixed the error that resulted when using the ARIMA Model and Forecast data analysis tool when the AR order is set to zero (i.e. MA-only model)
- Fixed a bug in the Time Series Tests data analysis tool when the ADF test option is selected. The last two rows of the output, containing the coefficient and p-value, are now correctly displayed (part of the output from the ADFTEST function)
- Fixed an error in the application of the Bonferroni correction in Dunn’s test which can be used after the Kruskal-Wallis test (an option on the Single Factor ANOVA data analysis tool).
- Fixed an error in the display of the coefficient table in the output of the Weighted Linear Regression data analysis tool when there is one independent variable and no constant term.