I am pleased to announce Release 3.2 of the Real Statistics Resource Pack. The new release has a lot of new capabilities and is now available for free download (Download Resource Pack) for Excel 2007, 2010 and 2013 environments.
The Real Statistics Examples Workbook has been updated to reflect the new release. You can also download this file for free (Download Examples). The website is in the process of being updated to reflect the new features. These changes will be made over the course of the next few days.
The focus of this release is on new Analysis of Variance capabilities and new non-parametric test capabilities and improvements. Specific new features include:
New ANOVA data analysis tools
All the existing ANOVA data analysis tools are now consolidated under one menu choice called Analysis of Variance. In addition some new tests have been added. The new Analysis of Variance choice provides access to the following tests:
- One Factor Anova – now both fixed and random factors are supported
- Two Factor Anova – in addition to two fixed factors, this data analysis tool now supports two random factors and mixed factors
- Three Factor Anova – three fixed factors, as in the past
- Nested Anova – new data analysis tool for two nested factors: random, fixed and mixed
- One Factor Repeated Measures Anova – the tool has been simplified
- Two Factor Repeated Measures Anova – new data analysis tool which supports one fixed factor and one repeated measures factor
- MANOVA – in addition to the one sample MANOVA data analysis tool, support is now provided for repeated measure analysis (an alternative to the Two Factor Repeated Measures Anova tool but with fewer assumptions)
- ANCOVA – new data analysis tool supporting one factor analysis of covariance
The One Factor ANOVA data analysis tool now provides three non-parametric tests which can be used when the assumptions for ANOVA are not met:
- Welch’s Test (new, see below)
- Brown-Forsythe (new, see below)
- Kruskal-Wallis (pre-existing).
The Nested Anova data analysis tool uses the following new array function to convert data in standard (stacked) format to the format needed by the data analysis tool:
- StdNested(R1) – takes data in range R1 in standard format (w/o headings) and outputs a range in Excel Two Factor Nested Anova format (with headings)
A bug has also been fixed in the Three Factor Anova data analysis tool which blocked analysis for unbalanced data in Standard Rows format (i.e. where blank cells were present). This impediment has now been removed.
New Hotelling’s T-square data analysis tool
The new data analysis tool provides the following features:
- Multivariate version of the t-tests for one sample, two paired samples and two independent samples
- Includes all the functionality which previously was only accessible via worksheet functions
- Adds repeated measures support (an alternative to the One Factor Repeated Measures ANOVA data analysis tool but with fewer assumptions)
New Hotelling’s T-square array function
- Hotelling(R1, R2, type, lab) – outputs a column range with the values T-square, df1, df2, F and p-value for Hotelling T2 test for the data in ranges R1 and R2.
Here type and lab are as follows:
- type = 0: one sample, type = 1: paired sample, type = 2: independent samples with equal covariance matrices, type = 3: independent samples with unequal covariance
- lab = TRUE – a column of labels is added to the output, lab = FALSE – no labels are added
New Non-parametric tests
The following new array functions have been added.
- WELCH_TEST(R1, lab) – outputs a column range with the values F, df1, df2, and p-value for Welch’s test for the data in range R1. This test can be used instead of Two-Factor Anova when the assumptions are not met.
- FSTAR_TEST(R1, lab) – outputs a column range with the values F, df1, df2, and p-value for Brown-Forsythe’s test for the data in range R1. This test can be used instead of Two-Factor Anova when the assumptions are not met.
Improved Handling of Ties
The following new function is now available:
- TiesCorrection(R1, R2, type) = ties correction value for the data in range R1 and optionally range R2, where type = 0: one sample, type = 1: paired sample, type = 2: independent samples
The following functions have been modified to provide an option whereby the statistic can be calculated using a correction for ties:
- MANN_TEST(R1, R2, lab, tails, alpha, ties) – Mann-Whitney test
- SRANK_TEST(R1, R2, lab, tails, alpha, ties) – Wilcoxon Signed Ranks tests (one-sample and paired samples)
- KendallW(R1, lab, ties) – Kendall’s W
In addition, the T Test and Non-parametric Equivalents data analysis tool has been modified to include the option for correcting for ties in the non-parametric tests. (There is also an option for using an exact test based on table look-up for the non-parametric option.)
The Reliability data analysis tool has also been modified to include an option for using the ties correction factor with Kendall’s W.
New exact version of Wilcoxon’s Signed Ranks Test
The previously available versions of Wilcoxon’s Signed Ranks Test were based either on a table look-up or a normal distribution approaximation. In addtion to these, there are two new functions which calculate exact p-values and critical values for small samples based on the permutation distribution.
- PERMDIST(x, n, cum) = value of the permutation distribution at x based on n elements; returns the pdf value at x if cum = FALSE and the cdf value if cum = TRUE
- PERMINV(p, n) = inverse of the permuation distribution at p; i.e. the least value of x such that PERMDIST(x, n, TRUE) ≥ p
The p-value of Wilcoxon’s Signed Ranks one-tail test for test statistic T and sample size n is given by PERMDIST(T, n, TRUE). The critical value for any value of alpha is PERMINV(alpha, n). The two tail test is given by 2* PERMDIST(T, n, TRUE) and PERMINV(alpha/2, n).
New Runs Test functions
New functions have been added to carry out the runs test using both the normal approximation and an exact result (based on combinatorial functions)
- RUNSTEST(s, lab, tails) – outputs a column range with the results of the runs test for the data in string s
- RUNSTEST(R1, lab, tails) – outputs a column range with the results of the runs test for the data in range R1
In the first version of the function s takes a form such as e.g. s = “TTFFFTFTTTTFTTF” and the function determines whether the number of runs in s (i.e. 8 in this case) is random. In the second version R1 contains numbers and the function determines whether the these numbers are randomly larger/smaller than the median of the numbers.
In addition there are the following new functions which are used to carry out the exact tests:
- RUNSDIST(r, n1, n2, cum) = the probability of getting r runs from a string with n1 T’s and n2 F’s if cum = FALSE (i.e. the pdf at r) and the probability of getting at most r runs from a string of n1 T’s and n2 F’s if cum = TRUE (i.e. the cdf at r)
- RUNSINV(p, n1, n2) = the inverse of RUNSDIST; i.e. the least value of r such that RUNSDIST(r, n1, n2, TRUE) ≥ p
The RLowerCRIT and RUpperCRIT functions have been revised. Previously they were based on the Runs Table and so were restricted to alpha = .05 and strings of at most 40 characters. They now use the RUNSINV function and so can take any value of alpha and strings up to about 1,000 characters (actually up to 514 T’s and 514 F’s).
New Shapiro-Wilk test capability
The SWPROB function has been enhanced so that a more accurate p-value can be calculated using the Royston algorithm when the value of the W statistic is known.
- SWPROB(n , w, b) = p-value for the Shapiro-Wilk test for a sample of size n and statistic w; if b = TRUE then use the Royston algorithm is used, while if b = FALSE then a table look-up is used.
In addition, a bug has been fixed in the SWPROB function when alpha = .01, which gave an error value instead of the correct value from the statistics table.
New sphericity correction functions
- GGEpsilon(R1, ngroups, raw) = Greenhouse and Geisser epsilon value for the data in range R1 where ngroups = the number of groups; if raw = TRUE then R1 contains raw data, otherwise it contains a covariance matrix
- HFEpsilon(R1, ngroups, nsubj) = Huynh and Feldt epsilon value for the data in range R1 where ngroups = the number of groups; if nsubj = 0 then R1 contains raw data, otherwise it contains a covariance matrix which is derived from raw data with nsubj subjects (corresponding to rows).
Miscellaneous new array functions
- COUNTROW(R1): outputs a column range with the counts of each of the rows in R1
- MEANROW(R1): outputs a column range with the means of each of the rows in R1
- VARROW(R1): outputs a column range with the sample variances of each of the rows in R1
- STDEVROW(R1): outputs a column range with the sample standard deviations of each of the rows in R1
Hi, Charles.
First of all, congratulations for your application, it is very amazing.
I am used to run analysis for Agronomic Trials with “Randomized BLOCK design”. Is there a way to that with RealStats?
Thanks for your time and patience.
Caio,
Glad you like the application.
These will generally correspond to ANOVA with fixed factors (with or without repetition) or ANOVA with repeated measures.
Charles