I am pleased to announce Release 5.6 of the Real Statistics Resource Pack. The new release is now available for free download from the Download Resource Pack page for Excel 2007, 2010, 2013 and 2016 (Windows) environments. Release 5.6 will be available for Excel 2011 and Excel 2016 for Mac by tomorrow.
The various examples workbooks have also been updated for compatibility with the new release. The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities.
If you are getting value from the Real Statistics website or software, I would appreciate your donations to help offset the costs of the website; you can donate via the Please Donate page.
The following is a summary of the new features in Release 5.6. The first few new features provide additional support for missing data and unbalanced models.
Chi-square Test for Independence with missing data
A version of the chi-square test for independence that handles missing data, based on the EM algorithm, has now been included. Missing row and column data are imputed and a chi-square test based on the maximum likelihood function is performed. The following new functions have been added:
EM_CHISQ(R1, iter, prec): outputs an array with the estimated multinomial p (i.e. probability) parameters
EM_CHISQ_IMPUTE(R1, iter, prec): outputs an array with imputed data values
EM_CHISQ_EXP(R1, iter, prec): outputs an array with the estimated multinomial p parameters assuming independence
EM_CHISQ_EXP_IMPUTE(R1, iter, prec): outputs an array with imputed data values assuming independence
EM_CHISQ_TEST(R1, lab, iter, prec): outputs an array containing the chi-square statistic, df and p-value of the test for independence; if lab = TRUE (default = FALSE), then a column of labels is appended to the output.
iter = the maximum number of iterations (default 200). If none of the imputed values change by more than prec (default 0.00000001) then the iteration terminates.
The Real Statistics Chi-square Test for Independence data analysis tool has also been enhanced by the inclusion of a Contains missing data option. This supports the chi-square test of independence for data in either Excel or Standard (i.e. stacked) format even when some data are missing.
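To give a feel for how an EM iteration of this kind can work, here is a minimal Python sketch. It is not the Real Statistics implementation. It assumes the data are already summarized as a table of fully classified counts plus counts for which only the row or only the column is known; the function name em_two_way and this input layout are hypothetical. The iter and prec defaults mirror those described above.

```python
def em_two_way(complete, row_only, col_only, max_iter=200, prec=1e-8):
    """EM estimate of cell probabilities for a two-way table (illustrative sketch).

    complete[i][j]: count with both row i and column j observed
    row_only[i]:    count with row i observed but column missing (assumed layout)
    col_only[j]:    count with column j observed but row missing (assumed layout)
    Returns (p, filled): estimated cell probabilities and imputed cell counts.
    """
    n_rows, n_cols = len(complete), len(complete[0])
    total = sum(map(sum, complete)) + sum(row_only) + sum(col_only)
    # Start from a uniform table so that every cell has positive probability.
    p = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    filled = [[float(v) for v in row] for row in complete]
    for _ in range(max_iter):
        row_p = [sum(p[i]) for i in range(n_rows)]
        col_p = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        # E-step: spread partially classified counts over cells in proportion to p.
        new_filled = []
        for i in range(n_rows):
            new_row = []
            for j in range(n_cols):
                value = float(complete[i][j])
                if row_p[i] > 0:
                    value += row_only[i] * p[i][j] / row_p[i]
                if col_p[j] > 0:
                    value += col_only[j] * p[i][j] / col_p[j]
                new_row.append(value)
            new_filled.append(new_row)
        # M-step: re-estimate the multinomial probabilities from the filled table.
        p = [[v / total for v in row] for row in new_filled]
        # Stop when no imputed value changes by more than prec.
        change = max(abs(new_filled[i][j] - filled[i][j])
                     for i in range(n_rows) for j in range(n_cols))
        filled = new_filled
        if change <= prec:
            break
    return p, filled
```

For example, em_two_way([[10, 20], [30, 40]], [5, 5], [4, 6]) returns a probability table that sums to 1 and a filled table whose entries add up to the 120 observations.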
Multivariate Normal Data with Missing Values
Given an array of data which follows a multivariate normal distribution (i.e. the extension of a normal distribution to more than one dimension), any missing values are imputed using the EM algorithm. Estimates of the covariance matrix of the data and of the mean of each column in the input are also produced. The following new functions have been added:
EM_MNORM_IMPUTE(R1, iter): outputs an array containing the data in R1 but with any missing data elements imputed.
EM_MNORM(R1, iter): outputs an array containing the covariance matrix and mean vector for the data in R1 where any missing data elements have been imputed.
A new EM for Multivariate Normal Data data analysis tool has been added (to the Misc tab) which implements these capabilities.
Randomized Complete Block Design (RCBD) with One Missing Data Element
A new process has been implemented whereby an RCBD analysis can be performed even when one data element is missing. This is done by imputing the value of the missing element and then revising the ANOVA to complete the analysis. The following functions have been added:
RCBDMissing(R1): outputs an array identical to R1 except that when one cell is non-numeric (representing a missing data value), then that cell is replaced by an imputed value
RCBDAdjSS(R1, b) = the adjusted SS value for rows (i.e. Blocks) if b = TRUE or for columns (i.e. Groups) if b = FALSE
The Randomized Complete Block Anova data analysis tool has been revised to support this approach when one data element is missing.
Note that the approach (as well as the new functions and revisions to the data analysis tool) can also be used in place of the Two Factor ANOVA without Replication and Repeated Measures ANOVA data analysis tools when there is one missing data element.
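One classical way to impute a single missing cell in a randomized complete block design is the textbook missing-value formula x = (r·B + t·T − G) / ((r − 1)(t − 1)), where r and t are the numbers of blocks and groups, B and T are the observed totals of the block and group containing the missing cell, and G is the observed grand total. The Python sketch below illustrates that formula; whether RCBDMissing uses exactly this calculation is an assumption, and the function name impute_rcbd_missing is hypothetical.

```python
def impute_rcbd_missing(table):
    """Fill the single missing cell (None) of a blocks-by-groups table.

    Uses the classical formula x = (r*B + t*T - G) / ((r - 1)*(t - 1)).
    This is an illustrative sketch, not the Real Statistics code.
    """
    r, t = len(table), len(table[0])
    missing = [(i, j) for i in range(r) for j in range(t) if table[i][j] is None]
    if len(missing) != 1:
        raise ValueError("expected exactly one missing cell")
    mi, mj = missing[0]
    # Totals over the observed values only.
    block_total = sum(v for v in table[mi] if v is not None)
    group_total = sum(table[i][mj] for i in range(r) if table[i][mj] is not None)
    grand_total = sum(v for row in table for v in row if v is not None)
    estimate = (r * block_total + t * group_total - grand_total) / ((r - 1) * (t - 1))
    filled = [row[:] for row in table]
    filled[mi][mj] = estimate
    return filled
```

When the observed data are perfectly additive (each cell equals a block effect plus a group effect), this formula recovers the missing cell exactly. In the usual treatment, the error degrees of freedom in the subsequent ANOVA are also reduced by one to account for the imputed value.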
Randomized Complete Block Design (RCBD) using Regression
A new process has been implemented whereby regression can be used for RCBD analysis. This approach is especially useful in unbalanced designs, i.e. where there are one or more missing data elements. The following new function has been added:
SS_RCBD(R1, std): outputs a column array with the values SSBlock, SSGroups and SSError for the data in range R1 based on regression; if std = TRUE then R1 is assumed to be in stacked format and when std = FALSE (default) R1 is assumed to be in Excel format
The Randomized Complete Block Anova data analysis tool has been revised with the inclusion of a new Regression option to support the procedure described above.
Note that this approach (as well as the new functions and revisions to the data analysis tool) can also be used in place of the Two Factor ANOVA without Replication and Repeated Measures ANOVA data analysis tools when there is one or more missing data elements.
Dot Plots
A new Dot Plot option has been added to the Descriptive Statistics and Normality data analysis tool.
A dot plot is somewhat similar to a box plot, except that instead of summarizing the data in each group the actual data values are plotted.
Enhanced Breusch-Godfrey (BG) Test
The Breusch-Godfrey test has been enhanced by adding an option to use an F test based on a modified Lagrange multiplier statistic (LM*). This option is used when chi = FALSE (default chi = TRUE).
BGSTAT(R1, R2, p, chi) = the Breusch-Godfrey statistic for the X data in R1 and Y data in R2 for order p.
BGTEST(R1, R2, p, chi) = the p-value of the Breusch-Godfrey test for the X data in R1 and Y data in R2 for order p.
Cochrane-Orcutt Regression
Cochrane-Orcutt Regression is another approach for dealing with first order autoregression. The following functions have been added to support this method.
CO_RHO(R1, R2, iter, prec) = the value of rho calculated using the Cochrane-Orcutt method based on iter (default 1000) iterations unless the change in the rho value is less than prec (default .0001), at which point the process stops.
COCoeff(R1, R2, rho): an array function which produces a k+1 × 2 array where column 1 contains the regression coefficients based on rho and column 2 contains the corresponding standard errors. If rho is omitted then rho defaults to 1 – d/2 where d = the Durbin-Watson statistic.
CO_Coeff(R1, R2, iter, prec) = COCoeff(R1, R2, CO_RHO(R1, R2, iter, prec))
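The default rho described above, 1 – d/2, can be illustrated with a short Python sketch. It computes the Durbin-Watson statistic from a list of regression residuals and then applies the standard Cochrane-Orcutt quasi-differencing to a series. The function names are hypothetical, and the residuals are assumed to come from an ordinary least squares fit performed elsewhere; this is not the Real Statistics code.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for a sequence of regression residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def default_rho(residuals):
    """Approximate first-order autocorrelation as rho = 1 - d/2."""
    return 1 - durbin_watson(residuals) / 2


def co_transform(series, rho):
    """Quasi-difference a series: v*_t = v_t - rho * v_{t-1} (drops the first value)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]
```

In the iterative version of the method, the transformed X and Y data are regressed again, rho is re-estimated from the new residuals, and the process repeats until rho changes by less than the chosen precision or the iteration limit is reached.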
Coefficient of Variation Tests
The following new functions have been added to carry out Coefficient of Variation (CV) testing.
CVTEST(R1, lab, alpha): returns an array with the sample CV, the unbiased CV, standard error, p-value and the lower and upper bounds of the 1-alpha confidence interval for the one sample CV test based on the data in R1.
CV2TEST(R1, R2, lab, alpha): returns an array with the CV for sample 1, the CV for sample 2, the pooled CV, the z-statistic, p-value and the lower and upper bounds of the 1-alpha confidence interval for the two sample CV test based on the data in R1 and R2.
alpha = the significance level (default .05). If lab = TRUE (default FALSE) then a column of labels is appended to the output.
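As a small illustration of the first two quantities, the Python sketch below computes the sample CV as the sample standard deviation divided by the mean, together with a commonly used small-sample correction, (1 + 1/(4n)) times the sample CV. That CVTEST uses exactly this correction for its unbiased CV is an assumption, and the function names are hypothetical.

```python
import statistics


def sample_cv(data):
    """Sample coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(data) / statistics.mean(data)


def unbiased_cv(data):
    """Small-sample corrected CV, (1 + 1/(4n)) * CV (assumed form of the correction)."""
    n = len(data)
    return (1 + 1 / (4 * n)) * sample_cv(data)
```

The CV is only meaningful for data measured on a ratio scale with a mean that is not close to zero.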
Partial and semi-partial correlation coefficients
The following new functions have been added that calculate the partial and semi-partial correlation coefficient based on Pearson’s and Kendall’s correlation.
PART_CORREL(Rz, Rx, Ry) = partial correlation rzx,y of variables z and x holding y constant based on Pearson’s correlation
PART_KCORREL(Rz, Rx, Ry) = partial correlation τzx,y of variables z and x holding y constant based on Kendall’s correlation
SEMIPART_CORREL(Rz, Rx, Ry) = semi-partial correlation rz(x,y) based on Pearson’s correlation
SEMIPART_KCORREL(Rz, Rx, Ry) = semi-partial correlation τz(x,y) based on Kendall’s correlation
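For reference, first-order partial and semi-partial correlations can be computed from the three pairwise correlations. The Python sketch below uses the standard formulas with Pearson's correlation; the function names are hypothetical and this is not the Real Statistics code. The same partial formula is commonly applied to Kendall's tau values to obtain a partial tau.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_pairwise(r_zx, r_zy, r_xy):
    """Partial correlation of z and x holding y constant."""
    return (r_zx - r_zy * r_xy) / math.sqrt((1 - r_zy ** 2) * (1 - r_xy ** 2))


def semipartial_from_pairwise(r_zx, r_zy, r_xy):
    """Semi-partial correlation of z with x after removing y from x only."""
    return (r_zx - r_zy * r_xy) / math.sqrt(1 - r_xy ** 2)


def part_correl(z, x, y):
    """Partial correlation computed directly from three data columns."""
    return partial_from_pairwise(pearson(z, x), pearson(z, y), pearson(x, y))
```

If y is uncorrelated with both z and x, both functions reduce to the ordinary correlation between z and x.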
Other Enhancements
- Expanded the internal table of critical values for the Mann-Whitney test. This results in more accuracy for the various Mann-Whitney functions and analysis tools. The table now contains values for n = 2 to 40 for one sample and 2 to 20 for the other.
- Increased the maximum order of AR and MA in the ARIMA data analysis tool from 20 to 50.
- Added the Alpha value to the output of the Multinomial Logistic Regression data analysis tool. In this way it is easier to change the significance level.
- Added the capability of allowing additional Real Statistics functions to take the output from some other function as input (e.g. RegCov)
- Added the function CharCount(s, c) which counts the number of occurrences of the character c in string s.
- Added a new category Missing for the fx capability to provide help for Real Statistics functions that deal with missing data
- Used F_DIST_RT and F_INV_RT instead of FDIST and FINV in the Mixed Repeated Measures ANOVA data analysis tool for the epsilon corrections to increase the accuracy of the result.
Bug Fixes
- Fixed the Box Plot with Outliers data analysis tool (an option of the Descriptive Statistics and Normality data analysis tool) so that Q1 – Min and Max – Q3 can’t be negative (if negative, the value is reset to zero).
- Fixed some missing values in the Kendall’s Tau and Durbin-Watson tables of critical values. This fixes errors in the DLowerCRIT, DUpperCRIT, TauCRIT and related functions.
- Fixed an error in the Randomized Complete Block Anova data analysis tool when using Standard format with no headings.
- Fixed a bug when the Stepwise Regression option of the Multiple Regression data analysis tool is chosen.
- Corrected some errors when inserting Real Statistics functions using the fx capability
- Fixed a bug in the DescStats function which occurs when a column contains only one data element (and so the standard deviation generates a division by zero error).
- Fixed a bug in the MissingPatterns function which gave the wrong value for the frequency percentages (caused by division by one less than the correct value)
- Fixed a bug in the PolyDeg function which sometimes caused the wrong polynomial degree to be returned.
- Fixed a small error in the calculation of the standard error for the lower and upper limits in the output from the Bland Altman Plot data analysis tool
Good morning, Doctor, and thank you very much for the new version.
Have you thought about implementing probability density graphs of the data?
Thank you again.
Gerardo,
Can you send me an example of such a graph, preferably in Excel?
Charles