Release 3.7 of the Real Statistics Resource Pack

I am pleased to announce Release 3.7 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010 and 2013 environments.

The Real Statistics Examples Workbook and Real Statistics Multivariate Examples Workbook have both been updated and can be downloaded for free.

The Real Statistics website is in the process of being updated to reflect the new features. These changes will be made in the next couple of days.

The release contains the following new features:

Non-volatile random number generator

Standard Excel provides the RAND and RANDBETWEEN functions to generate random numbers. These functions are volatile in the sense that every time there is a change to the worksheet their value is recalculated and a different random number is generated.

The Real Statistics Resource Pack now provides the RANDOM function which provides the following capabilities:

RANDOM(a, b, FALSE) = random number between a and b; i.e a non-volatile version of a+(b−a)*RAND()

RANDOM(a, b, TRUE) = random integer between a and b, inclusive; i.e. a non-volatile version of RANDBETWEEN(a, b)

If a is omitted it defaults to 0, if b is omitted it defaults to 1 and if the third argument is omitted it defaults to FALSE.

Copy formula exact

Sometimes it is useful to copy a range containing formulas to a new location without changing the cell references in the formulas. This can now be done using the Data Reformatting data analysis tool.

Lilliefors test for normality

When the population mean and standard deviation for the One Sample Kolmogorov-Smirnov Test for Normality is estimated from the sample mean and standard deviation, the results are not very accurate. The Lilliefors Test corrects the KS Test in such cases, and so provides a much more accurate test for normality. The calculation of the test statistic is the same as for the KS test, but the Lilliefors test uses a different table of critical values.

The Real Statistics Resource Pack provides the following new functions:

LCRIT(n, α, tails) = the critical value of the Lilliefors test for a sample of size n, for the given value of alpha (default .05) and tails = 1 (one tail) or 2 (two tails, default) based on this table.

LPROB(x, n, tails, iter) = an approximate p-value for the Lilliefors test for the D-max value equal to x for a sample of size n and tails = 1 (one tail) or 2 (two tails, default) based on a linear interpolation of the values in the table, using iter number of iterations (default = 40).

The resource pack also calculates approximate values of the Lilliefors distribution (not based on the table of critical values), via the following functions:

LDIST(x, n) = the p-value of the Lilliefors distribution at x for samples of size n

LINV(p, n, iter) = the critical value of the Lilliefors distribution at p for samples of size n; i.e. LINV(p, n, iter) = x where LDIST(x, n) = p; based on iter iterations (default 40)

        LTEST(R1) = D-max for the data in range R1

LTEST can also be used as an array function which outputs a 2 × 1 column range consisting of D-max and p-value (based on the LDIST approximation). The array function LTEST takes an optional second argument lab (default FALSE). When lab = TRUE then an initial column of labels are added to the output.

Categorical variables support for regression and logistic regression

Dummy (aka tag) coding can automatically be employed for categorical variables when using the Linear Regression data analysis tool. Either the ordinary 0, 1 coding or the alternative 0, 1, -1 coding can be used. In particular, alphanumeric data will be converted to numeric data (e.g. Male and Female will be converted to 0 and 1 — if ordinary coding is employed). This is equivalent to the TAGCODE function. In order to make sure that the coding is what you want, the original categorical values are retailed, but you should delete these after you check.

Simple categorical coding can automatically be employed for categorical variables when using the Logistic Regression data analysis tool. In particular, alphanumeric data will be converted to numeric data (e.g. red, green, blue will be converted to 0, 1, 2). This is equivalent to the CATCODE function.

New raw data option for Logistic Regression

The Logistic Regression data analysis tool supports two types of input data: raw and summarized. Data in raw data format is first transformed into summary form and then is analyzed. You now have the option to skip the summary step and go straight to the analysis phrase. This will speed up the analysis and can be useful when there isn’t much advantage to summarizing the data or with large input files.

Eigenvalues and eigenvector capabilities

The eVALUES and eVECTORS functions have been improved so that they converge to an answer more quickly. The same is true for the equivalent capabilities in the Matrix Operations data analysis tool.

An additional row has also been added to the output for the eVECTORS function, which is equivalent to the second row in the output from the eVALUES functions. This row is optional, and you won’t see it unless you highlight an additional row in the output.

Note that these functions were originally designed to find the eigenvalues and eigenvectors only for symmetric, invertible square matrices, although they do provide eigenvalues and eigenvectors for many non-symmetric matrices as well. With this release, they also provide eigenvalues and eigenvectors for non-invertible (aka singular) symmetric square matrices.

Remove error cells

The following array function has been added:

DELErr(R1): outputs a range identical to R1 except that all cells with an error value (#NA, #DIV/0!, etc.) are replaced by the empty value “”

This function is very useful when you want to perform an operation on a range with error values. E.g. if R1 contains the values 40, “A”, 50, #NA, 0, then AVERAGE(R1) will take the error value #NA. If instead you use the formula AVERAGE(DELErr(R1)) the value outputted will be 30, which the result obtained by ignoring the #NA (and “A”).

Critical values for Pearson’s correlation coefficient

PCRIT(n, α, tails) = the critical value for the Pearson’s correlation coefficient based on two samples of size n with given value of alpha (default .05) , tails = 1 or 2 (default)

Minor changes and bug fixes:

  • Corrects the two-tailed Fisher Exact Test to make sure that the FISHERTEST  function never outputs a value greater than 1
  • The spelling of the word trapezoid has been corrected in the ROC Curve and Classification Table data analysis tool.
  • The Cancel button for the Two sample t test option of the Statistical Power and Sample Size data analysis tool is now working properly
  • The headings on the output of the Ancova option of the Analysis of Variables data analysis tool now displays correctly when the output is sent to a new worksheet
  • Additional error checking has been added to the dialog box for the Logistic Regression data analysis tool
  • The value for the variance of the population mean for One Random Factor ANOVA data analysis tool has been corrected
  • A minor change has been made to the values produced by NT_DIST(t, df, ncp, FALSE) for very small values of t (t < 1.0E-.07).

New Tables

The following tables have been added:

  • Lilliefors Table of Critical Values
  • Pearson’s Correlation Table of Critical Values

Advice

If you are using Windows XP with Excel 2007 and can’t get the Real Statistics Resource Pack to work properly, try using the version for Excel 2003.

You can configure the Quick Access Toolbar to provide access to the Real Statistics data analysis tools, as described on the webpage Accessing Real Statistics Data Analysis Tools

A number of people have reported that some data analysis tools freeze or crash for no apparent reason. The usual problem is related to the language that Excel is using. In some languages (e.g. French, Italian, etc.) decimals are represented using a comma instead of a period (as in English): 0,05 instead of 0.05. Although I have made an attempt to take this into account, in some versions of Excel this still causes an error when the default value of Alpha is used (or the default value of the Classification Cutoff is used in the Logistic Regression tool). If you are experiencing this problem, you simply need to enter a value yourself instead of using the default value (even when the value looks identical).