Real Statistics Release 8.4

The Real Statistics website will be modified in the next few days to explain the new capabilities in more detail.

The following is an overview of the new features in Release 8.4.

Benford Distribution

Benford’s distribution specifies that the probability that the first significant digit x1 of a numeric value x equals d (d = 1, 2, …, 9) is

P(x1 = d) = log10(1 + 1/d)

Numeric values that follow Benford’s distribution are said to obey Benford’s law. Numeric values in many real-world situations obey this law, e.g. street addresses and the values in a tax return. The law is often applied to fraud detection.
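As a quick illustration of the formula above, here is a minimal Python sketch that tabulates the nine Benford probabilities. The function name benford_pmf is hypothetical and is not part of the Real Statistics add-in.

```python
import math

def benford_pmf(d: int) -> float:
    """P(first significant digit = d) under Benford's law, for d = 1..9."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

# Tabulate the probabilities for the nine possible first digits.
probs = {d: benford_pmf(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 4))

# The probabilities telescope to log10(10) = 1.
print("total:", round(sum(probs.values()), 12))
```

The digit 1 has a probability of about 0.301, while the digit 9 has a probability of only about 0.046.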

The following worksheet functions have been added. See Benford Distribution Fitting Support for more details.

FIRST_SIG(x) = first significant digit of the positive numeric value x.

KS_BSTAT(R1) = the Kolmogorov-Smirnov statistic for the data in the column array or range R1 that is suspected of following a Benford distribution

KS_BCRIT(n, alpha, interp) = the critical value of the KS distribution for a sample of size n that follows the Benford distribution, at significance level alpha (from .01 to .10, default .05)

KS_BPROB(x, n, iter, interp, txt) = an approximate p-value for the KS test at x based on a sample of size n

KS_BTEST(R1, lab, alpha): returns a column array containing the KS statistic, p-value, and critical value for the KS test on the data in the column array or range R1 that is suspected of following a Benford distribution

AD_BSTAT(R1) = the Anderson-Darling statistic for the data in the column array or range R1 that is suspected of following a Benford distribution

The existing ADCRIT(n, alpha, dist, , interp), ADPROB(x, dist, n, iter, interp, txt), and ADTEST(R1, dist, lab, iter, alpha) functions have now been updated to support the Benford distribution.
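To make the idea behind FIRST_SIG and KS_BSTAT concrete, the following Python sketch extracts first significant digits and computes a Kolmogorov-Smirnov style distance between the empirical digit distribution and the Benford distribution. It assumes the discrete form of the statistic (the largest gap between the two cumulative distributions over the digits 1 to 9); the add-in's exact computation may differ. The names first_sig and ks_benford_stat are hypothetical.

```python
import math
from decimal import Decimal

def first_sig(x) -> int:
    """First significant digit of a positive number, e.g. 0.00456 -> 4."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Decimal stores digits without leading zeros; skip any zeros defensively.
    for digit in Decimal(str(x)).as_tuple().digits:
        if digit != 0:
            return digit
    raise ValueError("x has no nonzero digit")

def ks_benford_stat(values) -> float:
    """Assumed discrete KS statistic: max |F_empirical(d) - F_Benford(d)|."""
    n = len(values)
    counts = [0] * 10
    for v in values:
        counts[first_sig(v)] += 1
    cumulative = 0
    d_max = 0.0
    for d in range(1, 10):
        cumulative += counts[d]
        benford_cdf = math.log10(1 + d)  # sum of log10(1 + 1/j) for j = 1..d
        d_max = max(d_max, abs(cumulative / n - benford_cdf))
    return d_max

sample = [123, 0.0187, 1450, 19.2, 1.07]  # every value starts with the digit 1
print(ks_benford_stat(sample))
```

For a sample whose values all begin with 1, the empirical cumulative distribution is already 1 at d = 1, so the statistic equals 1 - log10(2), roughly 0.699.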

Trend Analysis

When the independent variable in an ANOVA is numeric, we can also test whether there is a polynomial trend in the effectiveness of the treatment. We investigate linear, quadratic, cubic, etc. trends using orthogonal polynomial contrast coefficients. To support this analysis, the following new worksheet function has been added. 

TREND_ANOVA(R1, R2, sse, dfe, sst, lab): returns the ANOVA analysis using polynomial coefficients based on the group means in R1 and group counts in R2. sse is the error SS, dfe is the error df, and the optional sst is the treatment SS. If lab = TRUE (default FALSE), then row and column headings are appended to the output.
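The test for a single trend can be sketched from the same inputs. The Python snippet below uses the standard contrast formulas, SS = (Σ c_j m_j)² / Σ (c_j² / n_j) and F = SS / (sse / dfe) with 1 and dfe degrees of freedom. It is only an illustration under those textbook formulas, not the add-in's implementation, and the function name contrast_test is hypothetical.

```python
def contrast_test(means, counts, coeffs, sse, dfe):
    """Return (contrast SS, F statistic) for one polynomial contrast."""
    if not (len(means) == len(counts) == len(coeffs)):
        raise ValueError("means, counts and coeffs must have the same length")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = sum(c * c / n for c, n in zip(coeffs, counts))
    ss_contrast = estimate ** 2 / scale  # a contrast has 1 degree of freedom
    mse = sse / dfe                      # error mean square
    return ss_contrast, ss_contrast / mse

# Made-up example: three equally spaced groups of five observations each.
means, counts = [10, 12, 14], [5, 5, 5]
linear = contrast_test(means, counts, [-1, 0, 1], sse=60, dfe=12)
quadratic = contrast_test(means, counts, [1, -2, 1], sse=60, dfe=12)
print(linear, quadratic)
```

In this made-up example the means rise in a straight line, so the linear contrast accounts for the entire treatment SS of 40 (F = 8) and the quadratic contrast SS is 0.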

See Trend Analysis for more information about this function. The following new worksheet function has also been added; see Calculating Polynomial Contrast Coefficients for more information about it.

TREND_COEFF(k, lambda): returns the polynomial contrast coefficient matrix for a treatment with k equally spaced groups for k ≥ 2. If lambda = TRUE (default FALSE), then an extra column is appended to the right side of the output with the lambda values.

Goodness-of-fit for Cauchy distribution

The existing CAUCHY_FIT and CAUCHY_FITM functions can be used to estimate the parameters of a Cauchy distribution that best fits some data based on the maximum likelihood or pseudo-method-of-moments approach. The following new worksheet function provides a different, often more accurate, way of estimating the Cauchy distribution parameters.

CAUCHY_FITX(R1, lab): returns a column array with the order statistics estimates of the mu and sigma parameters of the Cauchy distribution that best fits the data in the column array or range R1, along with the log-likelihood based on these parameters. If lab = TRUE (default FALSE), then a column of labels is appended to the output.

See Fitting a Cauchy Distribution for more details.
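As a rough illustration of estimating Cauchy parameters from order statistics, the Python sketch below uses one simple and well-known choice: mu is the sample median and sigma is half the interquartile range (the quartiles of a Cauchy distribution lie at mu ± sigma). CAUCHY_FITX may well use a different, more refined estimator, so treat this as an assumption. The function names are hypothetical.

```python
import math
import statistics

def cauchy_fit_quartiles(data):
    """Estimate (mu, sigma) as (median, half the interquartile range)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q2, (q3 - q1) / 2

def cauchy_loglik(data, mu, sigma):
    """Log-likelihood of the data under a Cauchy(mu, sigma) distribution."""
    return sum(
        -math.log(math.pi * sigma * (1 + ((x - mu) / sigma) ** 2))
        for x in data
    )

data = [-3, -1, 0, 1, 3]
mu, sigma = cauchy_fit_quartiles(data)
print(mu, sigma, cauchy_loglik(data, mu, sigma))
```

Quartile-based estimates are attractive for the Cauchy distribution because its mean and variance are undefined, which makes ordinary moment estimates unreliable.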

The existing ANDERSON(R1, dist), ADCRIT(n, alpha, dist, , interp), ADPROB(x, dist, n, iter, interp, txt), and ADTEST(R1, dist, lab, iter, alpha) functions have now been updated to support the Cauchy distribution using all three of the goodness-of-fit methods.

Enhanced Anderson-Darling Support

The existing Anderson-Darling test function, ADTEST(R1, dist, lab, iter, alpha), allows you to test the fit of the data in R1 to a variety of distributions using different goodness-of-fit techniques. This function has now been enhanced so that you can explicitly provide the distribution parameters (estimated from the data using your own distribution-fitting approach) via ADTEST(R1, dist, lab, , alpha, param1, param2).
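The statistic for a distribution with explicitly supplied parameters can be sketched in Python with the standard Anderson-Darling formula, A² = -n - (1/n) Σ (2i - 1)[ln F(x_i) + ln(1 - F(x_(n+1-i)))], where the x_i are sorted and F is the fully specified cumulative distribution function. The Cauchy CDF and the parameter values below are only an example, and the function names are hypothetical, not the add-in's.

```python
import math

def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 of the data against a given CDF."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # xs[n - i] is the (n + 1 - i)-th smallest value in 1-based terms.
        total += (2 * i - 1) * (math.log(cdf(x)) + math.log(1 - cdf(xs[n - i])))
    return -n - total / n

def cauchy_cdf(mu, sigma):
    """Build the CDF of a Cauchy distribution with the given parameters."""
    return lambda x: 0.5 + math.atan((x - mu) / sigma) / math.pi

data = [-3.2, -0.8, 0.1, 0.4, 1.9, 7.5]
print(anderson_darling(data, cauchy_cdf(mu=0.2, sigma=1.0)))
```

Passing the CDF as a function keeps the statistic independent of how the parameters were obtained, which mirrors the idea of supplying param1 and param2 yourself.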

An error in the calculation of the Anderson-Darling statistic for the exponential distribution has also been corrected.

New RESHAPES worksheet function

The following new worksheet function has been added.

RESHAPES(R1, nrows, ncols, bycol): returns an nrows × ncols array with the elements from R1. If nrows (ncols) is missing or 0 then nrows (ncols) is set to the smallest value such that all the elements in R1 are output. If nrows and ncols are both missing or 0 then a column array with all the elements in R1 is returned. If bycol = TRUE (default), then elements are returned in column order; otherwise, they are in row order. If R1 doesn’t contain a sufficient number of elements then #N/A is used as a filler.
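The sizing and filling rules can be illustrated with a short Python sketch. It assumes that the elements of R1 have already been flattened into a list (how RESHAPES traverses R1 is not spelled out above) and that bycol controls the order in which the output is filled. The function reshape and the NA constant are hypothetical stand-ins.

```python
import math

NA = "#N/A"  # stand-in for Excel's #N/A error value

def reshape(values, nrows=0, ncols=0, bycol=True):
    """Return a list of rows holding the values in an nrows x ncols layout."""
    n = len(values)
    if not nrows and not ncols:
        nrows, ncols = n, 1              # default: a single column
    elif not nrows:
        nrows = math.ceil(n / ncols)     # smallest nrows that fits everything
    elif not ncols:
        ncols = math.ceil(n / nrows)     # smallest ncols that fits everything
    size = nrows * ncols
    padded = list(values[:size]) + [NA] * (size - n)
    if bycol:  # fill down each column first
        return [[padded[c * nrows + r] for c in range(ncols)] for r in range(nrows)]
    # otherwise fill across each row first
    return [[padded[r * ncols + c] for c in range(ncols)] for r in range(nrows)]

print(reshape([1, 2, 3, 4, 5], nrows=2))
print(reshape([1, 2, 3, 4, 5], nrows=2, bycol=False))
```

With five values and nrows = 2, the sketch picks ncols = 3 and pads the last cell, giving [[1, 3, 5], [2, 4, '#N/A']] in column order and [[1, 2, 3], [4, 5, '#N/A']] in row order.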

Enhanced Wordle support

The existing WordleProb2 worksheet function has been revised to provide more realistic estimates for the probabilities of success, especially in 4 tries. This is done using the output from the following new function.

NPatterns4(guess1, pattern1, guess2) = the number of patterns for the 4th guess based on the best 3rd guess (i.e. the guess that maximizes the number of patterns).

Bug-fix

A previously identified bug has been fixed in the One-way ANOVA data analysis tool when the Random Factor option is chosen. In particular, the variance of the mean has been corrected.