I am pleased to announce Release 7.4 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2010, 2013, 2016, 2019 and 365 in both Windows and Mac environments.
The Basics, Regression 1, Distributions, and Multivariate examples workbooks are being updated for compatibility with the new release. Over the course of the next few days, the website will also be updated accordingly.
If you are getting value from the Real Statistics website or software, I would appreciate your donation to help offset the costs of the website by going to Please Donate.
The following is an overview of the new features in Release 7.4:
Linear Regression with Bounded Coefficients
Real Statistics already supports the case where the intercept coefficient for linear regression is set to zero. With this release, we now support the cases where any of the regression coefficients can be set to a specific value or constrained by a lower and/or upper bound. In particular, the following array function is now available:
BRegCoeff(Rx, Ry, Rlower, Rupper): returns an n+1 × 1 column array with the regression coefficients for a linear regression with X values in the m × n array Rx and Y values in the m × 1 column array Ry, based on the lower bounds for the coefficients in the n+1 × 1 column array Rlower and the upper bounds for the coefficients in the n+1 × 1 column array Rupper.
Here, the first values in Rlower and Rupper are the bounds for the intercept. The first coefficient in the output is also for the intercept.
If you want a coefficient to take a particular value, then set both the lower and upper bounds for the appropriate coefficient to this value. It is assumed that the regression includes a constant term. If you want to exclude a constant term, then set the first element in Rlower and Rupper to zero.
If you only want to use a lower bound for a particular coefficient, then set the upper bound to a very large positive number. Similarly, if you only want to set an upper bound, then set the lower bound for this coefficient to a very large negative number.
Constrained linear combination of regression coefficients
We can now also support the case where any linear combination of regression coefficients can be set to a constant value (e.g. 2b0 – 3b2 = 4). The following array function has been added to support this feature.
CRegCoeff(Rx, Ry, Rc): returns an n+1 × 1 column array with the regression coefficients for linear regression with X values in the m × n array Rx and Y values in the m × 1 column array Ry, based on the weights for the coefficients in the n+2 × 1 column array Rc.
The first entry in Rc is the multiplier of the intercept and the last entry in Rc is the constant. E.g. for the linear regression y = b0 + b1*x1 + b2*x2 subject to the constraint 2b0 – 3b2 = 4, we use the column array which is the transpose of (2, 0, -3, 4).
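An equality constraint of this form can be solved exactly with a Lagrange-multiplier (KKT) system. A sketch assuming the same (n+2)-element weight-vector convention as CRegCoeff (the name c_reg_coeff and the sample data are hypothetical):

```python
import numpy as np

def c_reg_coeff(X, y, c):
    """Linear regression with one linear equality constraint on the
    coefficients: c[0]*b0 + ... + c[n]*bn = c[n+1], where b0 is the
    intercept. Solved via a Lagrange-multiplier (KKT) system; a sketch,
    not the Real Statistics implementation."""
    A = np.column_stack([np.ones(len(y)), X])
    w, d = np.asarray(c[:-1], float), float(c[-1])
    k = A.shape[1]
    # KKT system: [2A'A  w; w' 0] [b; lam] = [2A'y; d]
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = 2 * A.T @ A
    K[:k, k] = w
    K[k, :k] = w
    rhs = np.concatenate([2 * A.T @ y, [d]])
    return np.linalg.solve(K, rhs)[:k]

# Example: y = b0 + b1*x1 + b2*x2 subject to 2b0 - 3b2 = 4,
# i.e. the weight vector (2, 0, -3, 4) from the text.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], float)
y = np.array([3.0, 2.5, 6.0, 5.5, 9.0])
b = c_reg_coeff(X, y, [2, 0, -3, 4])
```

The last row of the KKT matrix enforces the constraint exactly, so the returned coefficients always satisfy 2b0 - 3b2 = 4.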
Lp norm estimation
Real Statistics already supports the Biweight and Huber M-estimates of central tendency. These approaches are intended to reduce the impact of outliers. With this release, another approach is supported based on the Lp norm, namely, for X = (x1, …, xn), the Lp norm is defined as
||X||p = (|x1|^p + |x2|^p + ⋯ + |xn|^p)^(1/p)
The Lp distance between X and Y is defined as ||X–Y||p.
The related estimate of central tendency is defined as
Lp(X) = the value y that minimizes ||X–y||p
where p is a value between 1 and 2. When p = 1, Lp(X) is the median and when p = 2, Lp(X) is the mean. For values of p strictly between 1 and 2, we obtain a measure of central tendency that downgrades the contribution of any outliers without eliminating their contribution entirely, as the median does.
To support this measure of central tendency, the following function has been added:
LpEST(R1, p, iter, guess, check) = Lp(X) where X = the data in the column array R1.
Here, 1 ≤ p ≤ 2, iter = the number of iterations (default 100) used to find the value of y that minimizes ||X–y||p and guess is an optional initial guess of this value. If check = TRUE (default FALSE), then this function returns a 2 × 1 array whose second value should be near zero as a check that the iteration has converged.
LpNORM(R1,R2,p) = the Lp distance between the column arrays R1 and R2
If R2 is a scalar then R2 is treated as a column array all of whose values are this scalar.
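Both calculations can be sketched in a few lines of Python (the names lp_norm and lp_est are my own; LpEST uses its own iteration scheme, which I replace here with a bounded scalar minimizer):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lp_norm(x, y, p):
    """Lp distance ||X - Y||p = (sum |xi - yi|^p)^(1/p)."""
    return np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float)) ** p) ** (1 / p)

def lp_est(x, p):
    """Lp estimate of central tendency: the scalar y minimizing ||X - y||p.
    A sketch using a bounded scalar minimizer, not the iterative scheme
    LpEST uses internally."""
    x = np.asarray(x, float)
    # Minimizing ||X - y||p is equivalent to minimizing sum |xi - y|^p.
    res = minimize_scalar(lambda y: np.sum(np.abs(x - y) ** p),
                          bounds=(x.min(), x.max()), method="bounded")
    return res.x
```

As a sanity check, lp_est with p = 2 recovers the mean and with p = 1 recovers the median, matching the statement above.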
Convert letter grade to a number
The following function has been added:
GRADEVAL(grade, pm) = a numeric value between 0 and 4.5 where grade A corresponds to 4, B to 3, C to 2, D to 1 and F to 0. pm (plus or minus) takes a value between 0 and .5, where a plus symbol adds the pm value and a minus symbol subtracts the pm value; pm defaults to 1/3.
E.g. =GRADEVAL("A-",.25) returns the value 3.75 (i.e. 4-.25).
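The mapping is easy to express in code; here is a sketch (the name grade_val is hypothetical, not a Real Statistics function):

```python
def grade_val(grade, pm=1/3):
    """Convert a letter grade such as 'A-' or 'B+' to a number:
    A=4, B=3, C=2, D=1, F=0; a trailing '+' adds pm and a trailing
    '-' subtracts pm. Mirrors GRADEVAL's described behavior."""
    base = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}[grade[0].upper()]
    if grade.endswith("+"):
        base += pm
    elif grade.endswith("-"):
        base -= pm
    return base

# grade_val("A-", 0.25) reproduces the GRADEVAL example: 4 - .25 = 3.75
```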
Cochran-Armitage Test
Real Statistics now supports the Cochran-Armitage test which determines whether there is a linear trend for proportional data. This is implemented via the following array function:
CATEST(R1, lab, tails): returns a 2 × 1 array consisting of the z-statistic and p-value for the test. R1 is a 3 × k array whose first row contains numeric labels (scores) in increasing order; the remaining two rows form the 2 × k contingency table. tails = 1 (default) or 2. If lab = TRUE then an extra column is appended to the output with labels (default FALSE).
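For reference, the textbook score-test form of the Cochran-Armitage statistic can be sketched as follows (this is a standard formulation and my own function name; it is not necessarily the exact computation CATEST performs):

```python
import numpy as np
from scipy.stats import norm

def ca_test(scores, successes, totals, tails=1):
    """Cochran-Armitage trend test for a 2 x k table.
    scores: numeric column labels in increasing order;
    successes: counts in the first outcome row;
    totals: column totals (successes + failures).
    Returns (z-statistic, p-value); a textbook sketch."""
    s = np.asarray(scores, float)
    r = np.asarray(successes, float)
    n = np.asarray(totals, float)
    N, R = n.sum(), r.sum()
    pbar = R / N  # pooled proportion
    # Score statistic for linear trend in proportions
    num = np.sum(s * (r - n * pbar))
    var = pbar * (1 - pbar) * (np.sum(n * s**2) - np.sum(n * s)**2 / N)
    z = num / np.sqrt(var)
    p = tails * norm.sf(abs(z))  # simplified tail handling
    return z, p
```

For example, scores (1, 2, 3) with 10, 15, and 20 successes out of 30 in each column give a clear increasing trend and a small one-tailed p-value.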
Multiple Linear Regression enhancements
An option has been added to the Multiple Linear Regression data analysis tool that enables you to determine whether the residuals are normally distributed. Both the Shapiro-Wilk and d'Agostino-Pearson tests are conducted.
When the Cook’s D option is selected, if for any data row the p-value of the T-test < .05, Cook’s D >= 1, or the absolute value of DFFITS >= 1, then an asterisk is appended to the output for that row.
If the Cook’s D option is selected then charts of the predicted Y vs residuals as well as predicted Y vs standardized residuals are output.
Hessenberg Decomposition
The following array functions have been added to support Hessenberg decomposition (aka Hessenberg factorization):
HessH(R1, prec): outputs the Hessenberg matrix H which is similar to the square matrix in R1.
HessQ(R1, prec): outputs the array Q for which A = QHQ^T is a Hessenberg decomposition of A, where A is the square matrix in R1.
HESS(R1, prec): outputs a 2n × n array. The first n rows of the output consist of H and the next n rows consist of Q, where A = QHQ^T is a Hessenberg decomposition of the square matrix A in R1.
prec (default 0.000000000001) can be used to specify values considered to be equivalent to zero.
In addition, a Hessenberg factorization option has been added to the Matrix Operations data analysis tool.
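For comparison, SciPy computes the same decomposition via scipy.linalg.hessenberg; the snippet below (with made-up sample data) checks that A = QHQ^T, that H is upper Hessenberg, and that Q is orthogonal:

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[4.0, 1.0, 2.0],
              [2.0, 3.0, 1.0],
              [1.0, 2.0, 5.0]])

# H is upper Hessenberg (zero below the first subdiagonal), Q orthogonal,
# and A = Q H Q^T -- the same three pieces HessH, HessQ, and HESS return.
H, Q = hessenberg(A, calc_q=True)

assert np.allclose(Q @ H @ Q.T, A)      # valid decomposition
assert np.allclose(np.tril(H, -2), 0)   # H is upper Hessenberg
assert np.allclose(Q.T @ Q, np.eye(3))  # Q is orthogonal
```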
Eigenvalues and eigenvectors
Real Statistics supports the eVALUES and eVECTORS functions to identify eigenvalues and eigenvectors for certain symmetric matrices and eVECT to identify eigenvalues and eigenvectors for many other square matrices.
These functions don’t support all square matrices (or even symmetric matrices), although in my experience eVALUES and eVECTORS do work properly on correlation matrices, which is the principal application required by Real Statistics.
With this release, I have added a variety of additional methods for calculating eigenvalues and eigenvectors. These too only work in certain situations, but my objective is to add more possibilities, one of which is likely to provide the required solution. In the future, I expect to replace many of these functions with ones that function properly in a broader variety of cases.
The new functions are named eVAL, eVALSym, eVALSym0, eVALDET, eVALRefine, eVECT0, eVECTSym0, eVECTInv, and eVECTCheck.
T Test and Mann-Whitney Test with standard data
An option has been added to the T Test and Non-parametric Equivalents data analysis tool to accept data in standard (stacked) format without headings. This option is only available for the two independent samples case.
Minor enhancements and bug fixes
- Increased the number of lags supported in several of the Time Series data analysis tools.
- A check of the results of the QR, Schur, Spectral, SVD and Hessenberg factorization options has been added to the output of the Matrix Operations data analysis tool.
- A bug in the QRFull and QRFullQ functions has been fixed.
- Fixed the formatting of the dialog boxes for some of the data analysis tools on the Mac.
Charles:
Thanks for the regression enhancements – a much needed improvement.
The use of the asterisk is a clever solution – as good as conditional formatting.
Mark,
Thanks. This approach also has the advantage that it is responsive to changes in the input data without having to rerun the data analysis tool.
Charles