Real Statistics Release 4.10 | Real Statistics Using Excel

I am pleased to announce Release 4.10 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

I apologize to all you Mac users. I expect to turn my attention to creating a new Mac version of the Real Statistics Resource Pack shortly.

I want to thank all of you who have identified bugs or have suggested enhancements. I have tried to include fixes for all bugs that have been identified and support for at least some of the suggested enhancements.

The spreadsheets for all the examples used on the Real Statistics website are now available for free download (Download Examples Workbooks). These spreadsheets are contained in four Excel files (i.e. workbooks): Examples Workbook Parts 1 and 2, Multivariate Examples and Time Series Workbook. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.10.

A focus in this release is on regression enhancements, although other important features have been added as well. Release 4.10 contains the following new features:

Polynomial Regression

A new Polynomial Regression data analysis tool has been added.

In addition, the following new functions are supported which provide similar support to that is provided by the new data analysis tool. Here, Rx and Ry are column arrays containing x and y data values and deg is the degree/order of the polynomial

PolyDesign(Rx, deg, ones) – returns an array consisting of x, x², …, x^deg columns. If ones = TRUE, then the output is 1, x, x², …, x^deg

PolyCoeff(Rx, Ry, deg) – returns a column array consisting of the polynomial regression coefficients and their standard errors

PolyRSquare(Rx, Ry, deg) = R-square value for the polynomial regression

PolyDeg(Rx, Ry, maxdeg) = the highest degree polynomial ≤ maxdeg which produces a significantly different R-square value

Least Absolute Deviation (LAD) Regression

A new Least Absolute Deviation Regression data analysis tool has been added.

In addition, the following new functions are supported which provide similar support to that provided by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and iter is the number of iterations used in the iteratively reweighted least squares algorithm (default = 25).

LADRegCoeff(Rx, Ry, iter) = k × 2 range consisting of the regression coefficient vector followed by vector of standard errors of these coefficients

LADRegWeights(Rx, Ry, iter) = n × 1 column range consisting of the weights calculated from the iteratively reweighted least squares algorithm

Note, that in addition to describing the iteratively reweighted least squares algorithm, the website will also describe the Simplex method for calculating the LAD regression coefficients.

New Extracting Columns from a Data Range Data Analysis Tool

The existing Extracting Columns from a Data Range data analysis tool has been completely revised. In addition to more easily selecting which columns you want to retain from a data range, you will now have the option to create (1) tag/dummy or categorical codes for selected columns, (2) interactions between the variables (e.g. xy) representing selected columns and (3) powers of variables in selected columns (x², x³, etc.).

Simplifications to Regression Data Analysis Tools

The Multiple Regression data analysis tool has been simplified by the elimination of the Tag/dummy coding options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

The Logistic Regression data analysis tool has also been simplified by the elimination of the Categorical coding and the Deletion of variables options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

Stepwise Regression Capabilities

A Stepwise Regression option has been added to the Multiple Regression data analysis tool. When this option is selected an automatic selection of a subset of variables is made that produces a regression model that fits the data which is in some sense similar to that of the full regression model containing all the variables.

The output from this data analysis tool shows how the stepwise selection of variable was made along with the regression analysis using these variables.

In addition, the following new functions are supported which are used by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and Rv is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise.

RegRank(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by Rx, Ry and Rv.

RegCoeffP(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by Rx, Ry and Rv.

RegStepwise(Rx, Ry) – returns a 1 × k array Rv where each non-blank elements in Rv corresponds to an x variable that should be retained in the stepwise regression model. Actually the output is a 1 × k+1 array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model.

Optimize Time Series Forecasting

An Optimize MSE option has been added to the Basic Forecasting data analysis tool. When this option is chosen, values of Alpha, Beta and Gamma are found which minimize the squared error for the Simple Exponential Smoothing, Holt’s Linear Trend or Holt-Winter’s Method.

Changes to the Augmented Dickey-Fuller Test

The output from the Augmented Dickey-Fuller (ADF) unit root test function ADFTEST has been augmented with two additional values, namely the first-order autocorrelation coefficient and an estimated p-value. This is also the case for the ADF Test option of the Time Series Testing data analysis tool.

The ADFCRIT(n, alpha, type) function has been revised to deal with alpha values besides .01, .025, .05 and .1. This is accomplished by linear interpolation. In addition, the following new function has been added:

ADFPROB(x, n, type) = estimated p-value (based on linear interpolation) for the ADF test at x for a time series of length n where type is as for ADFCRIT.

New Unit Root Tests for Time Series Analysis

The PP and KPSS tests for a unit root in time series analysis are now supported via the following new array functions:

PPTEST(R1, lab, lags, type, alpha) – an array function which returns a column range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags, autocorrelation coefficient and p-value.

KPSSTEST(R1, lab, lags, type, alpha) – an array function which returns a column range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags and p-value.

Thanks to Milos Cipovic who wrote the software for these tests.

New Features in the ANCOVA Data Analysis Tool

The ANCOVA data analysis tool has been enhanced with the following new options: ability to use data in stacked format and support for contrasts, Tukey’s HSD test and Tukey-Kramer test.

Diversity Indices Data Analysis Tool

The new Diversity Indices data analysis tool calculates Shannon’s, Simpson’s and Brillouin’s diversity indices for categorical data.

Function Categories

When you click on the Insert Function symbol f_x next to the Formula toolbar in Excel, you can access a list of all the standard Excel functions along with a brief description of each function and that function’s arguments. This includes all the Real Statistics functions.

The standard Excel functions are split into different categories (Math & Trig, Logical, Text, Statistical, etc.) which makes it easier to find these functions. The Real Statistics functions, however, have all been placed in the User Defined category and very limited information has been available about the functions or their arguments.

I am in the process of adding some new categories for the Real Statistics functions. I am also adding more descriptive information about these functions and their arguments. So far I have added Regression and Distribution categories and additional descriptive information about the functions in these categories. Additional categories and descriptive information will be added in coming releases.

The new categories and descriptive information are only available after you have accessed the Real Statistics data analysis tools (via Ctrl-m or the equivalent) for the first time.

This new capability is not available for versions of the Real Statistics Resource Pack that run on Excel 2007.

New Regression Function

RegPredC(Rx, Rc) = predicted y value for x values in range Rx based on the regression coefficients in range Rc. Rx and Rc can either be column or rows ranges.

New ANOVA Functions

The following functions compute the values SS_Bet, SS_W and SS_Tot for one-way ANOVA when the data is in stacked format. These are similar to the corresponding pre-existing functions used when data is in Excel format.

In the following, it is assumed that the first column of the input R1 contains the names of the factor levels and column number col contains the data for the one-way ANOVA.

SSWStd(R1, col) = SS_W

SSBetStd(R1, col) = SS_Bet

SSTotStd(R1, col) = SS_Tot

Bug Fixes and Minor Changes

Corrects an error in the two-tailed Mann-Whitney exact test. Previously, if the p-value for the one-tailed test was greater than .5, the p-value for the two-tailed test was given as 2*p instead of 2*(1–p).

Fixes an error in Contrasts for Single Factor ANOVA data analysis tool. Previously, the formula used to determine whether the test was significant (i.e. the Sig cell) referenced the wrong cell when calculating the value of Alpha.

Fixes an error when using the Cutoff % option in the Logistic Regression data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and 1 (instead of between 0 and 100).

Fixes an error when using the Cutoff % option in the Reliability data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and .5 (instead of between 0 and 50).

Fixes an error on the constraint for the gamma parameter in the Holt-Winter Method option of the Basic Forecasting data analysis tool.

Fixes an error that sometimes occurs in the calculation of the log-rank metrics in the Hypothesis Testing portion of the output when the Kaplan-Meier option of the Survival Analysis data analysis tool is used.

Corrects errors in some labels and tooltips in various dialog boxes.

Allows the MANOVA functions MANOVA_PillaiTrace, MANOVA_WilksLambda, MANOVA_HotelTrace and MANOVA_RoyRoot to work properly even when there is only one dependent variable.