Real Statistics Release 8.0

This release focuses on panel data models and new repeated measures ANOVA data analysis tools. It also expands on the lambda-like function capabilities introduced in the previous two releases.

Over the course of the next several days, the website and examples workbooks will be updated for compatibility with the new release.

I want to thank everyone who has made suggestions or has identified errors in the website or software. I appreciate your help in improving the utility and accuracy of Real Statistics.

I also want to express my appreciation to all of you who have donated to Real Statistics. These donations help to offset the costs of maintaining the website. If you are getting value from the Real Statistics website or software, I would appreciate your donation by going to Please Donate.

The following is an overview of the new features in Release 8.0.

Repeated Measures Anova

The following two new data analysis tools have been added:

  • Two-way Repeated Measures Anova
  • Mixed Three-way Repeated Measures Anova
    • two within-subjects factors and one between-subjects factor
    • one within-subjects factor and two between-subjects factors

The first of these performs Repeated Measures ANOVA with two within-subjects factors and no between-subjects factors. The second of these supports the two options listed above.

The existing Repeated Measures ANOVA data analysis tool that supports one within-subjects factor and one between-subjects factor is now called Mixed Two-way Repeated Measures Anova.

Panel Data Models

A new Panel Analysis data analysis tool has been added to the Real Statistics Resource Pack. This tool can be found on the Time Series tab of the Real Statistics user interface.

Four options are supported:

  • Fixed Effects Model using demeaned data
  • Fixed Effects Model using data that has been differenced
  • Fixed Effects Model using dummy variables
  • Randon Effects Model (REM) using FGLS regression

The following array worksheet functions have been added that are useful in performing regression on panel data.

PANEL_MEANS(R1, periods, head): returns an array of the unit means (across the time periods) of the panel data in R1.

PANEL_DEMEAN(R1, periods, head, param): returns an array of the same size and shape as R1 consisting of the demeaned version of the panel data in R1. Each element in the output consists of the corresponding data element in R1 minus the mean for that element’s unit. param takes values between 0 and 1 (default 1) as explained below

PANEL_DIFF(R1, periods, head): returns an array consisting of the first-differenced version of the panel data in R1 based on the time periods for each unit.

PANEL_DVM(R1, periods, head, ttype): returns an array consisting of the data in R1 augmented by additional columns corresponding to dummy variables for each of the units. If R1 contains k units then when ttype = 0 (default) then the dummy variables correspond to units 1 through k (without an intercept); when ttype = 1 then the dummy variables correspond to units 2 through k and when ttype = -1 the dummy variables correspond to units 1 through k-1.

PANEL_REM(R1, periods, head): returns an array consisting of the random effects model (REM) for the panel data in R1.

periods = the number of time periods in the panel data; it is assumed that the rows of panel data in R1 are sorted first by unit number and then by time period, although the time and units values are not included in R1. R1 is assumed not to have any missing time periods (balanced panels). Thus, the number of units is equal to the number of data rows divided by periods.

When head = TRUE, the first row of both R1 and the output contains column headings (default FALSE).

In order to build the REM model for PANEL_REM, the following new function is used.

REM_PARAM(Rx, Ry, periods) = the transformation parameter for the random-effects model based on the X data in Rx and Y data in Ry.

As mentioned above, PANEL_DEMEAN can take a param argument, which for our purposes will be the output from REM_PARAM. When this argument is used the data will be partially demeaned; i.e. param ⋅ unit-mean will be subtracted from each data element in R1.

Lambda-like Capabilities

Excel 365 is introducing a new LAMBDA worksheet function that allows users to include a function as a parameter in another worksheet function. It also adds the BYROW, BYCOL, MAKEARRAY, SCAN, MAP and REDUCE worksheet functions that take advantage of the new LAMBDA function. This capability is currently in beta test and will be released shortly by Microsoft for Excel 365 users. See LAMBDA Functions for more details.

This release of the Real Statistics Resource Pack provides comparable functionality for all users of Excel starting with Excel 2010 via the LAMB worksheet function. 

LAMB(expression, parameter_list): returns a string that represents the function defined by the specified expression on the specified list of parameters.

Here, expression is a text string that describes a function. This function can take a numerical value such as “=ABS(x)/y^x”, a Boolean value such as “=OR(x,y)”, a string value such as “=CONCAT(x,y)” or an array value such as “=MDETERM(x)*MMULT(x,y)”. Each of these is a function of two variables (called parameters). These parameters, “x” and “y” in the above examples, can take numeric, Boolean, string or array values. Note that in each of these expressions the “=” symbol is optional.

In each of the above examples, the parameters take the same type of value as the expression. This is not always so. E.g. the expression in “x<y” takes a Boolean value, but the parameters take numeric values.

BYCOLS Function

We now give an example of one of the new Real Statistics worksheet functions where R1 is an m × n array or cell range and expression and col are strings where expression can be viewed as a function of the string in col.

BYCOLS(R1, LAMB(expression, col)): returns a 1 × n row array each of whose elements is the value obtained by applying the function defined by expression to each of the columns in R1.

For example, the formula =BYCOLS(A1:C4, LAMB(“=SUM(col)”, “col”)) returns a 1 × 3 array whose values are the sums of the values in each of the three columns in A1:C4. This is equivalent to the Real Statistics formula SUMCOL(A1:C4).

Note that col can be replaced by any legal name in Excel using any alphanumeric characters or the underline character and the result will be the same. Thus, =BYCOLS(A1:C4, LAMB(“=SUM(x1)”,”x1″)) will return the exact same result.

The equivalent Excel 365 worksheet function is BYCOL(R1, LAMBDA(parameter, expression)). Thus, we obtain the same result for the above example by using the formula =BYCOL(A1:C4, LAMBDA(col, SUM(col))). Note that the expression must follow the parameter (or parameter list in the cases where there is more than one parameter). The expression and parameter elements are also not inserted inside quotes.

In the Real Statistics version of these worksheet functions, the LAMB function doesn’t necessarily need to be explicitly referenced. Thus, we can use the formula =BYCOLS(A1:C4, “=SUM(col)”, “col”) to achieve the same result.

Furthermore, the parameters also do not need to be explicitly stated. E.g. the worksheet function =BYCOLS(A1:C4, “=SUM($col)”) will return the same result. In this case, the parameter name must be preceded by a dollar symbol.

Other New LAMBDA-like Functions

The following new worksheet functions are now supported by Real Statistics. These are the equivalents of the new, above-referenced, Excel 365 LAMBDA functions.

  • BYCOLS(R1, expression, col)
  • BYROWS(R1, expression, col)
  • REDUCES(init, R1, expression, value, accumulator)
  • SCANS(init, R1, expression, value, accumulator)
  • MAKESARRAY(nrowncols, expression, row, col)
  • MAPS(R1, expression, element)

Note that the order of the value and accumulator parameters in REDUCES and SCANS is the reverse of these parameters in REDUCE and SCAN.

As described above, if we use parameter names that are preceded by $ then we don’t have to explicitly include the parameter names in the worksheet formula. E.g. =MAKESARRAY(2, 3, “=$i^$j”) will create a 2 × 3 array whose first row contains the values 1, 1, 1 and whose second row contains the values 2, 4, 8. Here the first parameter found in the expression, namely $i, corresponds to row and the second parameter, namely $j, corresponds to col. Note that =MAKESARRAY(2, 3, “=$j^$i”) will produce the same array, whereas =MAKESARRAY(2, 3, “=j^i”, “i”, “j”) will produce an array whose first row is 1, 2, 3 and whose second row is 1, 4, 9.

Earlier Lambda-like Functions

In Rel 7.9 and 7.10, we introduced the DERIV, DERIV2, INTEGRAL, BISECTION, BRENT, SECANT, NEWTON, NROOTS, MNEWTON, FMIN, FMAX and FUNC worksheet functions. The first argument in each of these worksheet functions, R1, is a worksheet cell that contains a formula that represents a function f(x) in one variable. The last argument in each of these worksheet functions contains an optional cell address, Rx, that represents the variable x of the function. If omitted this address defaults to the first address in the formula in R1. R1 and Rx play a similar role to the LAMBDA worksheet function (or the explicit or implicit Real Statistics LAMB worksheet function).

Similarly, the first argument, R1, in the NEWTON2, MNEWTON2, FUNC2, F2MIN, F2MAX, GRADIENT2 and HESSIAN2 worksheet functions is the address of a cell that contains a formula that represents a function f(x, y) in two variables. The last two arguments of these worksheet functions contain optional cell addresses for x and y. If omitted these addresses default to the first address and, if necessary, the second address in the formula in R1. This provides lambda functionality for functions in two variables.

Finally, NEWTON3, MNEWTON3, FUNC3, F3MIN, F3MAX, GRADIENT3 and HESSIAN3 reference a function in three variables.

Modifications to Earlier Lambda-like Functions

Note that there was a bug in these worksheet functions that occurred when there were references to cells whose addresses overlap (e.g. A1 and A10 or B7 and AB7). This bug has been resolved in Rel 8.0.

Also, to simplify things, starting with Rel 8.0, cell addresses that represent a function variable can’t contain a dollar symbol (thus A4 is acceptable, but A$4, $A4 or $A$4 are not). Any address that contains a $ is considered to be a constant (whose value is the value in that cell). 

We can use the formula =DERIV(4,A1,,A2) or =DERIV(4,A1) where cell A1 contains the formula =A2^3+EXP(A2) to return the derivative of f(x) = x3+ex at x = 4. With Rel 8.0, we can now also use the formula =DERIV(4,LAMB(“x^3+EXP(x)”, “x”)) or =DERIV(4,LAMB(“$x^3+EXP($x)”)) or even =DERIV(4,”$x^3+EXP($x)”) or =DERIV(4,”x^3+EXP(x)”,,”x”) to achieve the same result. This applies also to the other lambda-like functions that were available starting with Rel 7.9 and 7.10.

Note too that the previous style of lambda-like function can be used with the BYCOLS, BYROWS, MAKESARRAY, REDUCES, MAPS and SCANS functions. E.g. =BYCOLS(A1:C4, A1, A2), where cell A1 contains the formula =SUM(A2), produces the same value as =BYCOLS(A1:C4, “=SUM(col)”, “col”).

EVALS Function

The following new worksheet function has been added:

EVALS(expression, value_list): returns the value of the function represented by expression at the specified list of values.

E.g. =EVALS(LAMB(“x^3+EXP(y)”, “x”), 2, 3) returns the value of 2^3+EXP(3), namely 28.08554. Here, expression is a lambda expression created using LAMB. When only default parameters are used, i.e. those beginning with a $, then LAMB may be dropped. Thus, we get the same result via the formula =EVALS(“$x^3+EXP($y)”, 2, 3).

Note that EVALS can also be used with cell formulas. E.g. if cell A1 contains the formula =A2^2+EXP(A3), then the formula =EVALS(LAMB(A1,A2,A3),2,3) or =EVALS(A1,2,3) also produces the value 28.08554.

Representing an Array as a String

The following new worksheet function has been added that converts an array to a string.

Array2String(R1) = a string that represents the array R1.

For example, suppose that the first row of range A1:B3 contains the values “A”, “B” and “C” and the second row contains the values 5, 10, 20, then the output from Array2String(A1:B3) is the string “{A,B,C;5,10,20}”.

RSquare_Test Function

The existing RSquareTest(R1, R2, Ry) function returns the p-value of a test to determine whether adding one or more independent variables makes a significant difference when conducting multiple linear regression. Here Ry is the column array of Y data values, R1 is an array of X data values and R2 is an array of the same X data values omitting the data for the variables in question.

The following new array function has been added that provides additional information about this test.

RSquareTest(R1, R2, Ry, lab): returns a 6 × 1 column array with the values R1-square, R2-square, F, df1, df2 and p-value for this test. If lab = TRUE (default FALSE) then a column of labels is appended to the results.

Bug Fix

Thanks to Jamie, I have identified an error in the Std3Art function. This has now been repaired in this release.

References

Microsoft (2021) Lambda function
https://support.microsoft.com/en-us/office/lambda-function-bd212d27-1cd1-4321-a34a-ccbf254b8b67

Microsoft (2021) New lambda functions available in Excel
https://insider.office.com/en-us/blog/new-lambda-functions-available-in-excel

2 thoughts on “Real Statistics Release 8.0”

Comments are closed.