Normality Testing of Factorial ANOVA Residuals

Premise

As we describe in Normality Testing of ANOVA Residuals, ANOVA is actually a form of regression. As a result, we can speak about residuals, i.e. the difference between actual data and the values predicted by the regression model. For our purposes, we don’t need to go into the details here. For those who are interested, see ANOVA using Regression.

What is important is to recognize that the normality assumption for Factorial ANOVA is that these residuals be normally distributed. As a result, it is important to understand how to identify these residuals. Once this is done, we can apply any of the various tests for normality, especially  QQ-plots, Shapiro-Wilk test, d’Agostino-Pearson test, etc.

Residuals for Two Factor ANOVA

For two-way ANOVA, the residuals turn out to be the difference between the actual data elements xijk and the interaction means mij, i.e. xijk – mij.

Two Factor ANOVA Example

Example 1: Determine the residuals for the data on the left side of Figure 1. Then determine whether the normality assumption for two-way ANOVA is met.

Testing residuals for normality

Figure 1 – Normality for two-way ANOVA

For this example, there are two Row factor levels and three Column factor levels for 2 × 3 = 6 interactions. The mean for each of these interactions is shown in range A14:D16.

It is easier to obtain the residuals if the data is in stacked format. This is obtained by using the formula =Anova2Std(A4:D12,4), as shown in range F3:H26. We now obtain the residuals (displayed in column I) by subtracting the appropriate element in A14:D16 from the values in column H. More precisely, we insert the formula

=H3-INDEX(B$15:D$16,MATCH(F3,A$15:A$16,0),MATCH(G3,B$14:D$14,0))

in cell I3, highlight range I3:I26, and press Ctrl-D.

The Shapiro-Wilk test result of p-value = .60768 on range I3:I26 shows that the residuals are normally distributed.

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following worksheet function.

Anova2Res(R1): returns a column array with the residuals for two-way ANOVA on the data in R1 in standard (stacked) format (i.e. three column array with row labels in the first column, column labels in the second column and numeric values in the third column).

We obtain the same results shown in column I of Figure 1 by using the formula =Anova2Res(F3:H26) or equivalently the array formula =Anova2Res(Anova2Std(A4:D12,4)).

Residuals for Three Factor ANOVA

For three-way ANOVA, the residuals again turn out to be the difference between the actual data elements and the interaction means.

Three Factor ANOVA Example

Example 2: Determine the residuals for the data in range A3:F15 of Figure 2 and determine whether the normality assumption for three-way ANOVA is met.

Normality testing of residuals

Figure 2 – Normality for three-way ANOVA (part 1)

As for two-way ANOVA, it is easier to find the residuals if the data is in stacked format by columns instead of by rows. We therefore use the formula =Anova3Cols(A4:F15) to reformat the data as shown in range H4:K39 (although rows 26 through 39 are not displayed).

Since each of the factors has 2 levels, there are a total of 23 = 8 combinations, as shown in range N4:P11 of Figure 3.

Means of the interactions

Figure 3 – Interaction means

We now find the mean for each of these 8 interactions by placing the formula

=AVERAGEIFS(K$4:K$39,H$4:H$39,N4,I$4:I$39,O4,J$4:J$39,P4)

in cell Q4, highlighting range Q4:Q11, and pressing Ctrl-D.

We now obtain the residuals by inserting the formula

=K4-SUMIFS(Q$4:Q$11,N$4:N$11,H4,O$4:O$11,I4,P$4:P$11,J4)

in cell L4 (of Figure 2), highlighting the range L4:L39, and pressing Ctrl-D.

Finally, we use the Shapiro-Wilk and d’Agostino-Pearson tests, as shown in Figure 4, to confirm that the residuals are normally distributed.

Normality testing of residuals

Figure 4 – Normality for three-way ANOVA (part 2)

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following worksheet function.

Anova3Res(R1): returns a column array with the residuals for three-way ANOVA on the data in R1 in standard (stacked) format (i.e. four-column array with labels in the first, second, and third columns, and numeric values in the fourth column).

We obtain the same results in range L4:L39 of Figure 4 by using the array formula =Anova3Res(H4:K39) or equivalently the array formula =Anova3Res(Anova3Cols(A4:F15)).

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

GraphPad (2024) Residuals tab: One-way ANOVA
https://www.graphpad.com/guides/prism/latest/statistics/stat_1wayanova_residuals-tab.htm

Minitab (2024) Does the response need to follow a normal distribution?
https://support.minitab.com/en-us/minitab/help-and-how-to/statistical-modeling/anova/supporting-topics/anova-models/does-the-response-need-to-be-normal/#:~:text=ANOVA%20assumes%20that%20the%20residuals,assumptions%20of%20ANOVA%20are%20met.

Leave a Comment