As seen in Linear Regression Models for Comparing Means, categorical variables can often be used in regression analysis by first replacing each categorical variable with one or more dummy variables (also called tag variables).

We now illustrate more complex examples and show how to perform One Factor and Two Factor ANOVA using multiple regression. See Three Factor ANOVA using Regression for information about how to apply these techniques to factorial ANOVA with more than two factors.

One-way ANOVA

We start with a One Factor ANOVA example.

Example 1: Repeat the analysis from Example 1 of Basic Concepts for ANOVA with the sample data on the left side of Figure 1 using multiple regression.

Figure 1 – Using dummy variables for One-way ANOVA

Our objective is to determine whether there is a significant difference between the three flavorings. In this example, we have reduced the sample size from Example 1 of Basic Concepts for ANOVA to better illustrate the key concepts. Instead of using ANOVA as we did there, this time we use regression analysis. First, we define the following two dummy variables and map the original data into the model on the right side of Figure 1.

t1 = 1 if flavoring 1 is used; = 0 otherwise
t2 = 1 if flavoring 2 is used; = 0 otherwise

Note that in general, if the original data has k values the model will require k – 1 dummy variables.
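The coding rule above can be sketched in a few lines of Python. This is only an illustration and not part of the Excel workflow described here; the function name `dummy_code` and the sample labels are hypothetical.

```python
def dummy_code(labels, reference):
    """Map each label to k - 1 indicator values; the reference level gets all zeros."""
    levels = [lv for lv in sorted(set(labels)) if lv != reference]
    return [[1 if lab == lv else 0 for lv in levels] for lab in labels]

# Hypothetical group labels: two observations per flavoring
flavors = [1, 1, 2, 2, 3, 3]

# Flavoring 3 is the reference group, so only t1 and t2 are needed
rows = dummy_code(flavors, reference=3)
for lab, (t1, t2) in zip(flavors, rows):
    print(lab, t1, t2)
```

With k = 3 flavorings the function returns k – 1 = 2 columns, and the reference group (flavoring 3) is identified by t1 = t2 = 0.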

The null hypothesis is

H0: µ1 = µ2 = µ3

where µj = the population mean of xj, the score for Flavor group j. The linear regression model is

x = β0 + β1 t1 + β2 t2 + ε

Meaning of dummy codes

Note that

µ1 = β0 + β1

since for the Flavor 1 group, t1 = 1 and t2 = 0;

µ2 = β0 + β2

since for the Flavor 2 group, t1 = 0 and t2 = 1; and

µ3 = β0

since for the Flavor 3 group, t1 = 0 and t2 = 0.

Thus the null hypothesis given above is equivalent to

β0 + β1 = β0 + β2 = β0

Simplifying, this means that the null hypothesis is equivalent to:

H0:  β1 = β2  = 0

Regression Analysis

The results of the regression analysis are displayed in Figure 2.

Figure 2 – Regression analysis for data in Example 1

We now compare the regression results from Figure 2 with the ANOVA on the same data found in Figure 3. Note that the F value 0.66316 is the same as that in the regression analysis. Similarly, the p-value .52969 is the same in both models.

Figure 3 – ANOVA for data in Example 1

Note the following about the regression coefficients:

  • The intercept b0 = mean of the Flavor 3 group = 14.
  • Coefficient b1 for variable t1 = mean of the Flavor 1 group – mean of the Flavor 3 group = 12 – 14 = -2
  • Coefficient b2 for variable t2 = mean of the Flavor 2 group – mean of the Flavor 3 group = 11.5 – 14 = -2.5

This is consistent with what we noted above when relating the population group means to the population coefficients, namely µ3 = β0, µ1 = β0 + β1, and µ2 = β0 + β2.
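The relationship between the regression and the ANOVA can be checked numerically. The following Python sketch uses made-up scores (not the data of Figure 1) and a small hand-written least-squares helper, `ols`, which is an assumption of this example rather than anything used in the Excel analysis.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [row[p] for row in A]

# Made-up scores for three flavor groups (hypothetical, NOT Figure 1)
groups = {1: [11, 13, 12], 2: [10, 14, 9], 3: [15, 13, 17]}

# Design matrix: intercept, t1, t2, with Flavor 3 as the reference group
X, y = [], []
for g, scores in groups.items():
    for s in scores:
        X.append([1, int(g == 1), int(g == 2)])
        y.append(s)

# Regression route
b = ols(X, y)
fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
y_bar = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
df_reg, df_res = len(b) - 1, len(y) - len(b)
f_regression = (ss_reg / df_reg) / (ss_res / df_res)

# Classical one-way ANOVA route on the same data
means = {g: sum(s) / len(s) for g, s in groups.items()}
ss_bet = sum(len(s) * (means[g] - y_bar) ** 2 for g, s in groups.items())
ss_w = sum((v - means[g]) ** 2 for g, s in groups.items() for v in s)
f_anova = (ss_bet / (len(groups) - 1)) / (ss_w / (len(y) - len(groups)))

print("coefficients:", [round(v, 4) for v in b])
print("F (regression):", round(f_regression, 4), "F (ANOVA):", round(f_anova, 4))
```

For these made-up scores the group means are 12, 11 and 15, so the intercept equals the Flavor 3 mean and the two slopes equal the differences between the Flavor 1 and Flavor 2 means and the Flavor 3 mean. The two F statistics agree.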

Alternative Coding

Example 1 (alternative approach): An alternative coding for Example 1 is as follows

t1 = 1 if flavoring 1 is used; = -1 if flavoring 3 is used; = 0 otherwise
t2 = 1 if flavoring 2 is used; = -1 if flavoring 3 is used; = 0 otherwise

In general, if there are k groups, then the jth dummy variable tj = 1 for the jth group, = -1 for the kth group, and = 0 otherwise.

The data now can be expressed as in the table on the left of Figure 4.

Figure 4 – Alternative coding for data in Example 1

The null hypothesis and linear regression model are as before. Now we have:

μ1 = β0 + β1

since for the Flavor 1 group, t1 = 1 and t2 = 0;

μ2 = β0 + β2

since for the Flavor 2 group, t1 = 0 and t2 = 1; and

μ3 = β0 – (β1 + β2)

since for the Flavor 3 group, t1 = -1 and t2 = -1.

Thus the null hypothesis is equivalent to β0 + β1 = β0 + β2 = β0 – (β1 + β2). Simplifying, this means once again that the null hypothesis is equivalent to:

H0: β1 = β2 = 0

Note too that μ3 = β0 – (β1 + β2) = β0 – (μ1 – β0 + μ2 – β0), and so β0 = (μ1 + μ2 + μ3)/3, i.e. β0 = the population grand mean. Also β1 = μ1 – β0 and β2 = μ2 – β0, and so β1 = the population Flavor 1 mean less the population grand mean and β2 = the population Flavor 2 mean less the population grand mean.
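Written out step by step, the argument runs as follows. The grand mean is taken here as the unweighted average of the three group means, which assumes equal group sizes.

```latex
\begin{aligned}
\mu_1 &= \beta_0 + \beta_1 \\
\mu_2 &= \beta_0 + \beta_2 \\
\mu_3 &= \beta_0 - (\beta_1 + \beta_2) \\[4pt]
\mu_1 + \mu_2 + \mu_3 &= 3\beta_0
\;\Longrightarrow\;
\beta_0 = \tfrac{1}{3}\,(\mu_1 + \mu_2 + \mu_3) \\[4pt]
\beta_1 &= \mu_1 - \beta_0, \qquad
\beta_2 = \mu_2 - \beta_0, \qquad
-(\beta_1 + \beta_2) = \mu_3 - \beta_0
\end{aligned}
```

The last line shows that the effect of the third group is not lost: it is the negative of the sum of the other two effects.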

The results of the regression analysis are given on the right side of Figure 4.


The first Summary and ANOVA tables are identical to the results from the previous analysis, and so once again we see that the results are the same as for the ANOVA. The regression coefficients, however, are different.

Figure 5 displays the grand mean, the group means, and the group effect sizes (i.e. the group mean less the grand mean).

Figure 5 – Group means and group effect sizes

We note that the intercept of the regression model is the grand mean 12.5 and the other coefficients correspond to the group effects for the Flavor 1 and Flavor 2 groups.
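As a quick check of this correspondence, the short Python sketch below recomputes the grand mean and group effects from the three group means quoted earlier (12, 11.5 and 14). It assumes equal group sizes, which is consistent with the grand mean of 12.5 stated above; the variable names are illustrative only.

```python
# Group means quoted in the text for Example 1
group_means = {"Flavor 1": 12.0, "Flavor 2": 11.5, "Flavor 3": 14.0}

# With equal group sizes the grand mean is the average of the group means
grand_mean = sum(group_means.values()) / len(group_means)

# Group effect = group mean less the grand mean
effects = {g: m - grand_mean for g, m in group_means.items()}

# Under the alternative coding: intercept = grand mean, slopes = first two effects
b0, b1, b2 = grand_mean, effects["Flavor 1"], effects["Flavor 2"]

# The Flavor 3 mean is recovered from t1 = t2 = -1
flavor3_mean = b0 - (b1 + b2)

print(grand_mean, effects, flavor3_mean)
```

The three effects sum to zero, which is why only two of them need to appear as coefficients.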

Two Factor ANOVA

We now show how to use regression to perform two factor ANOVA.

Example 2: Repeat the analysis from Example 1 of Two Factor ANOVA with Replication on the reduced sample data in the table on the left of Figure 6 using multiple regression.

Figure 6 – Data for Example 2

As we did in the previous example, we first define the dummy variables as follows:

t1 = 1 if Blend X; = 0 otherwise
t2 = 1 if Corn; = 0 otherwise
t3 = 1 if Soy; = 0 otherwise

The data now takes the form shown in Figure 7 where y is the yield.

Figure 7 – Coded data for Example 2

Note that this time we model the interaction of t1 with t2 and t3, as described in Interaction. The regression model that we use is of the form

y = β0 + β1 t1 + β2 t2 + β3 t3 + β4 t1·t2 + β5 t1·t3 + ε

Group Means

We now build a table of the means for each of the 6 groups (i.e. cells), as described in Figure 8.

Figure 8 – Group means for Example 2

The data in Figure 8 can be constructed by calculating the means of each of the above 6 groups from the original data or by applying the AVERAGEIFS function to the transformed data.

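As we did in Example 1, we note that the mean for Blend Y and Rice (i.e. where t1 = t2 = t3 = 0) is given by

mean(Blend Y, Rice) = b0

and similarly for the other combinations, where b4 and b5 are the coefficients of the interaction terms t1·t2 and t1·t3:

mean(Blend X, Rice) = b0 + b1
mean(Blend Y, Corn) = b0 + b2
mean(Blend Y, Soy) = b0 + b3
mean(Blend X, Corn) = b0 + b1 + b2 + b4
mean(Blend X, Soy) = b0 + b1 + b3 + b5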

Solving the simultaneous equations, we get the following values for the coefficients:

b0 = 165.4          b1 = -24.2       b2 = -5.8          b3 = -25.2          b4 = 0            b5 = 59.8
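Going in the other direction, the six cell means implied by these coefficients can be recovered by plugging each cell's dummy codes into the model. The sketch below assumes that b4 multiplies t1·t2 and b5 multiplies t1·t3; that ordering is an assumption of this example, and the function name `cell_mean` is hypothetical.

```python
# Coefficients reported in the text
b0, b1, b2, b3, b4, b5 = 165.4, -24.2, -5.8, -25.2, 0.0, 59.8

def cell_mean(t1, t2, t3):
    """Predicted cell mean; ASSUMES b4 goes with t1*t2 and b5 with t1*t3."""
    return b0 + b1 * t1 + b2 * t2 + b3 * t3 + b4 * t1 * t2 + b5 * t1 * t3

# Dummy codes per cell: t1 = 1 for Blend X, t2 = 1 for Corn, t3 = 1 for Soy
codes = {
    ("Y", "Rice"): (0, 0, 0),
    ("Y", "Corn"): (0, 1, 0),
    ("Y", "Soy"):  (0, 0, 1),
    ("X", "Rice"): (1, 0, 0),
    ("X", "Corn"): (1, 1, 0),
    ("X", "Soy"):  (1, 0, 1),
}

cell_means = {cell: round(cell_mean(*t), 1) for cell, t in codes.items()}
for cell, m in cell_means.items():
    print(cell, m)
```

Because the model has six coefficients and there are six cells, the fitted values reproduce the cell means exactly, which is why the coefficients can be found by solving six simultaneous equations.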

Regression Analysis (two-way ANOVA)

We get the same results when we run the Regression data analysis tool (see Figure 9).

Figure 9 – Regression for data in Example 2

The relatively high value of R and low value of Significance F show that the above model is a sufficiently good fit. Using the ANOVA: Two factor data analysis tool, we get the output shown in Figure 10.

Figure 10 – Two-factor ANOVA for the data in Example 2

Obtaining ANOVA results

We now show how to obtain the ANOVA results from the Regression model and vice versa. Note that MSW = 450.33 = MSRes, which is as expected since both of these denote the portion of the variation due to error. Also note that MST = 17457.87/29 = 602.00 for both models, and so the systematic variation for both models is the same as well. For the ANOVA model this is

MSBet = (SSRow + SSCol + SSInt) / (dfRow + dfCol + dfInt)

= (136.53 + 553.27 + 5960.07) / (1 + 2 + 2) = 6649.87/5 = 1329.97

This is the same as MSReg = 6649.87/5 = 1329.97 for the Regression model.

To obtain the Rows (A), Columns (B), and Interaction (AB) values in the ANOVA model from the Regression model, first rerun the regression analysis using only t1 as an independent variable. The values obtained for SSReg, dfReg and MSReg are the values of SSRow, dfRow and MSRow in the ANOVA model. Then rerun the regression analysis using only t2 and t3. The values obtained for SSReg, dfReg and MSReg are the values of SSCol, dfCol and MSCol in the ANOVA model. Now SSInteraction = SSBet – SSRow – SSCol (and similarly for the df terms) where SSBet is SSReg in the original (complete) regression model.
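The three regression runs described above can be sketched in Python. The yields below are made up (a balanced 2 × 3 layout with two replicates per cell, not the data of Figure 6), and the helpers `ols` and `ss_reg` are assumptions of this example rather than part of any Excel tool.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [row[p] for row in A]

def ss_reg(X, y):
    """Regression sum of squares: squared deviations of fitted values from the mean."""
    b = ols(X, y)
    y_bar = sum(y) / len(y)
    return sum((sum(bi * xi for bi, xi in zip(b, row)) - y_bar) ** 2 for row in X)

# Made-up balanced data (hypothetical): two replicates per Blend x Crop cell
data = {
    ("X", "Corn"): [10, 12], ("X", "Soy"): [14, 16], ("X", "Rice"): [20, 22],
    ("Y", "Corn"): [11, 13], ("Y", "Soy"): [18, 20], ("Y", "Rice"): [17, 19],
}

# Dummy coding: t1 = Blend X, t2 = Corn, t3 = Soy (Blend Y and Rice are references)
rows = [(int(blend == "X"), int(crop == "Corn"), int(crop == "Soy"), v)
        for (blend, crop), values in data.items() for v in values]
y = [r[3] for r in rows]

full_model = [[1, t1, t2, t3, t1 * t2, t1 * t3] for t1, t2, t3, _ in rows]
rows_only = [[1, t1] for t1, _, _, _ in rows]
cols_only = [[1, t2, t3] for _, t2, t3, _ in rows]

ss_bet = ss_reg(full_model, y)      # SSReg of the complete model
ss_row = ss_reg(rows_only, y)       # SSReg using only t1
ss_col = ss_reg(cols_only, y)       # SSReg using only t2 and t3
ss_int = ss_bet - ss_row - ss_col   # interaction by subtraction

print(round(ss_row, 4), round(ss_col, 4), round(ss_int, 4), round(ss_bet, 4))
```

The subtraction step relies on the design being balanced, as it is in this sketch; with unequal cell sizes the row and column sums of squares are no longer orthogonal and this simple decomposition does not apply.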

Finally, note that the value of R Square = .381. This has two interpretations. First, it is the square of Multiple R (whose value = .617), which is simply the correlation coefficient r. Second, it measures the percentage of variation explained by the regression model (or by the ANOVA model), which is

SSReg/SST = 6649.87/17457.87 = 0.381

which is also equal to 1 – SSW/SST from the ANOVA model.
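The arithmetic linking the two outputs can be reproduced from the figures quoted in this section. In the Python sketch below, SST = 17457.87 with 29 degrees of freedom is taken from the MST calculation above; the variable names are illustrative.

```python
from math import sqrt

# Sums of squares and degrees of freedom quoted in the text
ss_row, ss_col, ss_int = 136.53, 553.27, 5960.07
df_row, df_col, df_int = 1, 2, 2
ss_total, df_total = 17457.87, 29

# Between-groups (ANOVA) = regression (full model)
ss_bet = ss_row + ss_col + ss_int
df_bet = df_row + df_col + df_int
ms_bet = ss_bet / df_bet

# Within-groups (ANOVA) = residual (regression)
ss_w = ss_total - ss_bet
df_w = df_total - df_bet
ms_w = ss_w / df_w

# R Square two ways, plus Multiple R
r_square = ss_bet / ss_total
r_square_alt = 1 - ss_w / ss_total
multiple_r = sqrt(r_square)

print(round(ms_bet, 2), round(ms_w, 2), round(r_square, 3), round(multiple_r, 3))
```

Both routes to R Square give the same value because SSReg + SSW = SST.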

Alternative Coding

Just as we did in the single factor ANOVA of Example 1, we can obtain similar results for Example 2 using the alternative coding of dummy variables, namely

t1 = 1 if Blend X; = -1 otherwise
t2 = 1 if Corn; = -1 if Rice; = 0 otherwise
t3 = 1 if Soy; = -1 if Rice; = 0 otherwise
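A minimal Python sketch of this coding follows. The function name `effect_code` is hypothetical, and the labels "X", "Y", "Corn", "Soy" and "Rice" are simply the factor levels named in Example 2.

```python
def effect_code(blend, crop):
    """Return (t1, t2, t3) under the alternative coding; Blend Y and Rice get -1."""
    t1 = 1 if blend == "X" else -1
    t2 = {"Corn": 1, "Rice": -1}.get(crop, 0)
    t3 = {"Soy": 1, "Rice": -1}.get(crop, 0)
    return t1, t2, t3

# One row per cell of the 2 x 3 design
cells = [(b, c) for b in ("X", "Y") for c in ("Corn", "Soy", "Rice")]
coded = {cell: effect_code(*cell) for cell in cells}
for cell, t in coded.items():
    print(cell, t)

# Each coded column sums to zero across the cells of the design
column_sums = [sum(t[i] for t in coded.values()) for i in range(3)]
```

Each coded column sums to zero over the cells, which is the property that makes the intercept correspond to the grand mean in a balanced design.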

This approach is especially useful in creating unbalanced ANOVA models, i.e. where the sample sizes are not equal in a factorial ANOVA (see Unbalanced Factorial Anova).

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack supplies the following array worksheet function.

SSAnova2(R1, r) – returns a column array with SSRow, SSCol, SSInt and SSW for Two Factor ANOVA for the data in R1 using a regression model; if r > 0 then R1 is assumed to be in Excel Anova format (with row/column headings) with r rows per sample, while if r = 0 or is omitted then R1 is assumed to be in standard format (without column headings).

ANOVA Residuals

Now that we have explained how ANOVA is really regression using categorical variables, we can define ANOVA residuals. Click here for more information about ANOVA residuals.



