Planned Comparisons

Introduction

Suppose we conduct an ANOVA and reject the null hypothesis that all the population means are equal. Since there are significant differences among the means, it would be useful to find out where these differences lie. To accomplish this, we perform extended versions of the two-sample comparison t-tests described in Two Sample t-Test with Equal Variances.

In fact, some analysts bypass the omnibus ANOVA altogether and jump immediately to the comparison tests. Using this approach, as we will see, only the value of MSW from the ANOVA is required for the analysis.

For example, suppose you want to investigate whether a college education improves earning power, considering the following five groups of students:

  • High School Education
  • College Education: Biology majors, Physics majors, English majors, History majors

You select 30 students from each group at random and find out their salaries 5 years after graduation. The omnibus ANOVA shows there are differences between the mean salaries of the five groups. You would now like to pinpoint where the differences lie. For example, you could ask the following questions:

  1. Do college-educated people earn more than those with just a high school education?
  2. Do science majors earn more than humanities majors?
  3. Is there a significant difference between the earnings of physics and biology majors?

Null hypotheses

The (two-tailed) null hypothesis for each of these questions is as follows:

  1. \mu_{HS} = \frac{\mu_{Bio} + \mu_{Phy} + \mu_{Eng} + \mu_{His}}{4}
  2. \frac{\mu_{Bio} + \mu_{Phy}}{2} = \frac{\mu_{Eng} + \mu_{His}}{2}
  3. \mu_{Phy} = \mu_{Bio}

These tests are done employing something called contrasts.

Contrasts

Definition 1: Contrasts are simply weights ci such that

\sum_{i=1}^k c_i = 0

The idea is to turn a test such as the ones listed above into a weighted sum of means

\sum_{j=1}^k c_j \mu_j

using the appropriate values of the contrasts as weights. For example, for the first question listed above, we use the contrasts

c_{HS} = 1, \quad c_{Bio} = c_{Phy} = c_{Eng} = c_{His} = -\tfrac{1}{4}

and redefine the null hypothesis as

H_0: \sum_{j=1}^k c_j \mu_j = 0

We then use a t-test as described in the following examples. Note that we could have used the following contrasts instead:

c_{HS} = 4, \quad c_{Bio} = c_{Phy} = c_{Eng} = c_{His} = -1

The results of the analysis will be the same. The important thing is that the sum of the contrasts adds up to 0 and that the contrasts reflect the problem being addressed.

For the second question, by using the contrasts

c_{HS} = 0, \quad c_{Bio} = c_{Phy} = \tfrac{1}{2}, \quad c_{Eng} = c_{His} = -\tfrac{1}{2}

the null hypothesis once again can be expressed in the form:

H_0: \sum_{j=1}^k c_j \mu_j = 0

The third question uses contrasts

c_{Phy} = 1, \quad c_{Bio} = -1, \quad c_{HS} = c_{Eng} = c_{His} = 0
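
To make this concrete, here is a minimal Python sketch (the group ordering HS, Bio, Phy, Eng, His and all variable names are assumptions for illustration) that encodes the three contrasts above and checks that each set of weights sums to 0:

```python
import numpy as np

# Contrast weights in the (assumed) group order: HS, Bio, Phy, Eng, His
c1 = np.array([1, -0.25, -0.25, -0.25, -0.25])  # HS vs. average of the four majors
c2 = np.array([0, 0.5, 0.5, -0.5, -0.5])        # science vs. humanities majors
c3 = np.array([0, -1, 1, 0, 0])                 # physics vs. biology majors

for c in (c1, c2, c3):
    assert np.isclose(c.sum(), 0)  # every valid set of contrast weights sums to 0
```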

Example

Example 1: Compare methods 1 and 2 from Example 3 of Basic Concepts for ANOVA.

We set the null hypothesis to be:

H0 : µ1 – µ2 = 0, i.e. neither method (Method 1 or 2) is significantly better

When the null hypothesis is true, this is the two-sample case investigated in Two Sample t-Test with Equal Variances where the population variances are unknown but equal. As before, we use the following t-test

t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

which has distribution T(n1 + n2 – 2).

But as we have seen, MSW (which is based on all the groups) can be used in place of the pooled estimate s2, with dfW degrees of freedom, and so when the null hypothesis is true,

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_W\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

which has distribution T(dfW). For Example 1, we have the following results:


Figure 1 – Comparison test of Example 1

Since p-value = .02373 < .05 = α, we reject the null hypothesis and conclude that there is a significant difference between methods 1 and 2.
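
The following Python sketch mirrors this computation. The scores below are hypothetical stand-ins (not the actual data from Basic Concepts for ANOVA); the point is how MSW and dfW from the full ANOVA feed into the two-group test:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for four teaching methods (stand-ins for the real data)
groups = [
    np.array([51, 87, 50, 48, 79, 61, 53], dtype=float),      # method 1
    np.array([82, 91, 92, 80, 52, 79, 73, 74], dtype=float),  # method 2
    np.array([79, 84, 74, 98, 63, 83, 85], dtype=float),      # method 3
    np.array([85, 80, 65, 71, 67, 51, 63], dtype=float),      # method 4
]

k = len(groups)
n = sum(len(g) for g in groups)
df_w = n - k                                                   # within-groups df
ms_w = sum(((g - g.mean())**2).sum() for g in groups) / df_w   # MS_W

# t-test for method 1 vs. method 2 with MS_W in place of the pooled s^2
x1, x2 = groups[0], groups[1]
t = (x1.mean() - x2.mean()) / np.sqrt(ms_w * (1/len(x1) + 1/len(x2)))
p = 2 * stats.t.sf(abs(t), df_w)                               # two-tailed p-value
print(t, p)
```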

t-statistic for contrasts

In fact, there is a generalization of this approach, namely the use of the statistic

t = \frac{\sum_{j=1}^k c_j \bar{x}_j}{\sqrt{MS_W \sum_{j=1}^k \frac{c_j^2}{n_j}}}

where the cj are constants such that the cj sum to 0 (for Example 1, c1 + c2 + c3 + c4 = 0). As before, t ~ T(dfW). Here the denominator is the standard error of the contrast. We now summarize this result in the following property.

Property

Property 1: For contrasts cj, the statistic t has distribution T(dfW) where

t = \frac{\sum_{j=1}^k c_j \bar{x}_j}{\sqrt{MS_W \sum_{j=1}^k \frac{c_j^2}{n_j}}}

Since by Property 1 of F Distribution, t ~ T(df) is equivalent to t2 ~ F(1, df), it follows that Property 1 is equivalent to

F = t^2 = \frac{\left(\sum_{j=1}^k c_j \bar{x}_j\right)^2}{MS_W \sum_{j=1}^k \frac{c_j^2}{n_j}} \sim F(1, df_W)

Example 1 (continued): We can use Property 1 with c1 = 1, c2 = -1 and c3 = c4 = 0 as follows:


Figure 2 – Use of contrasts for Example 1

Here the standard error (in cell N14) is expressed by the formula =SQRT(I15*R11) and the t-stat (in cell O14) is expressed by the formula =S11/N14. As before p-value = .0237 < .05 = α.
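
Continuing the sketch above, the computation in Figure 2 generalizes to arbitrary contrasts. Below is a hypothetical helper function (not part of Real Statistics) implementing Property 1:

```python
import numpy as np
from scipy import stats

def contrast_test(groups, c):
    """t-test for the contrast sum of c_j * mean_j, per Property 1.

    groups: list of 1-d arrays, one per group; c: weights summing to 0.
    Returns the contrast estimate, its standard error, t, and the
    two-tailed p-value based on T(df_W)."""
    c = np.asarray(c, dtype=float)
    assert np.isclose(c.sum(), 0), "contrast weights must sum to 0"
    n, k = sum(len(g) for g in groups), len(groups)
    df_w = n - k
    ms_w = sum(((g - g.mean())**2).sum() for g in groups) / df_w
    psi = sum(cj * g.mean() for cj, g in zip(c, groups))   # weighted sum of means
    se = np.sqrt(ms_w * sum(cj**2 / len(g) for cj, g in zip(c, groups)))
    t = psi / se
    return psi, se, t, 2 * stats.t.sf(abs(t), df_w)

# c = (1, -1, 0, 0) reproduces the pairwise test of Example 1
psi, se, t, p = contrast_test(groups, [1, -1, 0, 0])
```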

Confidence Intervals

The t-test can also be used to create a confidence interval for a contrast, exactly as was done in Confidence Interval for ANOVA.
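
Continuing the same sketch, a confidence interval follows from the contrast estimate, its standard error, and a critical t value with dfW degrees of freedom:

```python
# 95% confidence interval for a contrast (weights are those of Example 2 below)
psi, se, t, p = contrast_test(groups, [1, -0.5, 0, -0.5])
df_w = sum(len(g) for g in groups) - len(groups)
t_crit = stats.t.ppf(1 - 0.05/2, df_w)          # two-tailed critical value
ci = (psi - t_crit * se, psi + t_crit * se)     # psi_hat ± t_crit * s.e.
```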

Example 2: Compare method 1 with the average of methods 2 and 4 from Example 3 of Basic Concepts for ANOVA.

We test the following null hypothesis

H0: µ1 = (µ2 + µ4) / 2

using Property 1 with c1 = 1, c2 = c4 = -.5 and c3 = 0.


Figure 3 – Use of contrasts for Example 2

Since p-value = .9689 > .05 = α, we can’t reject the null hypothesis, and so we conclude there is no significant difference between method 1 and the average of methods 2 and 4.

One-tailed test

The above analysis employs a two-tailed test. We could also have conducted a one-tailed test using the null hypothesis H0: µ1 ≤ (µ2 + µ4) / 2. In that case, we would use the one-tailed version of T.DIST in calculating the p-value.

Orthogonal Contrasts

Contrasts are essentially vectors, and so we can speak of two contrasts being orthogonal. Per Definition 8 of Matrix Operations, assuming that the group sample sizes are equal (i.e. ni = nj for all i, j), contrasts (c1,…,ck) and (d1,…,dk) are orthogonal provided

\sum_{j=1}^k c_j d_j = 0

Geometrically this means the contrasts are at right angles to each other in space.

Assuming there are k groups, if you have k – 1 contrasts C1, …, Ck-1 that are pairwise orthogonal, then any other contrast can be expressed as a linear combination of these contrasts. Thus you only ever need to look at k – 1 orthogonal contrasts. Since

\sum_{j=1}^k c_j = 0

for any contrast C = (c1,…,ck), each of the Cj is orthogonal to the vector (1,…,1), and so k – 1 contrasts (and not k) are sufficient.

Note that if C1, …, Ck-1 are pairwise orthogonal, then the sum of squares between groups is

SS_B = \sum_{j=1}^k n_j (\bar{x}_j - \bar{x})^2

Also, each

SS_{C_i} = \frac{\left(\sum_{j=1}^k c_{ij} \bar{x}_j\right)^2}{\sum_{j=1}^k c_{ij}^2 / n_j}

and so

SS_B = SS_{C_1} + SS_{C_2} + \cdots + SS_{C_{k-1}}
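
This partition is easy to verify numerically. Below is a small Python check for a balanced design with k = 4; the random data and the Helmert-style contrasts are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 10                                   # balanced: n observations per group
groups = [rng.normal(loc=m, scale=2, size=n) for m in (5, 6, 7, 8)]
means = np.array([g.mean() for g in groups])
grand = np.concatenate(groups).mean()

ss_b = n * ((means - grand) ** 2).sum()        # SS between groups

C = np.array([[1, -1,  0,  0],                 # k - 1 pairwise orthogonal contrasts
              [1,  1, -2,  0],
              [1,  1,  1, -3]], dtype=float)

ss_c = [(c @ means) ** 2 / ((c @ c) / n) for c in C]
assert np.isclose(ss_b, sum(ss_c))             # SS_B = SS_C1 + SS_C2 + SS_C3
```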

Relationship to omnibus ANOVA

Thus any k – 1 pairwise orthogonal contrasts partition SSB.

It follows that if none of the t-tests for a set of k – 1 pairwise orthogonal contrasts is significant, the omnibus ANOVA F-test will generally not be significant either; conversely, if the omnibus ANOVA F-test is significant, at least one of any k – 1 pairwise orthogonal contrasts will generally be significant. (The correspondence is not exact, since the critical values for F(1, dfW) and F(k – 1, dfW) differ.)

A non-significant F-test does not imply that all possible contrasts are non-significant. Also, a significant contrast doesn’t imply that the F-test will be significant.

In general, to reduce experiment-wise error you should make the minimum number of meaningful tests, preferring orthogonal contrasts to non-orthogonal contrasts. The key point is to make only meaningful tests.

Unbalanced samples

When the group sample sizes are not equal (i.e. unbalanced group samples), we need to modify the definition of orthogonal contrasts. In this case, contrasts (c1,…,ck) and (d1,…,dk) are orthogonal provided

\sum_{j=1}^k \frac{c_j d_j}{n_j} = 0
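
A small hypothetical Python helper makes the modified check explicit; with equal group sizes it reduces to the ordinary condition that the dot product is 0:

```python
import numpy as np

def orthogonal(c, d, sizes):
    """True if contrasts c and d are orthogonal given the group sizes."""
    c, d, sizes = (np.asarray(a, dtype=float) for a in (c, d, sizes))
    return np.isclose((c * d / sizes).sum(), 0)

print(orthogonal([1, -1, 0], [1, 1, -2], [10, 10, 10]))  # True: balanced
print(orthogonal([1, -1, 0], [1, 1, -2], [12, 8, 10]))   # False: unbalanced
```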

The assumptions for contrasts are the same as those for ANOVA, namely

  • Independent samples
  • Within each group, participants are independent and randomly selected
  • Each group has the same population variance
  • Each group is drawn from a normal population

The same tests that are employed to test the assumptions of ANOVA (e.g. QQ plots, Levene’s test, etc.) can be used for contrasts. Similarly, the same corrections (e.g. transformations) can be used for contrasts. In addition, two other approaches can be used with contrasts, namely Welch’s correction when the variances are unequal or a rank test (e.g. Mann-Whitney U test or ANOVA on ranked data) when normality is violated. Keep in mind that ranks are only ordinal data, and so linear combinations (including averages) cannot be used, only comparisons of type µ1 = µ2. Also, any conclusions drawn from ANOVA on ranked data apply to the ranks and not the observed data.

Dealing with Experiment-wise error

As described in Experiment-wise Error Rate, in order to address the inflated experiment-wise error rate, either the Dunn/Sidák or Bonferroni correction factor can be used.

Dunn/Sidák correction was described in Experiment-wise Error Rate. To test k orthogonal contrasts in order to achieve an experiment-wise error rate of αexp, the error rate α of each contrast test must be such that 1 – (1 – α)^k = αexp. Thus α = 1 – (1 – αexp)^(1/k). E.g. if k = 4, then to achieve an experiment-wise error rate of .05, α = 1 – .95^(1/4) = .012741.

The Bonferroni correction simply divides the experiment-wise error rate by the number of orthogonal contrasts. E.g. for 4 orthogonal contrasts, to achieve an experiment-wise error rate of .05, simply set α = .05/4 = .0125. Note that the Bonferroni correction is a little more conservative than the Dunn/Sidák correction, since αexp/k < 1 – (1 – αexp)^(1/k).
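
Both corrections are one-line computations, e.g. in Python for αexp = .05 and k = 4:

```python
alpha_exp, k = 0.05, 4

alpha_sidak = 1 - (1 - alpha_exp) ** (1 / k)   # Dunn/Sidak: ~0.012741
alpha_bonf = alpha_exp / k                     # Bonferroni:  0.0125 (slightly
                                               # smaller, i.e. more conservative)
print(alpha_sidak, alpha_bonf)
```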

In the above calculations, we have assumed that the contrasts have equal values for α. This is not strictly necessary. E.g. in the example above, for the Bonferroni correction, we can use .01 for the first three contrasts and .02 for the fourth. The important thing is that the sum be .05 and that the split be determined prior to seeing the data.

If the contrasts are not orthogonal then the above correction factors are too conservative, i.e. they over-correct.

Example

Example 3: A drug is being tested for its effect on prolonging the life of people with cancer. Based on the data on the left side of Figure 4, determine whether there are significant differences among the 4 groups, and check (1) whether there is a difference in life expectancy between the people taking the drug and those taking a placebo, (2) whether there is a difference in the effectiveness of the drug between men and women, and (3) whether there is a difference in life expectancy for people with this type of cancer for men versus women.


Figure 4 – Data and ANOVA output for Example 3

The ANOVA output in Figure 4 shows (p-value = .00728 < .05 = α) that there is a significant difference between the 4 groups. We now address the other questions to try to pinpoint where the differences lie. First, we investigate whether the drug provides any significant advantage.


Figure 5 – Contrast test for effectiveness of drug in Example 3

Figure 5 shows the result with uncontrolled type I error and then the results using the Bonferroni and Dunn/Sidák corrections. The tests are all significant, i.e. there is a significant difference between the population means of those taking the drug and those in the control group taking the placebo.

We next test whether there is a difference in the effectiveness of the drug between men and women.


Figure 6 – Contrast test for men vs. women

Since p-value = .0547 > α in all the tests, we conclude there is no significant difference in longevity between men and women taking the drug.

The final test is to determine if men and women with this type of cancer have a different life expectancy (whether or not they take the drug).


Figure 7 – Contrast of life expectancy of men vs. women

The result is significant (p-value = .0421 < .05 = α) if we don’t control for type I error, but the result is not significant if we use the Bonferroni or Dunn/Sidák correction (p-value = .0421 > .0167 = α).

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following function:

DunnSidak(αexp, k) = α = 1 – (1 – αexp)^(1/k)

Data Analysis Tool

Real Statistics Data Analysis Tool: The Real Statistics Single Factor Anova and Follow-up Tests data analysis tool provides support for performing contrast tests. The use of this tool is described in Example 4.

Example 4: Repeat the analysis from Example 2 using the Contrasts option of the Single Factor Anova and Follow-up Tests supplemental data analysis tool.

Press Ctrl-m and select Single Factor Anova and Follow-up Tests from the menu. A dialog box will appear as in Figure 8.


Figure 8 – Dialog box for Single Factor Anova and Follow-up Tests

Enter the sample range in Input Range, click on Column headings included with data, and check the Contrasts option. Select the type of alpha correction that you want, namely no experiment-wise correction, the Bonferroni correction, or the Dunn/Sidák correction (as explained in Experiment-wise Error Rate). In any case, you set the alpha value to be the experiment-wise value (defaulting as usual to .05).

Output

Note that the contrast output that results from the tool will not initially contain any contrasts. You need to fill in the desired contrasts directly in the output (e.g. for Example 4 you need to fill in the range O32:O35 in Figure 9 with the contrasts you desire).

When you click on OK, the output from this tool is displayed (as in Figure 9). The fields relating to effect size are explained in Effect Size for ANOVA.


Figure 9 – Real Statistics Contrast data analysis

Caution

If your Windows setting for the decimal separator is a comma (,) and the thousands separator is a period (.), then you may get incorrect values for alpha when using the Bonferroni or Dunn/Sidák corrections.

To avoid this problem, you need to configure Real Statistics to use the correct settings, as described at Using Comma as the Decimal Symbol.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.


35 thoughts on “Planned Comparisons”

  1. Charles,
    In Ex. 1, you have substituted the standard deviation with MSw and df of the t test with n1+n2-2. In this case, MSw should be the MS based on only Method 1 and Method 2, rather than all 4 methods, shouldn’t it?

    Based on the Figure 1, it seems that you have used MSw based on all 4 methods and df of 25, which is from all 4 methods. However, we are comparing only Method 1 and Method 2 means. Therefore, the df should be 13 instead of 25.

    I compared the results using the two-sample equal variance t-test to the results using MSw based on all 4 methods and df of 25, and they did not match. When I used MSw based on only 2 methods (i.e., Methods 1 and 2) and df = 13, the results matched the two-sample equal variance t-test results.

    Please advise.

    • Sun Kim,
      It is not surprising that the results will be different. If you are only going to do one planned follow-up pairwise test (i.e. you know in advance of seeing the data which pairwise test you will do and you will only do one such test), then you should just perform the usual t test. In other cases, you should do one of the planned or unplanned post-hoc tests.
      Charles

    • Hello Charles,
      Can you please clarify the use of Bonferroni’s correction in the following ANOVA & planned comparisons scenario. I have a data set with 3 experimental groups and 3 dependent variables. I would like to perform 3 separate one-way ANOVAs. (I can’t perform MANOVA because Box’s M test shows inhomogeneity of covariance matrices). I assume that I should divide the required alpha for the main effect of each of the 3 ANOVAs by 3, and require p < 0.0167 for each main effect, correct? Is this still the case if I also wish to perform (planned) paired comparisons (by Tukey's HSD test) on two of the 3 ANOVAs that showed significant main effects? Since there are 3 groups, each ANOVA will require 3 paired comparisons, for a total of 6 paired comparisons. Does this change the Bonferroni correction applied for the main effect of each ANOVA, i.e. should I still require p < 0.05/3 = 0.0167 for the main effect of each ANOVA, or should I require p < 0.05/9 because a total of 9 tests will be performed, 3 for main effects and 6 for planned comparisons? And for the paired comparisons on each ANOVA, should I use p < 0.05/3, or p < 0.05/6 because I am doing 6 paired comparisons, or p < 0.05/9 because I am doing a total of 9 tests (3 for main effects and 6 for paired comparisons)?
      Thanks very much.
      Joel

      • Joel,
        I believe that your reasoning is correct. If you plan to perform Tukey’s HSD for each of the 3 ANOVAs, then you would use a corrected alpha of .05/3. If instead you planned to perform Tukey’s HSD on ANOVAs A and B and 3 paired comparisons on ANOVA C, then you would use .05/5 (one each for A and B, and 3 for C). Etc.
        The ANOVAs themselves are typically not counted.
        Charles

        • Charles,
          Sorry but I am still confused about whether to apply the Bonferroni correction to both the ANOVAs & the paired comparisons or only to the paired comparisons. I think I need to frame my question more precisely so that I understand your reply.
          The original plan was to perform 3 separate ANOVAs, one for each of the dependent variables A, B, and C, followed by paired comparisons on only the ANOVAs that showed main effects (only A and B showed main effects, both at p<0.001; for C, p = 0.09). When you said “The ANOVAs themselves are typically not counted” did you mean that p<0.05 (i.e. no Bonferroni correction required) was appropriate for all 3 ANOVAs?
          And for the paired comparisons, of which there were 6 (3 for A and 3 for B), what is the appropriate Bonferroni correction for these? Should I use p < 0.05/6, or 0.05/3? Doesn’t the Tukey HSD test have its own built-in correction when using the Real Statistics program?
          Sorry to be so verbose but I was having trouble understanding your reply.
          Thanks very much.

    • Hi Charles,
      I would like to perform power and sample size calculations for a one-way ANOVA with three levels of the independent variable: A, B, & C. I know how to perform power and sample size calculations for one-way ANOVA. Does Real Statistics support sample size and power calculations for planned comparisons using Tukey HSD following ANOVA?
      Thanks very much.
      Joel

  2. I have 16 model permutations that I want to compare with my control group using ANOVA with contrasts and Tukey HSD follow-up and the Bonferroni alpha correction. Both follow-up methods identify 4 model permutations that have significant results. I’d like to narrow the results even further by conducting another iteration of the ANOVA and follow-up tests to see if the 4 models’ results remain significant. What are the potential implications of running successive iterations of these tests?

    • Scott,
      The situation you are describing is not completely clear to me. Are you saying that you have 17 “treatments” including a control group? By significant result, do you mean that 4 treatments are significantly different from the control group? It is not clear to me why you would perform both contrasts (presumably with a Bonferroni correction) and Tukey HSD.
      Charles

  3. Dear Charles,

    I want to test differences between four groups of kids:
    group A doesn’t practice any sport (control group)
    groups B,C,D do practice different sports

    So, assuming the one-way ANOVA result is significant, I want to test whether there are differences between the control group and the sports groups (A vs B+C+D) and also between each pair of sports groups (that is, B vs C, B vs D, C vs D).
    These contrasts are not orthogonal, and the design is unbalanced (different group sizes).
    I think that the Real Statistics contrast analysis is the right approach for my case, but which correction factor should I use in order to keep the experiment-wise error rate at alpha = 0.05?

    And conversely, if I wish to test all possible pairs of means, what would be the better approach, given that the samples are unbalanced?

    Thank you very much for your help
    Best Regards
    Piero

  4. Hello Charles,

    I installed the program and ran a one-way ANOVA for some data with three groups. The ANOVA was significant, and now I would like to know which group is different. I’m trying to run some multiple comparison tests, like Games-Howell, but it’s not working. I’m using Excel on a MacBook, and actually, among the options when I open the program I do not have “Single Factor Anova and Follow-up Tests”, just “Analysis of Variance”. What can be happening? Thank you, Bárbara!

  5. If it’s not too much to ask, could you send me an email address so that I could send a more detailed question? I just need to know I am doing the right thing. Thanks for your response.

    • Terry,
      E.g. you would fill in the contrasts in the range O7:O10 in Figure 2. Initially all the cells in this range are blank and you need to fill in contrast values that sum to 0 such that the positive values add up to 1 and the negative values add up to -1. E.g. 1, 0, -1, 0 or -.5, +.5, +.5, -.5.
      Charles

  6. Hi Charles,

    When I try Dunn/Sidak correction and Bonferroni Correction, I get an error:

    ‘A run time error has occurred. The analysis tool will be aborted. Type Mismatch.’

    If I do contrasts with no correction, then it’s fine, but not with the other 2 options.

    Declan

    • Declan,
      Thanks for reminding me about this error. It is related to the fact that some users employ a period as the decimal point and other use a comma. I have now resolved this problem, and it should be fixed in the next release of the software, which will be available later this week.
      Charles

  7. Good day Charles 🙂

    I often think that a t test with equal variances is more like an academic exercise. For example, nobody knows what the population mean or population standard deviation is, yet there are equations for them. If you don’t know them, use the sample standard deviation/mean etc. Based on this assumption, I’ve been applying the same logic to t tests, as it’s unlikely that the samples will have the same variances.

    Interesting comment though that you made about the variance of one sample needing to be four times the variance of the other sample. Didn’t know that. Thanks. 🙂 In R, I think I’ve been doing the same thing with var.equal=FALSE. 🙂 But I will now have to be more careful.

    Merry Christmas to you.
    Declan

    • Declan,
      Keep in mind that “the variance of one sample needs to be at most four times the variance of the other sample” is a rule of thumb. To be on the safe side you can always use the test with unequal variances. When the variances are equal or fairly equal, the results will be almost the same as the t test with equal variances.
      Charles

  8. Hi,
    Could you please tell me where we should use the t-test for equal variances versus the one for unequal variances?
    I am a bit confused.

    Thanks in Advance.

    • Rajeev,

      When the variances of the two samples are relatively equal then the results of the two tests are approximately the same. It turns out that the t-test with equal variances gives a good result even when the variances are pretty unequal, but when the variance of one of the samples is quite different (approximately when the variance of one sample is more than four times the variance of the other sample), you should use the t-test with unequal variances instead of the t-test with equal variances.

      Since the t-test with unequal variances can always be used (at least where the other assumptions for the t-test are satisfied), when in doubt just use this test.

      Charles

  9. Sir,
    Could you please give me some guidance about how to do a planned contrast for a categorical dependent variable?

    Thank you!

      • In your examples, the dependent variables are continuous. But the dependent variable in my experiment is categorical (buy or not) with a 2 by 2 design, so I think the data cannot be analyzed by a t-test. Do you have any experience with planned contrasts in this case?
        Thank you so much!

        • Since your dependent variable is binary categorical, which test are you using (ANOVA would not be a good choice)?
          Charles

          • Thanks a lot Charles!
            I ran a logistic regression for the interaction effect. But I don’t know how to do the planned contrast next.
