Experiment-wise error rate

We could have conducted the analysis for Example 1 of Basic Concepts for ANOVA by conducting multiple two-sample tests. For example, to decide whether or not to reject the following null hypothesis

H0: μ1 = μ2 = μ3

We can use the following three separate null hypotheses:

  • H0: μ1 = μ2
  • H0: μ2 = μ3
  • H0: μ1 = μ3

If any of these null hypotheses is rejected then the original null hypothesis is rejected.

Real alpha value

Note, however, that if you set α = .05 for each of the three sub-analyses then, assuming independence, the overall alpha value is about .14, since 1 – (1 – α)^3 = 1 – (1 – .05)^3 = 0.142625 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting the null hypothesis even when it is true (a type I error) is 14.2625%.

For k groups, you would need to run m = COMBIN(k, 2) such tests and so the resulting overall alpha would be 1 – (1 – α)^m, a value which gets progressively higher as the number of samples increases. For example, if k = 6, then m = 15 and the probability of finding at least one significant t-test, purely by chance, even when the null hypothesis is true, is over 50%.
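
The following is a minimal sketch of this calculation in Python (the function name experimentwise_alpha is purely illustrative); assuming the m pairwise tests are independent, it reproduces the values quoted above:

```python
from math import comb

def experimentwise_alpha(k, alpha=0.05):
    """Probability of at least one type I error when all COMBIN(k, 2)
    pairwise tests are run independently at significance level alpha."""
    m = comb(k, 2)                 # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

print(experimentwise_alpha(3))     # 3 tests  -> 0.142625
print(experimentwise_alpha(6))     # 15 tests -> about 0.54 (over 50%)
```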

In fact, one of the reasons for performing ANOVA instead of separate t-tests is to reduce the type I error. The only problem is that once you have performed the ANOVA, if the null hypothesis is rejected you will naturally want to determine which groups have unequal means, and so you will need to confront this issue in any case.

With 3 separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952. As is mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests. If the experiment-wise error rate is < .05 then the error rate is called conservative. If it is > .05 then the error rate is called liberal.
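
Here is a hedged sketch (the function names are illustrative, not from any library) of how to back out this per-test alpha, using either the formula above (the Dunn/Sidák correction) or the simpler Bonferroni correction α/m that appears in the comments below:

```python
def sidak_alpha(family_alpha, m):
    """Per-test alpha so that m independent tests yield the desired
    experiment-wise error rate: solves 1 - (1 - alpha)^m = family_alpha."""
    return 1 - (1 - family_alpha) ** (1 / m)

def bonferroni_alpha(family_alpha, m):
    """Slightly more conservative Bonferroni version."""
    return family_alpha / m

print(sidak_alpha(0.05, 3))        # 0.016952...
print(bonferroni_alpha(0.05, 3))   # 0.016666...
```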

Planned vs. unplanned post-hoc tests

There are two types of follow-up tests following ANOVA: planned (aka a priori) and unplanned (aka post hoc or a posteriori) tests. Planned tests are determined prior to the collection of data, while unplanned tests are made after the data are collected. These tests have entirely different type I error rates.

For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis (μ1 + μ2)/2 = (μ3 + μ4)/2, then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test (μ1 + μ2)/2 = (μ3 + μ4)/2 because μ1 and μ2 are the smallest means and μ3 and μ4 are the largest.
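
To see why the error rate inflates, here is a rough Monte Carlo sketch, not a definitive analysis: it assumes normally distributed data with equal population means, approximates the contrast chosen after inspection with a pooled two-sample t-test, and uses NumPy/SciPy; the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def posthoc_contrast_error_rate(n_per_group=20, n_sims=10_000, alpha=0.05, seed=0):
    """Simulate 4 groups with EQUAL population means, then test the two groups
    with the smallest sample means against the two with the largest, i.e. a
    contrast chosen only after looking at the data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        data = rng.normal(0, 1, size=(4, n_per_group))
        order = np.argsort(data.mean(axis=1))   # sort groups by sample mean
        low = data[order[:2]].ravel()           # two lowest-mean groups pooled
        high = data[order[2:]].ravel()          # two highest-mean groups pooled
        _, p = stats.ttest_ind(low, high)
        rejections += p < alpha
    return rejections / n_sims

print(posthoc_contrast_error_rate())   # noticeably larger than the nominal 0.05
```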


34 thoughts on “Experiment-wise error rate”

  1. Congratulations on the explanation on this page!
    Why do you think there are still researchers who insist on using tests that only control the type I error rate per comparison?
    Thanks

  2. I am glad to have come across this site… Sir, kindly assist me on how to determine the comparison-wise error rate when I have five treatments and the number of comparisons is ten, my level of significance is 0.05, and 4 of the comparisons are below the significance level. How do I determine the error rate?

    • Bukar,
      If you run 10 separate comparison tests, then the experiment-wise error rate is 1-(1-.05)^10 ≈ .40. This is based on the assumption that all the comparisons are independent of each other.
      Charles

  3. Hello Sir,
    I would like to ask you a question: I have three samples and four tests I plan to conduct. I have decided to apply a Bonferroni correction with a new alpha of 0.05/4. Suppose I find two statistically significant results and I plan to make pairwise comparisons for these two tests; should I apply another Bonferroni correction? Would it be 0.05/(3*3)?

    • Carla,
      Before looking at the data or the results of the tests, do you plan to conduct a total of 4 tests? In that case, the corrected alpha is .05/4. If instead you plan to run 4 tests for each of the 3 samples, then the corrected alpha is .05/12 (unless what you plan to do for each sample is independent of what you do for the other two samples). You need to use a divisor based on the maximum number of tests; you can’t take into account information that you have obtained after running some of the tests (such as seeing which of these tests are significant). This is the general situation. I would need more details about what you are trying to do to give a more definitive response.
      The corrected alpha can get quite small. This is why there are a number of special post-hoc tests (such as Tukey HSD after ANOVA) that address experiment-wise error better than Bonferroni. But these post-hoc tests depend on the details of the analysis that you are trying to perform.
      Charles

      • Thank you for your answer. Let me explain.
        I have three groups (levels of the independent variable) that I want to test on 4 different categorical variables. I plan to use the chi-square test of independence, so I thought to use Bonferroni and set alpha as 0.05/4. If, after using the correction, I get a significant result from at least one test, I plan to conduct a post-hoc analysis by again using chi-square and doing pairwise comparisons of the three groups (1 vs 2, 1 vs 3, 2 vs 3). In this case should I make another correction considering the total number of possible comparisons? For example, if after these 4 chi-square tests I have just one statistically significant result, should I set my new alpha at 0.05/3 and do these comparisons? What if I have two significant tests? Would alpha then be 0.05/9? I apologise for these silly questions.

  4. Dear Dr. Charles

    I have an experiment with three diets: a negative control (NC), a positive control (PC) and a dietary treatment (TRT). My preplanned orthogonal contrasts are NC vs PC, and (NC+PC)/2 vs TRT. My questions are:
    Would you test them even under a non-significant F-test for the main effect “diet”?
    Would you consider adjusting alpha for each contrast (alpha = 1-(1-0.05)^(1/2) = 0.0253)?

    Thanks for your time.
    Irene

    • Hello Irene,
      1. Theoretically, you don’t need to test them if there is a non-significant result. If, however, the assumptions for the main test are different from those of the follow-up test, then you might get a significant result from the follow-up test even when the main test was not significant.
      2. Yes, you need to adjust for experiment-wise error when using multiple contrasts.
      Charles

  5. Hi Charles,
    I am running an experiment with 4 groups: 3 treatment groups and 1 control group. I would like to know whether it is appropriate for me to compare each group against the control using a Student’s t-test. Thank you

  6. A researcher has rejected the null hypothesis that the group means are equal and has concluded that the mean of the experimental group is significantly higher than that of the control group. In reality, however, the population means of the two groups are equal. Which type of error has the researcher made? Type I or experiment-wise?

    • Type I error.
      Experiment-wise error refers to the situation in which you assume that the significance level is, say, alpha = .05, but in reality it is higher since you are performing multiple tests. In your example, you could also incur experiment-wise error if multiple tests are performed.
      Charles

  7. I am running a one-way MANOVA, using one independent factor with two levels (groups) and 11 dependent variables. In SPSS, I am getting an error message that “post-hoc test are not performed for (each variable) because there are fewer than three groups.” Would you please elaborate on this? Also, the MANOVA SPSS instructions directed that the p value be set at (.05/11) = .0045. Please comment.
    Thank you

    • Robert,
      I don’t use SPSS and so can’t comment on SPSS-specific issues, but here are my observations.
      1. With only two levels, perhaps the information that you are looking for is already available without the need for performing a post hoc test.
      2. .05/11 is a Bonferroni correction to handle experimentwise error.
      Charles

  8. Dear Sir,
    Would you please introduce some references about the formula you mentioned? I mean α = 1 – (1 – .05)^(1/3). Thanks a lot.

  9. Sir:

    If I run 10 t-tests with alpha set at .05 and there are no significant results (all p’s above .5), then should I even be concerned with experiment-wise error rate?

  10. Hi Charles,

    I was wondering whether you could answer a few of my (relatively simple) questions:

    1.) How do post hocs influence the statistical decision for each pairwise comparison?
    2.) If one was to use multiple t-tests, what would the experiment wise error be?
    3.) What is the formula linking the experiment-wise error rate to the error rate associated with each comparison? Is it: desired experiment-wise error rate / number of pairwise comparisons?

    Any help is much appreciated!

    • Jack,
      1. I don’t understand the question.
      2. 1-(1-alpha)^k, where k = the number of t-tests performed.
      3. The error rate for each comparison is still alpha.
      Charles

    • If you use a post hoc test and the test is significant, you can say that the specific pairwise comparisons you made are different from each other.

  11. Dear Dr. Charles,

    I would appreciate your opinion about this problem.
    I have to statistically compare two foot pressure distribution maps, corresponding to two different clinical conditions, named A and B for instance.
    Each pressure map is composed of, let’s say, 100 sensor cells. Maps are the result of an average, so for each cell I have a mean pressure value and a related s.d.

    Then, what I need to do is to perform a comparison (making 100 t-tests, one for each corresponding cell) between the pressure value in condition A (mean and s.d.) and the pressure value in condition B (mean and s.d.).

    My concern is: what is the correct significance level I have to use for each t-test? Can I set p=0.05 for each test, or should I apply some correction (e.g. Bonferroni) to take into account that I’m performing many comparisons?

    In effect, I am not interested in knowing whether the whole foot in condition A is different from the whole foot in condition B, because in such a case I can understand that the Bonferroni correction on p-values would be mandatory in order to keep a 5% experiment-wise type I error.
    Instead, the aim of my study is to investigate whether there are statistical differences at the level of single cells, and this makes me confused about the right significance level p to apply to each t-test.

    Thank you very much for your help
    Piero

    • Piero,
      Since you plan to conduct 100 tests, generally you should correct for experiment-wise type I error. This will impact the statistical power.
      Charles

    • You’re going to want to use Tukey’s HSD if you are looking at all possible pairwise comparisons. If you want to look at only a few, then use Bonferroni. Or if you have a control group and want to compare every other treatment to the control, use the Dunnett correction.

  12. Hi Charles,

    I am having a bit of trouble getting to grips with this and I was wondering if you could answer this question:
    If you fix the experiment-wise error rate at 0.05, what effect does this have on the error rate of each comparison, and how does this influence the statistical decision about each comparison?

    Would it be that if you fixed it at 0.05, then the effect on each comparison would be that its error rate would be smaller, using the formula 1 – (1 – .05)^(1/3)? Or have I got this completely wrong?

    Any help on this would be much appreciated!

    • You have got this right. If you fix the experiment-wise error rate at 0.05, then this nets out to an alpha value of 1 – (1 – .05)^(1/3) = .016952 on each of the three tests to be conducted.
      Charles

  13. Sir,
    Thanks for this site and package of yours; I’m learning a lot!
    With regards to this particular page about experiment wise error rate, you said just in the last paragraph that:
    “…in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952”

    Does it mean that the computed alpha (that is, 0.016952 for m=3 tests among k=4 samples) should be the one used in the pairwise tests (m=3) to reduce the overall type I error among your 4 tests? If so, sir, what do you, statisticians, technically call this adjusted alpha?

    I’d be very glad to have your response.

    • And I was also answered by your other page, in your discussion about the kruskal-wallis test. You said:

      “If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be used by employing the Mann-Whitney U Tests. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni (alpha=0.05/m) or Dunn/Sidák correction (alpha=1-(1-0.05)^(1/3)).”

      This only means your page is very efficient, my sincerest appreciation, sir.

    • Larry,
      Glad to see that you are learning a lot from the website. That’s great.
      The alpha value of 1 – (1 – .05)^(1/m) depends on m, which is equal to the number of follow-up tests you make. This is the alpha value you should use when you use contrasts (whether pairwise or not). Actually m = the number of orthogonal tests, and so if you restrict yourself to orthogonal tests then the maximum value of m is k – 1 (see Planned Follow-up Tests).
      I have always called the “adjusted alpha” simply “alpha”. If there is a technical term for this, I am unaware of it.
      Charles
