Experiment-wise error rate

We could have conducted the analysis for Example 1 of Basic Concepts for ANOVA by performing multiple two-sample tests. E.g. to decide whether or not to reject the following null hypothesis

H0: μ1 = μ2 = μ3

We can use the following three separate null hypotheses:

  • H0: μ1 = μ2
  • H0: μ2 = μ3
  • H0: μ1 = μ3

If any of these null hypotheses is rejected then the original null hypothesis is rejected.

Note, however, that if you set α = .05 for each of the three sub-analyses, then the overall alpha value is about .14, since 1 – (1 – α)^3 = 1 – (1 – .05)^3 = 0.142625, assuming the three tests are independent (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting the null hypothesis even when it is true (a type I error) is 14.2625%.

For k groups, you would need to run m = COMBIN(k, 2) such tests, and so the resulting overall alpha would be 1 – (1 – α)^m, a value that gets progressively higher as the number of samples increases. For example, if k = 6, then m = 15, and the probability of finding at least one significant t-test purely by chance, even when the null hypothesis is true, is over 50%.
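
To make the arithmetic concrete, here is a minimal Python sketch of this calculation (the function name is mine; math.comb plays the role of Excel's COMBIN):

```python
from math import comb

def experimentwise_alpha(alpha: float, k: int) -> float:
    # Probability of at least one type I error across all pairwise tests
    m = comb(k, 2)                 # number of pairwise comparisons, COMBIN(k, 2)
    return 1 - (1 - alpha) ** m    # assumes the m tests are independent

for k in (3, 4, 5, 6):
    print(k, round(experimentwise_alpha(0.05, k), 4))
# prints: 3 0.1426, 4 0.2649, 5 0.4013, 6 0.5367
```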

In fact, one of the reasons for performing ANOVA instead of separate t-tests is to reduce the type I error. The only problem is that once you have performed ANOVA, if the null hypothesis is rejected, you will naturally want to determine which groups have unequal means, and so you will need to confront this issue in any case.

With 3 separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05, you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952. As is mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests. If the experiment-wise error rate is < .05, then the error rate is called conservative. If it is > .05, then it is called liberal.
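
Running the same formula in reverse gives the per-test alpha; this is the Dunn/Šidák correction. A minimal sketch, with the simpler Bonferroni divisor shown for comparison:

```python
def sidak_alpha(fw_alpha: float, m: int) -> float:
    # Solve 1 - (1 - a)^m = fw_alpha for the per-test alpha a
    return 1 - (1 - fw_alpha) ** (1 / m)

print(sidak_alpha(0.05, 3))   # 0.016952...
print(0.05 / 3)               # Bonferroni: 0.016666..., slightly more conservative
```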

There are two types of follow-up tests following ANOVA: planned (aka a priori) tests and unplanned (aka post hoc or a posteriori) tests. Planned tests are determined prior to the collection of the data, while unplanned tests are made after the data have been collected. These tests have entirely different type I error rates.

For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis \frac{\mu_1 + \mu_2}{2} = \frac{\mu_3 + \mu_4}{2} then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test \frac{\mu_1 + \mu_2}{2} = \frac{\mu_3 + \mu_4}{2} because μ1 and μ2 are the smallest means and μ3 and μ4 are the largest.
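
For readers who want to see the mechanics, the sketch below tests such a planned contrast with invented data (the group means 2.1, 4.0, 9.0, 7.0 are chosen to mimic the 2, 4, 9, 7 example above; none of these numbers come from the original example):

```python
import numpy as np
from scipy import stats

# Invented data: 4 groups of 3 observations each
groups = [np.array([1.8, 2.1, 2.4]),
          np.array([3.9, 4.0, 4.1]),
          np.array([8.7, 9.0, 9.3]),
          np.array([6.8, 7.0, 7.2])]

# Contrast weights encoding H0: (mu1 + mu2)/2 = (mu3 + mu4)/2
c = np.array([0.5, 0.5, -0.5, -0.5])           # weights sum to 0

means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])
N, k = ns.sum(), len(groups)

# Pooled within-group variance (the ANOVA mean square within)
msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)

L = c @ means                                  # estimated contrast value
se = np.sqrt(msw * np.sum(c ** 2 / ns))        # standard error of the contrast
t = L / se
p = 2 * stats.t.sf(abs(t), N - k)              # two-sided p-value, df = N - k
print(f"t = {t:.2f}, p = {p:.4g}")
```

If this contrast were chosen only after looking at the data, the nominal p-value above would understate the true type I error rate, which is exactly the point of the paragraph above.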

34 thoughts on “Experiment-wise error rate”

  1. Congratulations for the explanation on this page!
    What do you think about the fact that there are still researchers who insist on using tests that only control the type I error rate for each comparison?
    thanks

  2. I am glad to have come across this site… Sir, kindly assist me on how to determine the comparison-wise error rate when I have five treatments and the number of comparisons is ten. My level of significance is 0.05 and 4 of the comparisons are below the significance level. How do I determine the error rate?

    • Bukar,
      If you run 10 separate comparison tests, then the experiment-wise error rate is 1-(1-.05)^10 = .4013. This is based on the assumption that all the comparisons are independent of each other.
      Charles

  3. Hello Sir,
    I would like to ask you a question: I have three samples and four tests I plan to conduct. I have decided to apply a Bonferroni correction with a new alpha of 0.05/4. Suppose I find two significant results and plan to make pairwise comparisons for these two tests; should I apply another Bonferroni correction? Would it be 0.05/(3*3)?

    • Carla,
      Before looking at the data or the results of the tests, do you plan to conduct a total of 4 tests? In that case, the corrected alpha is .05/4. If instead you plan to run 4 tests for each of the 3 samples, then the corrected alpha is .05/12 (unless what you plan to do for each sample is independent of what you do for the other two samples). You need to use a divisor based on the maximum number of tests; you can’t take into account information that you have obtained after running some of the tests (such as seeing which of these tests are significant). This is the general situation. I would need more details about what you are trying to do to give a more definitive response.
      The corrected alpha can get quite small. This is why there are a number of special post-hoc tests (such as Tukey’s HSD after ANOVA) that address experiment-wise error better than Bonferroni. But these post-hoc tests depend on the details of the analysis that you are trying to perform.
      Charles

      • Thank you for your answer. Let me explain.
        I have three groups (levels of the independent variable) that I want to test on 4 different categorical variables. I plan to use the Chi-square test of independence, so I thought to use Bonferroni and set alpha as 0.05/4. If, after using the correction, I get a significant result from at least one test, I plan to conduct a post-hoc analysis by again using Chi-square and doing pairwise comparisons of the three groups (1 vs 2, 1 vs 3, 2 vs 3). In this case, should I make another correction considering the total number of possible comparisons? For example, if after these 4 Chi-square tests I have just one statistically significant result, should I set my new alpha at 0.05/3 and do these comparisons? What if I have two significant tests? Then would alpha be 0.05/9? I apologise for these silly questions.

  4. Dear Dr. Charles

    I have an experiment with three diets: a negative control (NC), a positive control (PC) and a dietary treatment (TRT). My preplanned orthogonal contrasts are NC vs PC, and (NC+PC)/2 vs TRT. My questions are:
    Would you test them even under a non-significant F-test for the main effect “diet”?
    Would you consider adjusting alpha for each contrast (alpha = 1-(1-0.05)^(1/2) = 0.0253)?

    Thanks for your time.
    Irene

    • Hello Irene,
      1. Theoretically, you don’t need to test them if there is a non-significant result. If, however, the assumptions for the main test are different from those of the follow-up test, then you might get a significant result from the follow-up test even when the main test was not significant.
      2. Yes, you need to adjust for experimental error when using multiple contrasts.
      Charles

  5. Hi Charles,
    I am running an experiment with 4 groups: 3 treatment groups and 1 control group. I would like to know whether it is appropriate for me to compare each group against the control using a Student’s t-test. Thank you

  6. A researcher has rejected the null hypothesis that group means are equal and has concluded that the mean of the experimental group is significantly higher than that of the control group. In reality, however, the population means of the two groups are equal. Which type of error has the researcher made: Type I or experiment-wise?

    • Type I error.
      Experiment-wise error occurs when you assume that the significance level is alpha, say .05, but in reality it is higher, since you are performing multiple tests. In your example you could also have experiment-wise error if multiple tests were performed.
      Charles

  7. I am running a one-way MANOVA, using one independent factor with two levels (groups) and 11 dependent variables. In SPSS, I am getting an error message that “post-hoc tests are not performed for (each variable) because there are fewer than three groups.” Would you please elaborate on this? Also, the MANOVA SPSS instructions directed that the p-value be set at .05/11 = .0045. Please comment.
    Thank you

    • Robert,
      I don’t use SPSS and so can’t comment on SPSS-specific issues, but here are my observations.
      1. With only two levels, perhaps the information that you are looking for is already available without the need for performing a post hoc test.
      2. .05/11 is a Bonferroni correction to handle experiment-wise error.
      Charles

  8. Dear Sir,
    Would you please introduce some references about the formula you mentioned? I mean α = 1 – (1 – .05)^(1/3). Thanks a lot.

  9. Sir:

    If I run 10 t-tests with alpha set at .05 and there are no significant results (all p’s above .5), then should I even be concerned with experiment-wise error rate?

  10. Hi Charles,

    I was wondering whether you could answer a few of my (relatively simple) questions:

    1.) How do post hoc tests influence the statistical decision for each pairwise comparison?
    2.) If one were to use multiple t-tests, what would the experiment-wise error be?
    3.) What is the formula linking the experiment-wise error rate to the error rate associated with each comparison? Is it: desired experiment-wise error rate / number of pairwise comparisons?

    Any help is much appreciated!

    • Jack,
      1. I don’t understand the question.
      2. 1-(1-alpha)^k, where k = the number of tests performed
      3. The error rate for each comparison is still alpha
      Charles

    • If you use a post hoc test and the test is significant, you can say that the specific pairs that you compared are different from each other.

  11. Dear Dr. Charles,

    I would appreciate your opinion on this problem.
    I have to statistically compare two foot pressure distribution maps, corresponding to two different clinical conditions, named A and B for instance.
    Each pressure map is composed of, let’s say, 100 sensor cells. The maps are the result of averaging, so for each cell I have a mean pressure value and the related s.d.

    Then, what I need to do is to perform a comparison (making 100 t-tests, one for each corresponding cell) between the pressure value in condition A (mean and s.d.) and the pressure value in condition B (mean and s.d.).

    My concern is: what is the correct significance level I have to use for each t-test? Can I set p=0.05 for each test, or should I apply some correction (e.g. Bonferroni) to take into account that I’m performing many comparisons?

    In fact, I am not interested in knowing whether the whole foot in condition A is different from the whole foot in condition B; in that case I understand that the Bonferroni correction on the p-values would be mandatory in order to keep a 5% experiment-wise type I error rate.
    Instead, the aim of my study is to investigate whether there are statistical differences at the level of single cells, and this makes me confused about the right significance level p to apply to each t-test.

    Thank you very much for your help
    Piero

    • Piero,
      Since you plan to conduct 100 tests, generally you should correct for experiment-wise type I error. This will impact the statistical power.
      Charles

    • You’re going to want to use Tukey’s HSD if you are looking at all possible pairwise comparisons. If you only want to look at a few, then use Bonferroni. Or if you have a control group and want to compare every other treatment to the control, use the Dunnett correction.

  12. Hi Charles,

    I am having a bit of trouble getting to grips with this and I was wondering if you could answer this question:
    if you fix the experiment-wise error rate at 0.05, what effect does this have on the error rate of each comparison, and how does this influence the statistical decision about each comparison?

    Would it be that if you fixed it at 0.05, then the effect on each comparison would be that their error rates would be smaller, using the formula 1 – (1 – .05)^(1/3)? Or have I got this completely wrong?

    Any help on this would be much appreciated!

    • You have got this right. If you fix the experiment-wise error rate at 0.05, then this nets out to an alpha value of 1 – (1 – .05)^(1/3) = .016952 on each of the three tests to be conducted.
      Charles

  13. Sir,
    Thanks for this site and package of yours; I’m learning a lot!
    With regard to this particular page about experiment-wise error rate, you said in the last paragraph that:
    “…in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952”

    Does it mean that the computed alpha (that is, 0.016952 for m = 3 tests among k = 4 samples) should be the one used in the pairwise tests (m = 3) to reduce the overall type I error among your 4 tests? If so, sir, what do you statisticians technically call this adjusted alpha?

    I’d be very glad to have your response.

    • And I was also answered by your other page, in your discussion of the Kruskal-Wallis test. You said:

      “If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be used by employing the Mann-Whitney U Tests. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni (alpha=0.05/m) or Dunn/Sidák correction (alpha=1-(1-0.05)^(1/3)).”

      This only means your page is very efficient; my sincerest appreciation, sir.

    • Larry,
      Glad to see that you are learning a lot from the website. That’s great.
      The alpha value of 1 – (1 – .05)^(1/m) depends on m, which is equal to the number of follow-up tests you make. This is the alpha value you should use when you use contrasts (whether pairwise or not). Actually, m = the number of orthogonal tests, and so if you restrict yourself to orthogonal tests, then the maximum value of m is k – 1 (see Planned Follow-up Tests).
      I have always called the “adjusted alpha” simply “alpha”. If there is a technical term for this, I am unaware of it.
      Charles

