We could have conducted the analysis for Example 1 of Basic Concepts for ANOVA by performing multiple two-sample tests. E.g. to decide whether or not to reject the following null hypothesis
H0: μ1 = μ2 = μ3
We can use the following three separate null hypotheses:
- H0: μ1 = μ2
- H0: μ2 = μ3
- H0: μ1 = μ3
If any of these null hypotheses is rejected then the original null hypothesis is rejected.
Real alpha value
Note however that if you set α = .05 for each of the three sub-analyses then, assuming independence, the overall alpha value is about .14 since 1 – (1 – α)^3 = 1 – (1 – .05)^3 = 0.142625 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting the null hypothesis even when it is true (a type I error) is 14.2625%.
For k groups, you would need to run m = COMBIN(k, 2) such tests and so the resulting overall alpha would be 1 – (1 – α)^m, a value which would get progressively higher as the number of samples increases. For example, if k = 6, then m = 15 and the probability of finding at least one significant t-test, purely by chance, even when the null hypothesis is true is over 50%.
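The arithmetic can be checked with a short calculation. Below is a minimal Python sketch (not part of the original example; the function name is illustrative) that reproduces these figures under the same independence assumption:

```python
# Experiment-wise (family-wise) alpha when running every pairwise test at a
# per-test alpha, assuming the tests are independent.
# Excel's COMBIN(k, 2) corresponds to math.comb(k, 2) here.
from math import comb

def experimentwise_alpha(k: int, alpha: float = 0.05) -> float:
    m = comb(k, 2)                # number of pairwise comparisons
    return 1 - (1 - alpha) ** m   # P(at least one type I error)

print(experimentwise_alpha(3))    # ~0.1426 (k = 3, m = 3)
print(experimentwise_alpha(6))    # ~0.5367 (k = 6, m = 15), i.e. over 50%
```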
In fact, one of the reasons for performing ANOVA instead of separate t-tests is to reduce the type I error. The only problem is that, once you have performed ANOVA, if the null hypothesis is rejected you will naturally want to determine which group means differ, and so you will need to confront this issue in any case.
With 3 separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952. As is mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests. If the experiment-wise error rate < .05 then the error rate is called conservative. If it is > .05 then the error rate is called liberal.
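Going the other way, the per-test alpha needed to hold the experiment-wise rate at a target value can be computed as follows. This is a minimal sketch under the same independence assumption; the Bonferroni divisor α/m, discussed in the comments below, is included only for comparison:

```python
# Per-test alpha that yields a target experiment-wise error rate over m
# independent tests (the Dunn/Sidak correction), together with the more
# conservative Bonferroni value for comparison.
def sidak_alpha(m: int, target: float = 0.05) -> float:
    return 1 - (1 - target) ** (1 / m)

def bonferroni_alpha(m: int, target: float = 0.05) -> float:
    return target / m

print(sidak_alpha(3))        # ~0.016952
print(bonferroni_alpha(3))   # ~0.016667
```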
Planned vs. unplanned follow-up tests
There are two types of follow-up tests following ANOVA: planned (aka a priori) and unplanned (aka post hoc or a posteriori) tests. Planned tests are determined prior to the collection of data, while unplanned tests are made after the data are collected. These tests have entirely different type I error rates.
For example, suppose there are 4 groups. If an alpha value of .05 is used for a planned test of the null hypothesis then the type I error rate will be .05. If instead the experimenter collects the data and sees means for the 4 groups of 2, 4, 9 and 7, then the same test will have a type I error rate of more than .05. The reason for this is that once the experimenter sees the data, he will choose to test the hypothesis precisely because μ1 and μ2 are the smallest means and μ3 and μ4 are the largest.
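The inflation caused by choosing a comparison after seeing the data can be illustrated with a simulation. The following Monte Carlo sketch is not from the original text; it assumes four groups drawn from the same normal population (so the null hypothesis is true) and runs a two-sample t-test on the groups with the largest and smallest observed means:

```python
# Monte Carlo sketch: picking the comparison after seeing the data inflates
# the type I error rate, even though all group means are truly equal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, k, n = 10_000, 4, 20      # simulations, groups, observations per group
false_positives = 0

for _ in range(n_sims):
    groups = rng.normal(loc=0.0, scale=1.0, size=(k, n))  # all means truly equal
    means = groups.mean(axis=1)
    lo, hi = means.argmin(), means.argmax()                # chosen AFTER seeing the data
    res = stats.ttest_ind(groups[hi], groups[lo])
    false_positives += (res.pvalue < 0.05)

print(f"Empirical type I error rate: {false_positives / n_sims:.3f}")  # well above .05
```

With these settings the empirical rejection rate comes out well above the nominal .05, which is exactly the inflation described above.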
References
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Wikipedia (2012) Family-wise error rate
https://en.wikipedia.org/wiki/Family-wise_error_rate
Leday, G., Hemerik, J., Engel, J., van der Voet, H. (2023) Improved family-wise error rate control in multiple equivalence testing
https://www.sciencedirect.com/science/article/pii/S0278691523003307
Congratulations on the explanation on this page!
What do you think about why there are still researchers who insist on using tests that only control the type I error rate per comparison?
Thanks
Hello André,
Thank you for your kind words.
Without controlling for type I error, the test results are not valid.
Charles
I am glad to have come across this site… Sir, kindly assist me on how to determine the comparison-wise error rate when I have five treatments and the number of comparisons is ten, my level of significance is 0.05, and 4 of the comparisons are below the significance level. How do I determine the error rate?
Bukar,
If you run 10 separate comparison tests, then the experiment-wise error rate is 1 – (1 – .05)^10. This is based on the assumption that all the comparisons are independent of each other.
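For your numbers this works out to 1 – .95^10 ≈ .40, i.e. roughly a 40% chance of at least one false positive across the 10 comparisons.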
Charles
Hello Sir,
I would like to ask you a question: I have three samples and four tests I plan to conduct. I have decided to apply a Bonferroni correction with a new alpha of 0.05/4. Suppose I find two statistically significant results and I plan to make pairwise comparisons for these two tests; should I apply another Bonferroni correction? Would it be 0.05/(3*3)?
Carla,
Before looking at the data or the results of the tests, do you plan to conduct a total of 4 tests? In that case, the corrected alpha is .05/4. If instead you plan to run 4 tests for each of the 3 samples, then the corrected alpha is .05/12 (unless what you plan to do for each sample is independent of what you do for the other two samples). You need to use a divisor based on the maximum number of tests; you can't take into account information that you have obtained after running some of the tests (such as seeing which of these tests are significant). This is the general situation. I would need more details about what you are trying to do to give a more definitive response.
The corrected alpha can get quite small. This is why there are a number of special post-hoc tests (such as Tukey HSD after ANOVA) that address experiment-wise error better than Bonferroni. But these post-hoc tests depend on the details of the analysis that you are trying to perform.
Charles
Charles
Thank you for your answer. Let me explain.
I have three groups (levels of the independent variable) that I want to test on 4 different categorical variables. I plan to use the Chi-square test of independence, so I thought to use Bonferroni and set alpha at 0.05/4. If, after using the correction, I get a significant result from at least one test, I plan to conduct a post-hoc analysis by again using Chi-square and doing pairwise comparisons of the three groups (1 vs 2, 1 vs 3, 2 vs 3). In this case, should I make another correction considering the total number of possible comparisons? For example, if after these 4 Chi-square tests I have just one statistically significant result, should I set my new alpha at 0.05/3 and do these comparisons? What if I have two significant tests? Then would alpha be 0.05/9? I apologise for these silly questions.
Dear Dr. Charles
I have an experiment with three diets: a negative control (NC), a positive control (PC) and a dietary treatment (TRT). My preplanned orthogonal contrasts are NC vs PC, and (NC+PC)/2 vs TRT. My questions are:
Would you test them even under a non-significant F-test for the main effect “diet”?
Would you consider adjusting alpha for each contrast (alpha = 1 – (1 – 0.05)^(1/2) = 0.0253)?
Thanks for your time.
Irene
Hello Irene,
1. Theoretically, you don’t need to test them if there is a non-significant result. If, however, the assumptions for the main test are different from the follow-up test then you might get a significant result from the follow-up test even when the main test was not significant.
2. Yes, you need to adjust for experiment-wise error when using multiple contrasts.
Charles
Thank you very much
Hi, charles.
I am running an experiment with 4 groups: 3 treatment groups and 1 control group. I would like to know if it's appropriate for me to compare each group against the control using Student's t-test. Thank you
Yusuf,
You can, but that would increase the experiment-wise error. You could use Dunnett’s test instead. See Unplanned comparisons.
Charles
A researcher has rejected the null hypothesis that the group means are equal and has concluded that the mean of the experimental group is significantly higher than that of the control group. In reality, however, the population means of the two groups are equal. Which type of error has the researcher made: Type I or experiment-wise?
Type I error.
Experiment-wise error occurs when you assume that the significance level is, say, alpha = .05, but in reality it is higher (since you are performing multiple tests). In your example, you could also have experiment-wise error if multiple tests were performed.
Charles
I am running a one-way MANOVA, using one independent factor with two levels (groups) and 11 dependent variables. In SPSS, I am getting an error message that “post-hoc test are not performed for (each variable) because there are fewer than three groups.” Would you please elaborate on this? Also, the MANOVA SPSS instructions directed that the p value be set at (.05/11) = .0045. Please comment.
Thank you
Robert,
I don’t use SPSS and so can’t comment on SPSS-specific issues, but here are my observations.
1. With only two levels, perhaps the information that you are looking for is already available without the need to perform a post hoc test.
2. .05/11 is a Bonferroni correction to handle experimentwise error.
Charles
Dear Sir,
Would you please introduce some references about the formula you mentioned? I mean α = 1 – (1 – .05)^(1/3). Thanks a lot.
Zahra,
The derivation of this formula is shown on the referenced webpage.
Charles
Sir:
If I run 10 t-tests with alpha set at .05 and there are no significant results (all p’s above .5), then should I even be concerned with experiment-wise error rate?
Victor,
I believe that in this case you don’t need to be concerned about experiment-wise error (assuming I haven’t made a silly logic mistake).
Charles
Hi Charles,
I was wondering whether you could answer a few of my (relatively simple) questions:
1.) How do post hoc tests influence the statistical decision for each pairwise comparison?
2.) If one were to use multiple t-tests, what would the experiment-wise error be?
3.) What is the formula linking the experiment-wise error rate to the error rate associated with each comparison? Is it: desired experiment-wise error rate / number of pairwise comparisons?
Any help is much appreciated!
Jack,
1. I don't understand the question.
2. 1 – (1 – alpha)^k, where k is the number of tests performed.
3. The error rate for each comparison is still alpha.
Charles
If you use a post hoc test and the test is significant, you can say that the specific pairs that you compared are different from each other.
Dear Dr. Charles,
I would appreciate to have your opinion about this problem.
I have to statistically compare two foot pressure distribution maps, corresponding to two different clinical conditions, named A and B for instance.
Each pressure map is composed of, let's say, 100 sensor cells. The maps are the result of averaging, so for each cell I have a mean pressure value and a related s.d.
Then, what I need to do is to perform a comparison (making 100 t-tests, one for each corresponding cell) between the pressure value in condition A (mean and s.d.) and the pressure value in condition B (mean and s.d.).
My concern is: what is the correct significance level I have to use for each t-test? Can I set p=0.05 for each test, or should I apply some correction (e.g. Bonferroni) to take into account that I’m performing many comparisons?
In effect, I am not interested in knowing whether the whole foot in condition A is different from the whole foot in condition B, because in such a case I understand that the Bonferroni correction on p-values would be mandatory in order to keep a 5% experiment-wise type I error.
Instead, the aim of my study is to investigate whether there are statistical differences at the level of single cells, and this makes me confused about what is the right significance level p to apply to each t-test.
Thank you very much for your help
Piero
Piero,
Since you plan to conduct 100 tests, generally you should correct for experiment-wise type I error. This will impact the statistical power.
Charles
You're going to want to use Tukey's if you are looking at all possible pairwise comparisons. If you only want to look at a few, then use Bonferroni. Or if you have a control group and want to compare every other treatment to the control, use the Dunnett correction.
Hi Charles,
I am having a bit of trouble getting to grips with this and I was wondering if you could answer this question:
If you fix the experiment-wise error rate at 0.05, what effect does this have on the error rate of each comparison, and how does this influence the statistical decision about each comparison?
Would it be that if you fixed it at 0.05 then the effect on each comparison would be that their error rates would be smaller, using the formula 1 – (1 – .05)^(1/3)? Or have I got this completely wrong?
Any help on this would be much appreciated!
You have got this right. If you fix the experiment-wise error rate at 0.05, then this nets out to an alpha value of 1 – (1 – .05)^(1/3) = .016952 on each of the three tests to be conducted.
Charles
Sir,
Thanks for this site and package of yours; I’m learning a lot!
With regards to this particular page about experiment wise error rate, you said just in the last paragraph that:
“…in order to achieve a combined type I error rate (called an experiment-wise error rate or family-wise error rate) of .05 you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952”
Does it mean that the computed alpha (that is, 0.016952 for m=3 tests among k=4 samples) should be the one used in the pairwise test (m=3) to reduce the overall type I error among your 4 tests. If so, sir, what do you, statisticians, technically call this adjusted alpha?
I’d be very glad to have your response.
And I was also answered by your other page, in your discussion about the kruskal-wallis test. You said:
“If the Kruskal-Wallis Test shows a significant difference between the groups, then pairwise comparisons can be used by employing the Mann-Whitney U Tests. As described in Experiment-wise Error Rate and Planned Comparisons for ANOVA, it is important to reduce experiment-wise Type I error by using a Bonferroni (alpha=0.05/m) or Dunn/Sidák correction (alpha=1-(1-0.05)^(1/3)).”
This only means your page is very efficient, my sincerest appreciation, sir.
Larry,
Glad to see that you are learning a lot from the website. That's great.
The alpha value of 1 – (1 – .05)^(1/m) depends on m, which is equal to the number of follow-up tests you make. This is the alpha value you should use when you use contrasts (whether pairwise or not). Actually m = the number of orthogonal tests, and so if you restrict yourself to orthogonal tests then the maximum value of m is k – 1 (see Planned Follow-up Tests).
I have always called the “adjusted alpha” simply “alpha”. If there is a technical term for this, I am unaware of it.
Charles