Basic Concepts
Suppose that instead of performing one statistical test, we perform three such tests; e.g. three tests with the null hypotheses:
- H0: μ1 = c1
- H0: μ2 = c2
- H0: μ3 = c3
Note that if you use a significance level of α = .05 for each of the three analyses, then the overall significance level is about .14, since 1 – (1 – α)^3 = 1 – (1 – .05)^3 = 0.142625 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting at least one of the null hypotheses even when they are all true (i.e. making at least one type I error) is 14.2625%. This value assumes that the three tests are independent of one another.
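As a rough check on the roughly 14% figure, the calculation can be simulated. The sketch below assumes that all three null hypotheses are true, that the tests are independent, and that each p-value is therefore uniformly distributed on [0, 1] (which holds for continuous test statistics). The function name and its parameters are illustrative and do not come from the original text.

```python
import random


def simulate_familywise_error(alpha=0.05, n_tests=3, n_trials=200_000, seed=0):
    """Estimate the chance of at least one false rejection among n_tests.

    Assumption: every null hypothesis is true and the tests are independent,
    so each p-value is modeled as an independent Uniform(0, 1) draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # A trial counts as a familywise error if any p-value falls below alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


estimate = simulate_familywise_error()
exact = 1 - (1 - 0.05) ** 3
print(f"simulated: {estimate:.4f}, exact: {exact:.6f}")
```

The simulated proportion should land close to the exact value, with the gap shrinking as n_trials grows.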
In general, if you perform k independent tests, each at significance level α, then the probability of at least one type I error across the tests is 1 – (1 – α)^k. If you perform a lot of tests, this value becomes very high. E.g. for 10 tests, 1 – (1 – .05)^10 = .40, and for 100 tests it is .994.
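The growth of the combined error rate with the number of tests can be tabulated directly. This is a minimal sketch of the formula above, assuming independent tests; the helper name familywise_error is chosen here for illustration.

```python
def familywise_error(alpha, k):
    """Probability of at least one type I error in k independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** k


# Show how quickly the combined error rate grows with the number of tests.
for k in (1, 3, 10, 100):
    print(f"k = {k:>3}: {familywise_error(0.05, k):.4f}")
```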
The situation is similar when you test the null hypothesis
H0: μ1 = μ2 = μ3
One approach is to test the following three separate pairwise null hypotheses:
- H0: μ1 = μ2
- H0: μ2 = μ3
- H0: μ1 = μ3
If any of these null hypotheses is rejected then the original null hypothesis is rejected.
For k such populations, you would need to run m = COMBIN(k, 2) = k(k – 1)/2 such tests, and so the resulting overall alpha (treating the tests as independent) would be 1 – (1 – α)^m, a value which gets progressively higher as the number of populations increases. For example, if k = 6 and α = .05, then m = 15 and the probability of finding at least one significant test purely by chance, even when the null hypothesis is true, is over 50% (about 53.7%).
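The k = 6 example can be reproduced with a short calculation. The sketch below uses Python's math.comb in place of the Excel COMBIN function and, as a simplifying assumption, treats the pairwise tests as independent. The function name is illustrative.

```python
from math import comb


def pairwise_familywise_error(alpha, k):
    """Return (m, overall alpha) for all pairwise comparisons of k populations.

    Assumption: the m = C(k, 2) pairwise tests are treated as independent,
    which is only an approximation since they share samples.
    """
    m = comb(k, 2)  # number of distinct pairs, same as COMBIN(k, 2)
    return m, 1 - (1 - alpha) ** m


m, overall = pairwise_familywise_error(0.05, 6)
print(f"m = {m}, overall alpha = {overall:.4f}")
```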
Familywise error
With three separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or familywise error rate) of .05, you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952. As mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests.
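The per-test alpha can be obtained by inverting the familywise formula. This is a small sketch under the same independence assumption; the function name per_test_alpha is hypothetical.

```python
def per_test_alpha(familywise_alpha, k):
    """Per-test significance level that gives the requested familywise
    error rate across k independent tests."""
    return 1 - (1 - familywise_alpha) ** (1 / k)


alpha_each = per_test_alpha(0.05, 3)
print(f"per-test alpha: {alpha_each:.6f}")

# Round trip: plugging the per-test alpha back in recovers the target rate.
recovered = 1 - (1 - alpha_each) ** 3
print(f"recovered familywise rate: {recovered:.6f}")
```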
Dealing with familywise error
There are various ways of dealing with familywise error when conducting multiple tests, including the following approaches that are appropriate for any type of multiple-test situation:
- Bonferroni and Dunn-Šidák tests
- Holm’s and Hochberg’s tests
- Benjamini-Hochberg and Benjamini-Yekutieli tests
- Real Statistics data analysis tool in support of these tests
There are also specialized approaches, discussed elsewhere on the website, for multiple t-tests (especially following ANOVA) and for multiple non-parametric tests such as the Mann-Whitney and Wilcoxon signed-ranks tests.
Reference
Wikipedia (2018) False discovery rate
https://en.wikipedia.org/wiki/False_discovery_rate