Dealing with Familywise Error

Basic Concepts

Suppose that instead of performing one statistical test, we perform three such tests; e.g. three tests with the null hypotheses:

  • H0: μ1 = μ2
  • H0: μ2 = μ3
  • H0: μ1 = μ3

Note that if you use a significance level of α = .05 for each of the three analyses, then the overall significance level is about .14, since 1 – (1 – α)^3 = 1 – (1 – .05)^3 = 0.142625 (see Example 6 of Basic Probability Concepts). This means that the probability of rejecting at least one of the null hypotheses even when they are all true (i.e. of making at least one type I error) is 14.2625%. This value is based on the assumption that the three tests are independent of one another.
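The roughly 14% figure can be checked empirically. The following is a minimal simulation sketch, not part of the original article: it assumes that the tests are independent and that, when a null hypothesis is true, each test rejects with probability exactly α. The function name and its parameters are illustrative only.

```python
import random

def simulate_familywise_error(alpha=0.05, num_tests=3, trials=100_000, seed=0):
    """Estimate P(at least one false rejection) when every null is true.

    Assumption: the tests are independent, so each one rejects a true
    null hypothesis with probability alpha, independently of the others.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    runs_with_error = 0
    for _ in range(trials):
        # A p-value under a true null is uniform on [0, 1]; the test
        # rejects when that p-value falls below alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            runs_with_error += 1
    return runs_with_error / trials

estimate = simulate_familywise_error()
exact = 1 - (1 - 0.05) ** 3
print(f"simulated: {estimate:.4f}  exact: {exact:.6f}")
```

With this many trials the simulated proportion should land close to the exact value, differing only by sampling noise.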

In general, if you perform k independent tests, each at significance level α, then the probability of making at least one type I error across the tests is 1 – (1 – α)^k. If you perform a lot of tests, then this value becomes very high. E.g. for 10 tests, 1 – (1 – .05)^10 = .40 and for 100 tests it is .994.
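To see how quickly the combined error grows, here is a short sketch of the formula for k tests. The function name is hypothetical, and the calculation assumes independent tests that all use the same α.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one type I error in k independent tests."""
    return 1 - (1 - alpha) ** k

# Tabulate the combined error for the numbers of tests mentioned in the text.
for k in (1, 3, 10, 100):
    print(f"k = {k:3d}: {familywise_error_rate(0.05, k):.3f}")
```

For k = 1 the function simply returns α, and the value approaches 1 as k grows.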

The situation is similar when you test the null hypothesis

H0: μ1 = μ2 = μ3

We can use the following three separate null hypotheses:

  • H0: μ1 = μ2
  • H0: μ2 = μ3
  • H0: μ1 = μ3

If any of these null hypotheses is rejected then the original null hypothesis is rejected.

For k such populations, you would need to run m = COMBIN(k, 2) such tests, and so the resulting overall alpha would be 1 – (1 – α)^m, a value that gets progressively higher as the number of populations increases. For example, if k = 6, then m = 15 and the probability of finding at least one significant test purely by chance, even when the null hypothesis is true, is over 50%.
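The pairwise count and the resulting overall alpha can be sketched in Python, where math.comb plays the role of Excel's COMBIN. The function names are illustrative, and the calculation treats the m pairwise tests as independent, which is a simplifying assumption.

```python
from math import comb

def pairwise_test_count(k):
    """Number of pairwise comparisons among k populations, i.e. COMBIN(k, 2)."""
    return comb(k, 2)

def overall_alpha(alpha, k):
    """Overall alpha when all pairwise tests among k populations use alpha.

    Assumption: the m pairwise tests are treated as independent.
    """
    m = pairwise_test_count(k)
    return 1 - (1 - alpha) ** m

# Show how the number of tests and the overall alpha grow with k.
for k in (3, 4, 6, 10):
    print(f"k = {k:2d}: m = {pairwise_test_count(k):2d}, "
          f"overall alpha = {overall_alpha(0.05, k):.3f}")
```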

Familywise error

With three separate tests, in order to achieve a combined type I error rate (called an experiment-wise error rate or familywise error rate) of .05, you would need to set each alpha to a value such that 1 – (1 – α)^3 = .05, i.e. α = 1 – (1 – .05)^(1/3) = 0.016952. As mentioned in Statistical Power, for the same sample size this reduces the power of the individual t-tests.
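Solving for the per-test alpha generalizes from three tests to k tests. The sketch below uses a hypothetical function name and again assumes independent tests; it inverts the familywise formula and then checks that the result reproduces the target rate.

```python
def per_test_alpha(familywise_alpha, k):
    """Alpha to use for each of k independent tests so that the
    combined type I error rate equals familywise_alpha."""
    return 1 - (1 - familywise_alpha) ** (1 / k)

alpha_each = per_test_alpha(0.05, 3)
print(f"per-test alpha: {alpha_each:.6f}")

# Round-trip check: plugging the per-test alpha back into the
# familywise formula should give back the target of .05.
combined = 1 - (1 - alpha_each) ** 3
print(f"combined error: {combined:.6f}")
```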

Dealing with familywise error

There are various ways of dealing with familywise error when conducting multiple tests, including approaches that are appropriate for any type of multiple-test situation.

There are also specialized approaches, discussed elsewhere on the website, for multiple t-tests (especially following ANOVA) and for multiple non-parametric tests such as the Mann-Whitney and Wilcoxon signed-ranks tests.

