Assumptions for ANOVA

To use the ANOVA test we make the following assumptions:

  • Each group sample is drawn from a normally distributed population
  • All populations have a common variance
  • All samples are drawn independently of each other
  • Within each sample, the observations are sampled randomly and independently of each other
  • Factor effects are additive
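
For a single-factor design, these assumptions are often summarized in one model equation. The notation below is an assumption made for illustration and is not taken from the text above:

$$x_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

Here $x_{ij}$ is observation $i$ in group $j$, $\mu$ is the grand mean, $\alpha_j$ is the effect of group $j$ and $\varepsilon_{ij}$ is the error term. Additivity is the sum on the right-hand side, while normality, the common variance $\sigma^2$ and independence are all carried by the error term.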

The presence of outliers can also cause problems. In addition, we need to make sure that the F statistic is well behaved. In particular, the F statistic is relatively robust to violations of normality provided:

  • The populations are symmetrical and unimodal
  • The sample sizes for the groups are equal and greater than 10

In general, as long as the sample sizes are equal (called a balanced model) and sufficiently large, the normality assumption can be violated provided the samples are symmetrical or at least similar in shape (e.g. all are negatively skewed).
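
One rough way to screen groups against these conditions is to compare the group sizes and the sign of each group's sample skewness. The sketch below is only an illustration: the function names, the data values and the choice of the moment-based skewness coefficient are assumptions, not part of the discussion above.

```python
from statistics import mean


def sample_skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (moments use an n denominator)."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def robustness_report(groups, min_size=10):
    """Summarize balance, group size and skewness for a list of samples."""
    sizes = [len(g) for g in groups]
    return {
        "balanced": len(set(sizes)) == 1,       # equal group sizes
        "large_enough": min(sizes) > min_size,  # every group has more than min_size values
        "skewness": [sample_skewness(g) for g in groups],
    }


# Hypothetical data: three groups of 12 observations, each with a long left tail.
groups = [
    [2, 5, 6, 7, 7, 8, 8, 8, 9, 9, 9, 9],
    [1, 4, 6, 6, 7, 7, 8, 8, 8, 9, 9, 9],
    [3, 5, 5, 7, 7, 8, 8, 9, 9, 9, 10, 10],
]

report = robustness_report(groups)
print(report["balanced"], report["large_enough"])
print([round(s, 2) for s in report["skewness"]])
```

With these made-up values the model is balanced, each group has more than 10 observations, and all three skewness values are negative, which is the "similar in shape" situation. This is a screening aid only and does not replace a plot or a formal test.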

The F statistic is not so robust to violations of homogeneity of variances. A rule of thumb for balanced models is that if the ratio of the largest variance to smallest variance is less than 3 or 4, the F-test will be valid. If the sample sizes are unequal then smaller differences in variances can invalidate the F-test. Much more attention needs to be paid to unequal variances than to non-normality of data.
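
The rule of thumb above can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the function names and data are hypothetical, and the cut-off of 3 is simply the stricter of the two values mentioned in the rule.

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]  # n - 1 denominator
    smallest = min(variances)
    if smallest == 0:
        raise ValueError("a group has zero variance, so the ratio is undefined")
    return max(variances) / smallest


def is_balanced(groups):
    """True when every group has the same number of observations."""
    return len({len(g) for g in groups}) == 1


# Hypothetical data: three groups of five observations each.
groups = [
    [12.0, 15.0, 14.0, 10.0, 13.0],
    [22.0, 25.0, 19.0, 24.0, 20.0],
    [11.0, 18.0, 9.0, 16.0, 21.0],
]

ratio = variance_ratio(groups)
THRESHOLD = 3.0  # assumed cut-off; the rule of thumb allows 3 or 4

print(f"balanced: {is_balanced(groups)}")
print(f"largest/smallest variance ratio: {ratio:.2f}")
print(f"within rule of thumb: {ratio < THRESHOLD}")
```

For these made-up values the ratio is well above 4, so the rule of thumb would flag the variances as too unequal even though the model is balanced.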

We now look at how to test for violations of these assumptions and how to deal with any violations when they occur.

175 thoughts on “Assumptions for ANOVA”

  1. Dear Charles

    I designed an experiment on driving posture preferences of people with different attributes for different car models (Sedan, SUV and MPV), with gender and height as between-group variables and car model as the within-group variable. Should this be analyzed using repeated measures ANOVA? I also noticed that some researchers use ordinary ANOVA for repeated experiments. Is there any theoretical support for this?

    Thank you in advance, best wishes!

    • It really depends on the details of the design.
      Car model could be a within-subjects factor in the case where someone owns more than one car.
      Charles

      • You are right, in my experimental design a single user experienced multiple test car models. In this case, should I only use repeated measures ANOVA? And could you recommend some books or papers on the rational choice of statistical methods?

        With kindest regards!
        Chengmou Li

        • You can use repeated measures ANOVA if the assumptions for this approach are met.
          There are a number of books addressing this topic; many are referenced on the webpages of the Real Statistics website. Often these books use the term Design of Experiments in their title.
          Charles

  2. Hi Charles,

    Does the Central Limit Theorem play some role in the robustness of the F statistic to violations of normality?

    Can you explain it briefly?

    Thanks in advance!

  3. Dear Charles,

    I’m currently applying ANOVA for a 2^7 factorial design. The normality assumption is however violated with p-value < 2.2e-16. Levene's test has a p-value of 0.05482. Can I still work with ANOVA?

    Kind regards,
    Elisabeth

    • Hello Elisabeth,
      In general, you should be more concerned about Levene’s test than the normality assumption. From your results, Levene’s test is borderline, but the normality test is very poor. I would check to see whether an outlier is distorting the normality test. I would also consider using Kruskal-Wallis.
      Charles

  4. Hi, Charles!
    I’m running a series (10) of 3 x 2 ANOVAs. I’ve got around 2000 participants; however, group sizes are very unequal. Firstly, the assumption of normality (Shapiro-Wilk) was breached for all outcome variables at each level of both IVs. Blanca et al. (2017) indicate that ANOVA is robust in all instances of non-normality (homogeneity assumed) they tested (i.e., up to skewness = 2 and kurtosis = 6).
    This would mean that 9/10 of my outcome variables should be fine. However, I’m not sure how to deal with the last one. The other issue is that this research specifies that homogeneity of variance is assumed; in my case five variables violated this assumption going off the mean-based test, and three violated it going off the median-based test (which may be better to interpret when data is not normal).

    I’m a little lost at how I should proceed given my various violations (and unequal group sizes). Any help is much appreciated!

  5. Dears,

    I’ve done research with one questionnaire while I was also observing some additional characteristics of participants. For example, I was observing educational degrees which had three categories (bachelor, master, doctoral). Now I want to compare those three categories for the questionnaire total score, but I have a big difference in the number of participants in each of the categories (bachelor N=54, master N=117, doctoral N=14). How can I know if that difference in the number of participants between categories is ok so I can do a further analysis?

    Thank you in advance!

    Nina

    • Hello Nina,
      You can perform ANOVA even with group sample sizes that are quite different; however, you need to be aware of the following:
      1. The power of the test will be reduced, i.e. there is less ability to detect small differences in effect size.
      2. The test is less robust to violations of the homogeneity of variances assumption. This might lead you to use Welch’s ANOVA instead.
      Charles

      • Thank you very much for answering fast!

        I’ve applied Levene’s test and it showed significance >.05.
        Does it mean that I can proceed with “regular” ANOVA?
        I’ve never used Welch’s ANOVA.

  6. Hello Sir, what will be the effect of violating all the assumptions on the comparison-wise and experiment-wise error rates in post-hoc tests?

    • Assuming that you are willing to accept a 5% type I error (the usual assumption), and you find that a test shows that p-value = .03, then you conclude that you have a significant result (one whose p-value falls below the 5% = .05 threshold). If instead, the test shows that p-value = .10, then you conclude that you don’t have a significant result (i.e. your results are consistent with the null hypothesis).
      Now suppose that a p-value of .03 is really a p-value of .10 (or that a significance level of 5% is really 15%). In that case you need to adjust your thinking about the test. This sort of situation potentially arises when a test assumption is not met, and so you may reach the wrong conclusion. Violating some assumptions is riskier than violating others (e.g. minor violations of normality are usually less of a problem than violations of homogeneity of variances).
      Charles

  7. Hello! We have 50 subjects and each of them has multiple measurements of a variable, X, in three different conditions: low, medium and high. We want to evaluate whether there are differences between the means of X in the three conditions. We assume that we can’t use ANOVA because our observations are not independent. Not only do we have observations of the same subject in the three conditions/groups, but we also have multiple observations of the same subject in each condition/group. Is there an alternative to ANOVA that we can use? I believe that we can’t use repeated measures ANOVA either, because we have multiple measurements of each subject in each group (and not the same number in each group).

    • Manos,
      If for each subject you had one measure for each of the 3 conditions, then you could use repeated measures ANOVA or one-factor MANOVA. As you have observed, since you have multiple measurements for the same subject/group, you can’t use either of these approaches or any of the designs described on the website.
      I don’t know what analysis you can use, although I have the following suggested approach that might be appropriate:
      1. For any subject/group for which you have a duplicate use the mean of all the duplicate entries. Then you can use repeated measures ANOVA or MANOVA
      2. For any subject/group for which you have a duplicate randomly select one of the duplicate entries.
      Charles

  8. If my data follow more of a sigmoid function over time (time series data), can I still apply ANOVA, or what other test would you recommend?

    • I need additional information before I would be able to address your question. Can you provide the following information?
      1. Describe the scenario that you are looking at. Include the nature of the data.
      2. What hypotheses are you trying to test?
      Charles

  9. Hello ! I have non-normal data that I would have liked to analyze using a 2-way repeated measure ANOVA (two groups with measurements at 2 time points). I tried transformation (sqrt, ln, log, box-cox), and data stay non-normal. I can’t find an appropriate non-parametric test! What do you suggest?

    Also, my sample size is small, 15 per group. I’ve heard that if the homogeneity assumption holds, I can still do my ANOVA. Is this true? Thank you !!

    • What to do depends on what hypothesis you want to test. E.g. you could perform a two-sample t-test using the differences between the measurements at the two time periods for each subject. This will test whether there is a significant difference between the two populations from which the samples are drawn based on the change in the measurements between the two time periods. If the set of these differences is normally distributed then the t-test could be the way to go. If not, you could use the Mann-Whitney non-parametric test.
      Other tests are possible depending on how far from normality the appropriate data values are. The devil is in the details. You might also be able to use resampling even if the data is not normally distributed.
      Charles

    • This depends on what data are missing and what type of ANOVA you want to perform. E.g. if you have 3 groups each containing 10 elements and one of the groups is missing one of the elements, you can still perform one-way ANOVA and the results should still be valid provided the missing element is missing at random (e.g. the value was obtained but it is unreadable or the measurement was not obtained because the missing data was from a person who missed the bus and so a value for that person couldn’t be obtained, etc.).
      Can you provide some additional information about the type of ANOVA you want to perform and how much of the data are missing and the nature of the missing data.
      Charles

    • If by disturbance term you mean the residuals, then normality is essential for correctly interpreting ANOVA. You can certainly perform the test even if the normality assumption doesn’t hold, but your conclusions may be incorrect. Fortunately, ANOVA is pretty forgiving about this assumption not holding, but if the data is too far from normality, you can have problems.
      Charles

  10. I performed the ANOVA. My results showed a level of heterogeneity with unequal group sample sizes (36, 31, 25). What would you suggest in this case?

    Thank you

  11. I performed a Shapiro-Wilk test on my data and some of my groups did not meet the requirements for a normally distributed population. For example, I have 7 sample groups and only 4 of them had a normal distribution. Another example is where I had 12 sample groups and only 9 of them had a normal distribution. Is it required for ALL groups to be normally distributed to perform an ANOVA, or is a simple majority of the groups being normally distributed sufficient?

    • Usually, the value from ANOVA is a t statistic or F statistic and not a statistic for the normal distribution.
      In any case, a z-score (the statistic for the normal distribution) is one point on the normal probability curve.
      Charles
