Basic Concepts
To use the ANOVA test we make the following assumptions:
- The residuals are normally distributed
- Group populations have a common variance
- All samples are drawn independently of each other
- Within each sample, the observations are sampled randomly and independently of each other
- Factor effects are additive
The presence of outliers can also cause problems. In addition, we need to make sure that the F statistic is well behaved. In particular, the F statistic is relatively robust to violations of normality provided:
- The populations are symmetrical and uni-modal.
- The sample sizes for the groups are equal and greater than 10.
Priorities
In general, as long as the sample sizes are equal (called a balanced model) and sufficiently large, the normality assumption can be violated provided the samples are symmetrical or at least similar in shape (e.g. all are negatively skewed).
The F statistic is not so robust to violations of homogeneity of variances. A rule of thumb for balanced models is that if the ratio of the largest variance to smallest variance is less than 3 or 4, the F-test will be valid. If the sample sizes are unequal then smaller differences in variances can invalidate the F-test. Much more attention needs to be paid to unequal variances than to non-normality of data.
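The rule of thumb above can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the Real Statistics software; the function name `variance_ratio` and the data are hypothetical, and the cutoff of 3 is simply the stricter end of the "3 or 4" range quoted above.

```python
from statistics import variance

def variance_ratio(groups):
    """Return the largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical balanced data: three groups of six observations each
groups = [
    [23, 25, 21, 27, 24, 26],
    [30, 28, 35, 31, 27, 33],
    [18, 22, 20, 19, 25, 16],
]

ratio = variance_ratio(groups)
print(f"largest/smallest variance ratio = {ratio:.2f}")
if ratio < 3:
    print("Within the rule of thumb for a balanced model")
else:
    print("Variances differ enough to warrant a closer look")
```

This is only a screening check for balanced models; with unequal sample sizes, a formal test such as Levene's test is the safer choice.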
Further Information
We now look at how to test for violations of these assumptions and how to deal with any violations when they occur.
- Defining ANOVA residuals and determining whether they are normally distributed (see Normality of ANOVA Residuals)
- Testing that the population is normally distributed (see Testing for Normality and Symmetry)
- Determining whether the homogeneity of variances assumption is met and dealing with violations (see Homogeneity of Variances)
- Testing for and dealing with outliers (see Outliers in ANOVA)
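As a quick illustration of the first item, in one-way ANOVA the residual of an observation is the observation minus the mean of its group. The sketch below (plain Python with hypothetical data and a hypothetical helper name `anova_residuals`) computes the residuals and shows that their sum of squares divided by the within-groups degrees of freedom gives the mean square within, i.e. the denominator of the F statistic.

```python
from statistics import mean

def anova_residuals(groups):
    """Residual of each observation = observation minus its group mean."""
    result = []
    for g in groups:
        group_mean = mean(g)
        result.append([x - group_mean for x in g])
    return result

# Hypothetical data: three groups of six observations each
groups = [
    [23, 25, 21, 27, 24, 26],
    [30, 28, 35, 31, 27, 33],
    [18, 22, 20, 19, 25, 16],
]

residuals = anova_residuals(groups)
pooled = [r for g in residuals for r in g]   # all residuals in one list

n_total = len(pooled)
k = len(groups)
ms_within = sum(r ** 2 for r in pooled) / (n_total - k)
print(f"MS within = {ms_within:.4f}")
```

The pooled list is what would be passed to a normality test or a QQ plot; the test itself is not shown here.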
Reference
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Hello Charles,
I am a long-time user of your package, and I have found it extremely helpful. Thank you for that contribution!
I also find your online explanations, examples, etc. very well written and useful.
However, I must take exception to this particular page, where you say that one of the assumptions for ANOVA is normality of each group. That is neither a necessary condition nor a sufficient one. The needed assumption is normality of the residuals: see for example https://stats.stackexchange.com/questions/6350/anova-assumption-normality-normal-distribution-of-residuals. And that is logical, as the denominator of the F statistic is precisely the variance of the residuals (scaled by the df). For F to really be an F statistic, that denominator needs to follow a chi-square distribution, hence the need for the residuals to be normally distributed (or at least not too non-normal). You can also look at the relationship between ANOVA and regression; for regression it is clearly the residuals (errors) which need to be normally distributed. I know that a wide range of sources, textbooks, and other authoritative opinions repeat this assumption (that each group, or the DV for n-way ANOVA, needs to be normally distributed), but that is sadly the perpetuation of misinformation. You may want to review this page, so you can contribute to correcting this situation.
Hello Jacques,
I hadn’t realized that normality of residuals wasn’t equivalent to normality of the groups.
I have now created an example where the residuals are normal but one of the groups is not, and another example where the residuals are not normal but the groups are normal.
Thank you very much for your comment. I will make the appropriate corrections on the website and add some capabilities to the Real Statistics software to return the residuals from ANOVA. I will add this in the next release of the software, which should be available within the next week.
Charles
Hello Jacques,
Do you have a way to calculate the residuals for repeated measures ANOVA?
Charles
Dear Charles
I designed an experiment on the driving posture preferences of people with different attributes for different car models (Sedan, SUV and MPV), with gender and height as between-group variables and car model as the within-group variable. Should this be analyzed using repeated measures ANOVA? I have also noticed that some researchers use ANOVA for repeated experiments; is there any theoretical support for this?
Thank you in advance, best wishes!
It really depends on the details of the design.
Car model could be a within-subjects factor in the case where someone owns more than one car.
Charles
You are right; in my experimental design, a single user experienced multiple test car models. In this case, should I only use repeated measures ANOVA? And could you recommend some books or papers on the rational choice of statistical methods?
With kindest regards!
Chengmou Li
You can use repeated measures ANOVA if the assumptions for this approach are met.
There are a number of books addressing this topic; many are referenced on the webpages of the Real Statistics website. Often these books use the term Design of Experiments in their title.
Charles
Hi Charles,
Does Central Limit Theorem play some role in the robustness of F statistic to the violation of normality?
Can you explain it briefly?
Thanks in advance!
Hi, I believe that this is true. See
https://stats.stackexchange.com/questions/5680/can-i-trust-anova-results-for-a-non-normally-distributed-dv
Note that “robust” means that the test is usually valid even when the normality assumption fails, but it can be affected when the sample is quite different from normally distributed.
Charles
Very informative answer.
Thanks.
Dear Charles,
I’m currently applying ANOVA for a 2^7 factorial design. The normality assumption is however violated with p-value < 2.2e-16. Levene's test has a p-value of 0.05482. Can I still work with ANOVA?
Kind regards,
Elisabeth
Hello Elisabeth,
In general, you should be more concerned about Levene’s test than the normality assumption. From your results, Levene’s test is borderline, but the normality test is very poor. I would check to see whether an outlier is distorting the normality test. I would also consider using Kruskal-Wallis.
Charles
Hi, Charles!
I’m running a series (10) of 3 x 2 ANOVAs. I’ve got around 2000 participants; however, group sizes are very unequal. Firstly, the assumption of normality (Shapiro-Wilk) was breached for all outcome variables at each level of both IVs. Blanca et al. (2017) indicate that ANOVA is robust in all the instances of non-normality (homogeneity assumed) they tested (i.e., up to skewness = 2 and kurtosis = 6).
This would mean that 9/10 of my outcome variables should be fine. However, I’m not sure how to deal with the last one. The other issue is that this research specifies that homogeneity of variance is assumed. In my case, five variables violated this assumption going by the mean-based test, and three violated it going by the median-based test (which may be better to interpret when data are not normal).
I’m a little lost as to how I should proceed given my various violations (and unequal group sizes). Any help is much appreciated!
Hi Thomas,
1. ANOVA tends to be pretty robust to violations of normality, but not to violations of homogeneity of variances.
2. When you say that five variables violated the homogeneity of variances assumption, I assume that you mean that five of the 10 tests violated this assumption (since the test is not on individual variables).
3. If I understand correctly, you are conducting multiple two-factor ANOVAs. The following webpage may be useful in this case
https://www.real-statistics.com/two-way-anova/testing-two-factor-anova-assumptions/
4. With one-factor ANOVA, Welch’s ANOVA tends to be a good substitute when the homogeneity of variances assumption is not met. The situation is more difficult for two-factor ANOVA. Real Statistics offers two alternative tests; see
https://www.real-statistics.com/two-way-anova/scheirer-ray-hare-test/
https://www.real-statistics.com/two-way-anova/aligned-rank-transform-art-anova/
5. You mentioned that the research assumes homogeneity of variances. It is strange that such an important assumption is just assumed with no evidence. Perhaps someone has already done the research and found that this assumption was met. In that case, you have either found a counter-example or have made some sort of error.
Charles
Dears,
I’ve done research with one questionnaire while I was also observing some additional characteristics of participants. For example, I was observing educational degrees which had three categories (bachelor, master, doctoral). Now I want to compare those three categories for the questionnaire total score, but I have a big difference in the number of participants in each of the categories (bachelor N=54, master N=117, doctoral N=14). How can I know if that difference in the number of participants between categories is ok so I can do a further analysis?
Thank you in advance!
Nina
Hello Nina,
You can perform ANOVA even with group sample sizes that are quite different. However, you need to be aware of the following:
1. The power of the test will be reduced, i.e. it will be less able to detect small effects.
2. The test is less robust to violations of the homogeneity of variances assumption. This might lead you to use Welch’s ANOVA instead.
Charles
Thank you very much for answering fast!
I’ve applied Levene’s test and it showed significance >.05.
Does it mean that I can proceed with “regular” ANOVA?
I’ve never used Welch’s ANOVA.
Hello Nina,
Yes, in this case, you can proceed with “regular” ANOVA (assuming that the other assumptions are met).
Charles
Hello Sir, what is the effect of violating all the assumptions on the comparisonwise and experimentwise error rates in post-hoc tests?
Assuming that you are willing to accept a 5% type I error (the usual assumption), and you find that a test shows that p-value = .03, then you conclude that you have a significant result (one that exceeds the 5% = .05 threshold). If instead, the test shows that p-value = .10, then you conclude that you don’t have a significant result (i.e. your results are consistent with the null hypothesis).
Now suppose that a p-value of .03 is really a p-value of .10 (or that a significance level of 5% is really 15%). Now you need to adjust your thinking about the test. This sort of situation potentially arises when a test assumption is not met, and so you may reach the wrong conclusion. Violating some assumptions is riskier than violating others (e.g. minor violations of normality are usually less of a problem than violations of homogeneity of variances).
Charles
Hello! We have 50 subjects and each of them has multiple measurements of a variable, X, in three different conditions. The three conditions are low, medium and high. We want to evaluate whether there are differences between the means of X in the three conditions. We assume that we can’t use ANOVA because our observations are not independent. Not only do we have observations of the same subject in the three conditions/groups, but we also have multiple observations of the same subject in each condition/group. Is there an alternative to ANOVA that we can use? I believe that we can’t use repeated measures ANOVA either, because we have multiple measurements of each subject in each group (and not the same number in each group).
Manos,
If for each subject you had one measure for each of the 3 conditions, then you could use repeated measures ANOVA or one-factor MANOVA. As you have observed, since you have multiple measurements for the same subject/group, you can’t use either of these approaches or any of the designs described on the website.
I don’t know what analysis you can use, although the following approaches might be appropriate:
1. For any subject/group for which you have duplicates, use the mean of all the duplicate entries. Then you can use repeated measures ANOVA or MANOVA.
2. For any subject/group for which you have duplicates, randomly select one of the duplicate entries.
Charles
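For readers who want to try the first suggestion, here is a minimal sketch in Python of collapsing duplicate measurements to one mean per subject and condition. The record layout, the subject labels and the function name `average_duplicates` are all hypothetical; the real data would need to be arranged in a similar long format first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (subject, condition, value)
records = [
    ("s1", "low", 4.0), ("s1", "low", 6.0),
    ("s1", "medium", 7.0),
    ("s1", "high", 9.0), ("s1", "high", 11.0), ("s1", "high", 10.0),
    ("s2", "low", 3.0),
    ("s2", "medium", 5.0), ("s2", "medium", 6.0),
    ("s2", "high", 8.0),
]

def average_duplicates(records):
    """Collapse repeated measurements to one mean per (subject, condition)."""
    cells = defaultdict(list)
    for subject, condition, value in records:
        cells[(subject, condition)].append(value)
    return {key: mean(values) for key, values in cells.items()}

cell_means = average_duplicates(records)
for (subject, condition), value in sorted(cell_means.items()):
    print(subject, condition, value)
```

Each subject then has exactly one value per condition, which is the layout that repeated measures ANOVA expects. Note that averaging discards the information about how many measurements went into each cell.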
Thank you very much for the answer!
I think that the first approach that you suggest, may be suitable!
If my data follow more of a sigmoid function over time (time series data), can I still apply ANOVA, or what other test would you recommend?
I need additional information before I would be able to address your question. Can you provide the following information?
1. Describe the scenario that you are looking at. Include the nature of the data.
2. What hypotheses are you trying to test?
Charles
Can I use ANOVA on data from a small sample? If not, what is the required sample size for using ANOVA?
Hello Mohammed,
Yes, you can use ANOVA on small samples. How small depends on a number of things. See the following webpage for details:
https://www.real-statistics.com/one-way-analysis-of-variance-anova/power-for-one-way-anova/
Charles
What is the minimum sample size for a two-way ANOVA?
You can determine the sample size for each of the two main factors and the interaction by using the One-way ANOVA sample size tool described at https://www.real-statistics.com/one-way-analysis-of-variance-anova/power-for-one-way-anova/
You can also use the Real Statistics Sample Size and Power data analysis tool, as described at
https://www.real-statistics.com/hypothesis-testing/real-statistics-power-data-analysis-tool/
Alternatively, you can use G*Power as described at
https://www.researchgate.net/post/What-is-the-best-way-to-determine-the-necessary-sample-size-for-a-two-way-ANOVA-in-a-psychological-study#:~:text=A%202%2Dway%20ANOVA%20works,group%20at%20each%20time%20point.
Charles
Hello! I have non-normal data that I would have liked to analyze using a 2-way repeated measures ANOVA (two groups with measurements at 2 time points). I tried transformations (sqrt, ln, log, Box-Cox), and the data remain non-normal. I can’t find an appropriate non-parametric test! What do you suggest?
Also, my sample size is small, 15 per group. I’ve heard that if the homogeneity of variances assumption holds, I can still do my ANOVA. Is this true? Thank you!!
What to do depends on what hypothesis you want to test. E.g. you could perform a two-sample t-test using the differences between the measurements at the two time periods for each subject. This will test whether there is a significant difference between the two populations from which the samples are drawn based on the change in the measurements between the two time periods. If the set of these differences is normally distributed then the t-test could be the way to go. If not, you could use the Mann-Whitney non-parametric test.
Other tests are possible depending on how far from normality the appropriate data values are. The devil is in the details. You might also be able to use resampling even if the data is not normally distributed.
Charles
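The change-score approach described in this reply can be sketched as follows in Python. The data and the helper names `change_scores` and `mann_whitney_u` are hypothetical. The code only computes the differences and the Mann-Whitney U statistic; obtaining a p-value (from tables, a normal approximation or a statistics package) is not shown.

```python
def change_scores(before, after):
    """Per-subject change: measurement at time 2 minus measurement at time 1."""
    return [a - b for b, a in zip(before, after)]

def mann_whitney_u(x, y):
    """U statistic for x: count of (x, y) pairs with x > y; ties count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical data: two groups, each measured at two time points
group1_t1 = [10, 12, 11, 14, 13]
group1_t2 = [13, 15, 12, 18, 17]
group2_t1 = [11, 10, 13, 12, 14]
group2_t2 = [12, 10, 15, 13, 14]

d1 = change_scores(group1_t1, group1_t2)
d2 = change_scores(group2_t1, group2_t2)

u1 = mann_whitney_u(d1, d2)
u2 = mann_whitney_u(d2, d1)
print("changes, group 1:", d1)
print("changes, group 2:", d2)
print("U1 =", u1, " U2 =", u2)
```

The two U values always sum to the product of the two sample sizes, which is a convenient check on the calculation.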
When data are missing, what happens to the ANOVA assumptions?
This depends on what data are missing and what type of ANOVA you want to perform. E.g. if you have 3 groups each containing 10 elements and one of the groups is missing one of the elements, you can still perform one-way ANOVA and the results should still be valid provided the missing element is missing at random (e.g. the value was obtained but it is unreadable or the measurement was not obtained because the missing data was from a person who missed the bus and so a value for that person couldn’t be obtained, etc.).
Can you provide some additional information about the type of ANOVA you want to perform, how much of the data are missing, and the nature of the missing data?
Charles
Can we say that the assumption of normality of the disturbance term is not essential for carrying out the ANOVA test?
If by disturbance term you mean the residuals, then normality is essential for correctly interpreting ANOVA. You can certainly perform the test even if the normality assumption doesn’t hold, but your conclusions may be incorrect. Fortunately, ANOVA is pretty forgiving about this assumption not holding, but if the data is too far from normality, you can have problems.
Charles
I performed the ANOVA. My results showed a level of heterogeneity with unequal group sample sizes (36, 31, 25). What would you suggest in this case?
Thank you
Probably Welch’s ANOVA followed by Games-Howell.
Charles
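To make the suggestion above more concrete, here is a minimal sketch of the Welch's ANOVA test statistic in plain Python. It follows the usual textbook formulation (weights equal to group size divided by group variance) and is not the Real Statistics implementation; the function name `welch_anova` and the data are hypothetical. The p-value, which comes from the F distribution with the returned degrees of freedom, and the Games-Howell follow-up are not shown.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight of each group: sample size divided by sample variance
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2

# Hypothetical data with unequal group sizes and unequal spreads
groups = [
    [12, 15, 11, 14, 13, 16, 12],
    [20, 28, 17, 31, 24],
    [14, 15, 13, 16],
]
f_stat, df1, df2 = welch_anova(groups)
print(f"F = {f_stat:.3f}, df1 = {df1}, df2 = {df2:.2f}")
```

With only two groups, this F statistic reduces to the square of Welch's t statistic, which is a useful sanity check.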
Thank you
I performed a Shapiro-Wilk test on my data, and some of my groups did not meet the requirements for a normally distributed population. For example, I have 7 sample groups and only 4 of them had a normal distribution. Another example is where I had 12 sample groups and only 9 of them had a normal distribution. Is it required for ALL groups to be normally distributed to perform an ANOVA, or is a simple majority of the groups being normally distributed sufficient?
Alexis,
All groups need to be normally distributed, but ANOVA is pretty robust to violations of normality, and so if the groups that are not normally distributed are not too far off from normality, you should be ok. This is a judgement call.
Charles
Dear Charles,
I have been trying to find out whether the Shapiro-Wilk test needs to be applied to all the data together (e.g. 4 times n = 20; total = 80) or to each single group. Most internet sources are not 100% clear about that. I think we need to apply it to every single group. Could you please explain why only that makes sense and maybe give me a link to a scientific source?
Thank you in advance!
Patrick
Dear Patrick,
Normality is required for each of the 4 groups. See
https://www.theanalysisfactor.com/checking-normality-anova-model/#:~:text=So%20you'll%20often%20see,the%20residuals%20are%20normally%20distributed.
Charles
Sir,
Is the sample randomly selected in ANOVA?
I am a little confused.
Samples are supposed to be randomly assigned in ANOVA.
Charles
What is………….>>>>>>>>….?
ANOVA 🙂
I’m happy; these are good answers.
How do we plot the value from ANOVA on a normal probability curve?
Usually, the value from ANOVA is a t statistic or F statistic and not a statistic for the normal distribution.
In any case, a z-score (the statistic for the normal distribution) is one point on the normal probability curve.
Charles