Assumptions for ANCOVA

Overview

The same assumptions as for ANOVA (normality, homogeneity of variance and random independent samples) are required for ANCOVA. In addition, ANCOVA requires the following assumptions:

  • For each level of the independent variable, there is a linear relationship between the dependent variable and the covariate
  • The lines expressing these linear relationships are all parallel (homogeneity of regression slopes)

This last assumption is equivalent to the following:

  • The covariate is independent of the treatment effects (i.e. there is no interaction between the covariate and the independent variable).

When this last assumption is not met, we can still perform an ANCOVA-like analysis, as explained in ANCOVA when the homogeneity of slopes assumption is not met.
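To see why "parallel slopes" and "no interaction" are the same condition, it may help to write the complete regression model explicitly. As a sketch, assume the independent variable has k levels coded by dummy variables t_1, …, t_{k−1}, with one level as the reference (this coding is an assumption here; any full-rank coding leads to the same conclusion):

```latex
\begin{aligned}
y &= b_0 + b_1 x + \sum_{j=1}^{k-1} c_j t_j + \sum_{j=1}^{k-1} d_j \, x \, t_j + \varepsilon \\
\text{slope in the reference group} &= b_1, \qquad \text{slope in group } j = b_1 + d_j \\
H_0 \text{ (parallel slopes)}: \quad d_1 &= d_2 = \cdots = d_{k-1} = 0
\end{aligned}
```

Under H_0 the x·t_j interaction terms drop out, leaving the usual ANCOVA model y = b_0 + b_1 x + Σ c_j t_j + ε.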

Example 1: Show that the assumptions hold for the data in Example 1 of Basic Concepts of ANCOVA.

Homogeneity of Variances

We start by creating a box plot of the reading scores for each of the four methods (using the data from Figure 1 of Basic Concepts of ANCOVA). See Figure 1.


Figure 1 – Box plot for data in Example 1

Each plot looks relatively symmetric, and the variances don’t appear to be wildly different. As we can see from the data in Figure 1 of Basic Concepts of ANCOVA, the variances of the reading scores range from 44.8 to 164.8 (the largest is about 3.7 times the smallest), which is likely to be an acceptable range for the homogeneity of variances assumption.
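As a numerical complement to the box plot, the sketch below computes the ratio of the largest to the smallest group variance and Levene’s W statistic. The function names are hypothetical. With center=mean it gives the classic Levene statistic; passing statistics.median gives the Brown-Forsythe variant (the default in scipy.stats.levene). Under equal variances W is approximately F(k − 1, N − k) distributed. The raw reading scores are not reproduced on this page, so the example only uses the variance range quoted above. A rule of thumb sometimes cited is that a ratio below about 4 is tolerable, especially with equal group sizes; treat that cutoff as a heuristic rather than a test.

```python
from statistics import mean


def max_variance_ratio(variances):
    """Ratio of the largest to the smallest group variance."""
    return max(variances) / min(variances)


def levene_statistic(groups, center=mean):
    """Levene's W for a list of samples (one list of values per group).

    center=mean gives the classic Levene test; center=statistics.median
    gives the Brown-Forsythe variant. Compare W with F(k - 1, N - k).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group center
    z = [[abs(v - center(g)) for v in g] for g in groups]
    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = sum(sum(zg) for zg in z) / n_total
    # One-way ANOVA on the deviations: between- and within-group sums of squares
    between = sum(len(zg) * (m - z_grand_mean) ** 2
                  for zg, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_group_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Only the smallest and largest variances quoted in the text are used here
print(round(max_variance_ratio([44.8, 164.8]), 2))  # 3.68
```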

Graphical Testing for Equal Slopes

We now turn our attention to the ANCOVA-specific assumptions. We create a scatter diagram of the y data values against the x data values for each of the four methods. This is done by creating a scatter diagram for Method 1 in the usual way and then choosing Design > Data|Select Data and clicking on the Add button on the left side. Enter the name Method 2 and specify the range for the x and y values in the dialog box that appears. After repeating this procedure for Method 3 and Method 4 and adding linear trend lines for each method, the resulting chart is as in Figure 2.


Figure 2 – Checking whether regression lines are parallel

Although the four lines are not exactly parallel, their slopes are quite similar, which suggests that the homogeneity of slopes assumption is met.
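The slopes of the trend lines in Figure 2 can also be computed directly; in Excel, applying SLOPE and INTERCEPT to each method’s x and y ranges gives the same values as the chart trendlines. A minimal Python sketch of the same per-group least-squares fit follows. The function name and the toy data are hypothetical, not taken from Example 1.

```python
from collections import defaultdict


def group_trendlines(x, y, groups):
    """Least-squares (slope, intercept) of y on x within each group.

    These are the lines a linear trendline draws for each series
    of the scatter diagram.
    """
    by_group = defaultdict(lambda: ([], []))
    for xi, yi, g in zip(x, y, groups):
        by_group[g][0].append(xi)
        by_group[g][1].append(yi)
    lines = {}
    for g, (xs, ys) in by_group.items():
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
        sxx = sum((a - x_bar) ** 2 for a in xs)
        slope = sxy / sxx
        lines[g] = (slope, y_bar - slope * x_bar)
    return lines


# Hypothetical data: two groups with parallel lines y = 2x + 1 and y = 2x + 4
print(group_trendlines([1, 2, 3, 1, 2, 3], [3, 5, 7, 6, 8, 10],
                       ["A", "A", "A", "B", "B", "B"]))
# {'A': (2.0, 1.0), 'B': (2.0, 4.0)}
```

Listing the fitted slopes side by side makes "quite similar" concrete, but only a formal test says whether the differences between them are significant.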

Analytic Testing for Equal Slopes

A more formal check is to test the complete regression model y, x, t, x*t (where x*t denotes the interaction terms between the covariate and the treatment variables) against the full regression model y, x, t. If there is no significant difference between the two models, then the interaction terms are not significant, implying that the homogeneity of regression slopes assumption is met. This is the same type of test as the one described in Testing the Significance of Extra Variables on the Regression Model.
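In symbols, this is the usual F test for extra variables. Writing R²_c and df_c for the R-square and residual degrees of freedom of the complete model, and R²_f and df_f for those of the full model, the statistic can be sketched as:

```latex
F = \frac{\left(R^2_c - R^2_f\right) / \left(df_f - df_c\right)}{\left(1 - R^2_c\right) / df_c}
\;\sim\; F\!\left(df_f - df_c,\; df_c\right) \quad \text{under } H_0
```

With n observations, k groups and one covariate, the complete model has 2k coefficients and the full model k + 1 (counting the intercept), so df_c = n − 2k, df_f = n − k − 1, and the numerator has k − 1 degrees of freedom. Assuming the 36 rows of B4:H39 are the observations and k = 4 methods, this gives an F(3, 28) reference distribution.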

First, we use Excel’s regression data analysis tool to create the complete model (see Figure 3) using the range B4:H39 from Figure 1 of Regression Approach to ANCOVA when prompted for the Input X range.


Figure 3 – Complete model (y, x, t, x*t) for data in Example 1
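Outside Excel, the same two models can be fitted by ordinary least squares. The sketch below builds the rows [1, x, t_1, …, x·t_1, …] from a covariate and a list of group labels, using dummy coding with the first level as the reference, and returns R-square. The function names, column order and coding are assumptions of this sketch and need not match the layout of Figure 1 of Regression Approach to ANCOVA. It uses plain Python and solves the normal equations directly, which is adequate for small, well-conditioned data sets but is not a numerically robust general solver.

```python
def design_rows(x, groups, levels, interaction=True):
    """Build regression rows [1, x, t_1..t_{k-1}] (+ [x*t_1..x*t_{k-1}]).

    levels[0] is the reference level and gets no dummy variable.
    """
    rows = []
    for xi, g in zip(x, groups):
        dummies = [1.0 if g == lev else 0.0 for lev in levels[1:]]
        row = [1.0, float(xi)] + dummies
        if interaction:
            # Covariate-by-treatment interaction terms (the x*t columns)
            row += [xi * dj for dj in dummies]
        rows.append(row)
    return rows


def r_squared(rows, y):
    """R-square of the least-squares fit of y on rows (rows include the intercept)."""
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * w for v, w in zip(a[k], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: group B has twice the slope of group A
x = [1, 2, 3, 1, 2, 3]
y = [1, 2, 3, 2, 4, 6]
g = ["A", "A", "A", "B", "B", "B"]
r2_complete = r_squared(design_rows(x, g, ["A", "B"]), y)
r2_full = r_squared(design_rows(x, g, ["A", "B"], interaction=False), y)
print(round(r2_complete, 4), round(r2_full, 4))  # 1.0 0.9375
```

Because the toy slopes differ, the complete model fits exactly while the common-slope model does not; with real data the gap between the two R-square values is what the F test assesses.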

Now we test (see Figure 4) whether there is a significant difference between the complete model (Figure 3 above) and the full model (Figure 5 of Regression Approach to ANCOVA).


Figure 4 – Testing homogeneity of regression line slopes

Row 6 of Figure 4 computes the difference between the R-Square values of the complete and full models, and row 7 computes the difference between their residual degrees of freedom. The F statistic (cell AB8) is then calculated by the formula =AB6*Z7/(AB7*(1-Z6)), i.e. the difference in R-Square divided by the difference in degrees of freedom, divided in turn by (1 − R-Square)/df of the complete model. Since the p-value for this statistic is larger than .05, we conclude that there is no significant difference between the two models, and so the homogeneity of regression slopes assumption is satisfied.
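The same calculation can be sketched in Python. The F statistic mirrors the worksheet formula =AB6*Z7/(AB7*(1-Z6)), and the right-tail p-value plays the role of Excel’s F.DIST.RT. Because the Python standard library has no F distribution, the sketch evaluates it through the regularized incomplete beta function using the classic continued-fraction (modified Lentz) method; if SciPy is available, scipy.stats.f.sf(F, df1, df2) should return the same value. The function names are illustrative and not part of any library.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Right-tail probability P(F > f), analogous to F.DIST.RT(f, df1, df2)."""
    if f <= 0:
        return 1.0
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def extra_variables_f_test(r2_complete, r2_reduced, df_complete, df_reduced):
    """F test of whether the extra terms (here x*t) add to R-square.

    df_* are residual degrees of freedom of each model.
    """
    r2_diff = r2_complete - r2_reduced
    df_diff = df_reduced - df_complete
    f = r2_diff * df_complete / (df_diff * (1.0 - r2_complete))
    return f, f_sf(f, df_diff, df_complete)


# Hypothetical R-square values, not the ones from Example 1
f, p = extra_variables_f_test(0.5, 0.4, 10, 12)
print(round(f, 4), round(p, 4))  # 1.0 0.4019
```

For Example 1 you would pass in the R-Square values and residual degrees of freedom reported in Figure 3 and in Figure 5 of Regression Approach to ANCOVA; those numbers are not reproduced on this page.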

Alternatively, we can obtain the same p-value with the Real Statistics function

RSquareTest(B4:H39, B4:E39, A4:A39) = 0.4615


84 thoughts on “Assumptions for ANCOVA”

  1. Hi Charles,
    Can I use this approach to test the intercepts too?
    That is, here you compare R^2 of a complete regression (y, x, t, x*t) with the one of a reduced regression (y, x, t). Can I do the same for complete vs (y, x, x*t) and use the resulting F for the difference between the intercepts (and just that)? Or “to assess homogeneity of the intercepts” if you prefer, like a one-way AN(c)OVA between them.
    Thanks.

  2. Hi Charles

    For my master’s thesis I am running my statistics in SPSS, but I have a question regarding which statistical test to use. I’m doing motion analysis, so my dependent variable is ‘peak hip flexion angle’ during a certain movement task. I divide subjects into two groups based on the presence or absence of an anatomical hip deformity seen on radiographic investigations. Therefore my independent variable = group. Group 1 has the deformity. Group 2 does not have the deformity. I want to investigate whether subjects with a hip deformity (group 1) have different peak hip flexion angles during a certain motor task when compared with subjects that don’t have the deformity (group 2). However, I think that age might be considered as a covariate because it has been shown that age can influence the peak hip flexion angle during certain motor tasks (older age = more muscle force = higher hip flexion angles).

    For this reason, I wanted to run an ANCOVA with dependent variable = peak hip flexion angle, independent variable = group, and covariate = age.
    The problem is that my covariate is also related with my independent variable. This is because it has been shown that older subjects have a lot more chance to have the hip deformity. In fact, the subjects in group 1 (having the deformity) are significantly older when compared with subjects of group 2 (not having the deformity)

    Therefore I wanted to ask if it is possible to consider age as a covariate here?

    Thanks in advance!

    • Lenie,
      ANCOVA can be performed using linear regression and so not all the assumptions described on this webpage need to be satisfied in order to perform an analysis of the type that you have in mind.
      I plan to add an example of this type to the website in the next couple of days.
      Charles

  3. Hi Charles,
    Could I please check: are we able to include categorical covariates in an ANCOVA (e.g. choice: X or Y)? I am looking to test the effect of my intervention (2 groups, randomly assigned) on paranoia, with the covariates age (continuous), choice (X or Y) and forgiveness (continuous). Secondly, if I have multiple covariates as above, how do I go about testing for homogeneity of regression slopes? Many thanks, Fareeha

    • Hi Fareeha,
      Since age is a continuous variable, why are you asking about categorical covariates? I am probably not understanding things properly.
      You can perform ANCOVA with multiple covariates and even when the homogeneity of slopes assumption is not met. This is done using linear regression. I plan to add an example to the website about how to do this in the next couple of days.
      Charles

  4. I have a data set in which the dependent variable is post-achievement scores, the covariate is pre-achievement scores, and there are two independent variables. The homogeneity of regression slopes assumption of ANCOVA is violated. Can we still apply ANCOVA? If not, which test should be used?

    • If the assumptions for ANCOVA are violated, one potential approach is to use two-factor ANOVA where the dependent variable is the difference between the pre- and post-achievement scores.
      Charles

  5. Hi Charles,

    I understand that the covariate should be correlated with the outcome variable. 
    My question is: are there specific criteria for how strong the correlation between the covariate and the outcome variable should be? For instance, let’s say I’m considering “empathy” as a covariate. Does empathy have to be moderately or highly correlated with the outcome variable to be included as a covariate? Or is it okay to include empathy even when its correlation with the outcome variable is weak (<.3), as long as it is significant?

    Also, if there are any specific criteria regarding this, could you please share references so I can cite them in my paper?

    Thank you!

  6. Question: I can easily comprehend a test of linearity between DV and CV (and test for homogeneity of slopes) when a single factor / IV is present. But for a two-way ANCOVA (two categorical IVs, each with three levels), what constitutes a “level” of an IV? My thought was, I need to create a new variable, call it “cell” or whatever, then test every combination of the two, 3-level IVs i.e. 9 “cells”. So am I testing for linear relationship between DV and CV by cell, OR…do I test once for the three levels of one IV and then another test for the three levels of the remaining IV? So I’m either creating a scatterplot with 9 groups OR…two scatterplots each with three groups. Which? I hope my question is worded correctly. I’m thinking the former.

  7. Dear Charles,
    I am doing my bachelor’s thesis on sleep and memory. I am calculating a repeated measures ANCOVA in SPSS with the factors sleep condition (4h vs. 8h of sleep) and memory test time of measurement (3 test times). I want to include order as a covariate, as counterbalancing was not ensured. Now, with a categorical covariate I am unsure how to test the assumptions of ANCOVA: I cannot use a scatterplot to check homogeneity of regression slopes. Can I just look at the interaction terms order*sleep or order*test time and see whether they are significant? I also don’t know how to test for linearity. And how do I test whether the covariate is independent of the group effect?
    Moreover, I am wondering whether it is possible to do planned contrasts and/or post hoc tests (to include the covariate and compare effects of interaction terms) with a two-factor repeated measures ANCOVA?

    Thanks so much in advance.
    Hannah

  8. Hi

    If my hypothesis is ‘Age is a relevant covariate of the effect of drinking on emotional regulation’, would an ANCOVA be appropriate?

    I can’t decide between ANCOVA and moderation analysis.

  9. Hello, and excuse me. I want to ask a question. I have four sets of data: pre-test and post-test scores for an experimental class and a control class. I’m confused about how to test the homogeneity of regression slopes and the linear relationship between the covariate and the dependent variable, how to find the critical value in the F table, and how to read the results of the ANCOVA. Thanks, I really need your help!

  10. Dear professor Zaiontz, I want to use ANCOVA to test whether the phosphatemia of a group of hemodialysis patients differs before and after treatment with phosphate chelating agents. Phosphatemia correlates with daily protein intake. I would like to know if, in ANCOVA, the two variables under treatment (pre and post treatment phosphatemia) must be correlated with each other, as well as with the covariates (pre and post treatment protein intake).
    Thank you so much, Roberto.

  11. Hi Charles,
    I have following data,
    One dependent variable (CTmax, which is continuous), two independent variables (Treatment (ramping rate) and Nest) and one explanatory variable that I want to use as a covariate.
    All the assumptions for ANCOVA that you mention are met. I want to test whether the weight of the insect affects the relationship between CTmax and treatment. I used R to run the test.
    ancova<-aov(CTmax~Treatment+Weight+Nest)
    I just want to know the way the formula is written is correct.
    And I also want to know whether a GLM can be used for the same data set?
    I have, CTmax=81 values
    Treatment=three groups (T1, T2 and T3)
    Weight = 81 values
    Nest = 07 nests

  12. Sir
    Can I run ANCOVA for dependent groups?
    The study design includes
    two air-quality conditions (healthy and unhealthy) and 2 environments (indoor vs outdoor).
    Subjects completed an exercise protocol in each condition, and we had pre- and post-test sampling in each condition.

    • Hello Ali,
      I don’t completely understand your comment. In general, if you have multiple dependent variables, then you would use MANCOVA instead of ANCOVA (just as you would use MANOVA instead of ANOVA).
      Charles

      • Thank you for your answer
        The question is about groups. I have only one group of subjects tested in 2 different environments (indoor vs outdoor) and 2 air qualities (healthy vs unhealthy), with 1 or 2 weeks of washout between conditions, and I had a pre- and post-test in each condition.
        Now I want to compare the conditions, but as there are considerable differences in the pre-test values, I can’t use repeated measures!
        So I am thinking of using ANCOVA to control for the effect of the pre-test, but as I have one group of subjects (in fact 4 dependent groups), I am not sure about using ANCOVA, since one of its assumptions is the independence of groups (if I am not wrong).
        On the other hand, I am not aware of any other statistical test (like ANCOVA) for dependent groups.

        • Hello Ali,
          This depends on what specific null hypotheses you want to test. For example if you want to see whether there is a significant difference between pre and post, you can run 4 separate paired t tests, one for each combination of indoor/outdoor and healthy/unhealthy. Alternatively, you can simply take the averages for each of these 4 measurements for each subject (before and after) and perform one paired t test.
          You can also perform a paired Hotelling’s T-square test where you have 4 dependent variables, namely the 4 combinations of indoor/outdoor and healthy/unhealthy.
          It all depends on your objective.
          Charles

  13. Charles, could you please explain briefly the reason behind the third assumption, as a way of understanding what I would do in the following situation? Let’s say I subject two groups (assigned at random) to a blood pressure medication or a placebo, and I am interested in blood pressure readings at the end of the treatment period. I want to control for possible effects of body mass on blood pressure. It turns out that individuals that took the medication are higher in body mass because they gained a little more weight than the control group. That is, the covariate turns out to be affected by the treatment. Isn’t it possible to still interpret the effect of medication on blood pressure controlling for body weight by running an ANCOVA? Or is there an alternate approach? Many thanks! -R

    • Hello Rafaela,
      Could you use a two-factor ANOVA with Treatment and Body Mass as the factors? The dependent variable could be the change in blood pressure or the percentage change in blood pressure. You could also do a repeated measures analysis (before and after) with the above two factors, although this would be a more complicated analysis.
      Charles

    • This is why it is an assumption of ANCOVA that covariates are measured before the experimental manipulation.

  14. Hi, Charles,

    First of all, my apologies for the delay in answering after your kind reply. I forgot that I had posed this question, I am so sorry!

    Let’s suppose that I want to check normality and equality of variances for the following ANCOVA model: covariate: body length; response variable: weight; factor: sex. That is, I’d have 2 groups (males and females). So, taking your answer into account… should I run the two models (one for each sex) and check normality and equality of variances for the residuals of each one of them?

    Best regards,
    Alicia

  15. Hi Charles,

    I want to run an ANCOVA using R so as to evaluate the effect of several categorical factors (which are sex, age, area, etc., with several levels each, such as male/female, adult/subadult, a/b/c/d, etc.) on the relationship between length (continuous covariate) and weight data (response variable), that is, body condition. So, the question I’m trying to answer is: Are there any differences among the different levels of sex/age/etc in their body condition?

    Prior to that, I have to check both the normality and homogeneity of variances assumptions. I have recently learned that it is the residuals of the linear model, not the variables, that must satisfy normality.
    But my question is: how am I supposed to check it? I mean, should I run the Shapiro-Wilk test for the residuals of the logweight~loglength regression for each level of each factor? Taking sex as an example, should I run it for males and females separately? As some factors in my dataset have many levels, I wonder whether this is correct. And in that case, is there a quicker (although statistically correct) way to do so?

    Thank you in advance!

    • Alicia,
      Suppose that you have 4 groups; then the residuals for each group are the data values for that group minus the group mean. Since the group mean is a constant, normality of the residuals is equivalent to normality of the data values. Thus, you only need to check normality of each group sample (based on the shaky assumption that the sample reflects the population, but it is the best you can do).
      Charles

  16. Hi Charles,

    You did not clearly address how to check the assumption of linearity or how to calculate it. Could you maybe illustrate this for me?

    • Ryanne,
      The usual way to test linearity is to create a scatter plot and check whether the points are reasonably aligned (i.e. are reasonably close to some straight line).
      In the case of ANCOVA, you need to do this for each level of the independent variable. For the example on the referenced webpage, this means that you create four scatter plots of Reading Score against Income, one for each of the four teaching methods.
      Charles

  17. I am interested in looking at grade level differences between students on end of year math scores, after controlling for the pretest. I found that my data violates homogeneity of regression slopes. How should I proceed? Is there an alternative test? Thanks!

  18. Hey Charles,

    I’m really struggling with one analysis in my bachelor thesis; maybe you can help me.
    I have a mediator hypothesis, which according to my professor I can’t compute correctly with SPSS. Now he wants me to perform either an ANCOVA or a partial correlation.

    I have three variables to be included:
    The independent variable (grouping variable) has two stages (forms of strategies of perspective taking)
    The dependent variable is metric (number of correct predictions in a partner game)
    and the influencing variable is based on a likert-scale (similarity perception to partner)

    Can I even do an ANCOVA? Or a partial correlation? I already tried, and SPSS gives me some results, but I don’t know whether this analysis is allowed. I’ve been stuck on this for such a long time that I’m completely confused now.

    I hope it’s clear what I mean,
    Thanks in advance!

    • Franzi,
      I don’t have enough information to give you a precise answer. It seems like either approach may be possible, but more importantly what hypothesis are you trying to test? First you need to determine what you want to test and then you can determine which test is appropriate.
      Charles

      • Of course, I’m sorry for the lack of information!

        The hypothesis is:
        Participants in the Imagine Self condition who see themselves as similar to their partner show significantly higher prediction accuracy than participants who don’t. In contrast, the perception of similarity doesn’t have an influence in the Imagine Other condition.

        Thanks for helping me!
        Franzi

        • Franzi,
          Thanks for providing a clear statement of your objective. This is very important, and people often jump straight to testing before they are clear about what they should be testing for.
          Unfortunately, I don’t have enough details to answer your original question. If you can, ask your professor to explain why he is recommending the two approaches that he has suggested to you.
          Charles

  19. Dear Charles,
    My question may be a little bit “far” from the topic, but I hope you can nonetheless help me. It deals with the assumptions that need to be satisfied when running an ANCOVA with a categorical covariate.
    When the covariate is continuous, as you say, three assumptions need to be met: (1) For each independent variable, the relationship between the dependent variable (y) and the covariate (x) is linear, (2) The lines expressing these linear relationships are all parallel (homogeneity of regression slopes), (3) The covariate is independent of the treatment effects (i.e. the covariate and independent variables are independent).
    Now, when the covariate is categorical, are there assumptions to be met? I would say that the third one would be that there is no interaction between the IV and the covariate, but is there any equivalent for the first and second one?
    Thanks a lot for your help!

  20. Hi Charles,

    I have a similar issue to one posted above, but I did not really understand the answer.

    I have data from two groups that I would like to compare while taking into account a covariate. None of the data are normally distributed (group A, group B or the covariates). I’m not sure which analysis to use. Is there a Mann-Whitney U test with a covariate? Or could this be done with regression (I’m not sure how to set that up, though)?

  21. I have the pretest as a covariate. If Levene’s test on the pretest is not significant, meaning that there was no significant difference between the control and experimental groups on the pretest, should we use ANCOVA or ANOVA, given that there is no difference on the pretest?

  22. Hi Charles,

    I’m doing an ANCOVA for the first time for my dissertation and I’m a little bit confused.

    I’m looking at stigma attitudes towards mental health in children. I did pre and post test surveys measuring stigma attitudes and emotional intelligence, with an intervention workshop challenging stigma towards mental health in between the two times. I also had a control group who did the same surveys but no intervention. I analyzed the scores/data using repeated measures ANOVA and found a significant main effect of time on stigma, as well as interaction effect of group x time. Now I want to test whether emotional intelligence has an effect on stigma, and have been told by my research supervisor to use ANCOVA to do this. I’ve watched plenty of videos on youtube explaining it but I don’t know if I’m doing it right as I don’t understand the output I’m getting from it… If you could give me a hand I will be forever grateful!

  23. I am looking to use ANCOVA to look at group differences in a pre-/post-test design, using the pretest as a covariate. One group is measured during 2 sessions, before and after a treatment; the other group is a control and is measured in two sessions with no treatment. The DV is the number of minutes spent talking during the session. The problem is that the pretest scores (the covariate) as well as the post-test scores (number of minutes) are very non-normal in their distribution, with lots of measurements at zero minutes and the rest showing some normality. These are very skewed distributions with most scores at zero. This seems to violate a major assumption of ANCOVA. Ideas? I have another categorical covariate with 3 levels that does not account for any variance in the pre or post measurements. Thanks.

    • One problem (I think) that disinclines me from a repeated measures ANOVA is that I have (by chance) a significant difference between groups on the pretest.

      • Andrew,
        If you are only comparing two groups, you might be able to use a nonparametric test (e.g. Wilcoxon signed-ranks test). You might be able to use Friedman’s test with more than two groups.
        Charles

      • I have not done any transformation; I always felt they were voodoo. I have lots of zeros (maybe 40%) in my data, so I cannot do a log transform, and a square root doesn’t do enough. I might just do a repeated measures ANOVA, but I still think my data violate the assumptions. Is there a transform you recommend?

        • Andrew,
          I can’t say for sure that a transformation is the way to go, but it could be a solution. Without more information, I couldn’t tell you which transformation is best. Even if there are a lot of zeros, a log transformation can be used. E.g. if -5 is the smallest sample value, then you could use a transformation of the form LN(x+6).
          Charles

          • Thank you so much Charles. I’ve transformed the data and performed both ANCOVA and repeated measures ANOVA with little difference between the two. You have been very helpful. Know that your assistance is appreciated!

            ~Andy

  24. Hi Charles,

    I am confused about my data and not sure which statistical analysis to apply.
    We conducted an in situ experiment on mosquitoes. We studied the effect of 2 insecticides at 2 different concentrations on the different stages of the mosquito. The study was carried out for 30 days, with sampling every 2 days. I would like to know the efficiency of the insecticides: which insecticide and concentration were effective in controlling the mosquitoes, and in how many days. My question is whether ANCOVA or repeated measures ANOVA is the appropriate analysis for my data.

    • Hi Sanitha,
      It looks like you have several factors: insecticide type (2 levels), concentration (2 levels) and time (16 levels). Thus you have two fixed factors and a repeated measures factor (time). This sounds like a repeated measures ANOVA.
      If you use ANCOVA which is the covariate?
      You might have another factor, namely mosquito stage, although this may be subsumed in the time factor.
      Charles

      Reply
  25. Is it possible to use ANCOVA to test the effectiveness of an intervention, measured as a change in a variable (DV), in a simple pre-post study design with only one group (without any control group), while controlling for the change in some other covariate(s) caused by that intervention?

      • Suppose there is one study group that was given an exercise intervention to see the change in VO2max. Due to this intervention, there was also an increase in lean body mass, and an increase in lean body mass may itself cause an increase in VO2max. So to see whether (a) the exercise intervention is effective in increasing VO2max independent of the increase in lean body mass, and (b) if possible, independent of the initial values of VO2max, which test should be used? Any suggestions?

        • So here I want to control for 2 things: (a) changes in other variables caused by the intervention, and (b) the initial/pre-test values of the variable for which we are testing whether the intervention is effective. How can ANCOVA be useful in this case?

        • Barun,
          If I am interpreting your question correctly, ANCOVA could be used in case (a) provided the test assumptions are met. I am not sure what (b) means in the context of ANCOVA.
          Charles

  26. Are there any extra assumptions for ANCOVA when using two covariates that are linearly correlated with each other and with the dependent variable, if such a model is usable at all?

  27. What if homogeneity of variance is violated with unequal sample sizes in ANCOVA? One controversial approach is to first equalize the sample sizes through random selection, then set p < .001 to reduce the alpha error, and then go ahead with ANCOVA. I don’t think this approach is good or defensible, but some people use it. Can I get any reference about this approach?

  28. Hi Charles,
    Back to Rina’s question: if homogeneity of (regression) slopes is violated, what other test can be used to correct for covariates? A nonparametric test (Quade test)?

    Thanks

  29. Dear Charles

    Thank you for all your efforts. I have a question on the 3rd point of the ANCOVA assumptions “The covariate is independent of the treatment effects”.
    I know that a common use for the ANCOVA is to study pre-test post-test results in different groups, by assigning the pre-test score as covariate, post-test as dependent variable, and treatment group as independent variable. The reason behind using ANCOVA here is to remove the influence of pre-test scores on the post-test results.

    But how can we use ANCOVA in this setting if we already know that the treatment groups have different pre-test scores, i.e. there’s a correlation between pre-test and group (they are not independent)? Am I missing something here?

    • Hamid,
      If you already know that there is a correlation between the treatments and the pre-test results, you couldn’t use this approach, but generally there is no such correlation since you assign the subjects to the treatment groups randomly.
      Charles

      • Thanks for the prompt reply Charles. I might still be able to use it since the correlation was only found after finishing the experiments and plotting the results (incidental finding).

