ANOVA

Analysis of Variance (ANOVA) extends two-sample hypothesis testing for comparing means to more than two samples. The following topics are described in greater detail.

Topics

Reference

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

40 thoughts on “ANOVA”

  1. Dear Dr. Zaiontz,
    I have 2 sets of data that will be compared to each other in charts or graphs; the problem is that one set has large numbers (say 10^9), and the other set has smaller numbers (like 1-10000).
    If I convert the data to logarithms before any analysis, the first set would be all right, but the second set would have much smaller numbers, and there are also a couple of values equal to 1 in this set, which become zero when the logarithm is taken.

    I would appreciate any help regarding this issue.

    With gratitude,
    Nafis

    • Dear Nafis,
      Since the data are of different sizes, it is not clear what your objective is. Are you trying to determine whether the distributions of values in the two datasets have the same shape?
      Charles

      • Dear Dr. Zaiontz,
        I’m trying to analyze each set separately using ANOVA and post hoc tests, and the results of both sets are going to be shown as 2 lines in one figure.
        But since there are large numbers in the first set, I probably should show the data as logarithms on my graph; my concern is that there are a couple of values equal to 1 in the replications in the second set, which will be zero after taking the logarithm!
        What do you recommend for those 0 values? Can I just ignore the zeros and analyze the logarithmic data?

        Thanks in advance,
        Nafis

        • Nafis,
          Since you say that “one set has large numbers (say 10^9), and the other set has smaller numbers (like 1-10000)”, you could just divide all the values in the first set by 100,000 (i.e. 10^9 / 10^4).
          If you use log scale, as long as none of the data values are zero, you can simply use the log of the value since it won’t be that negative, e.g. LN(0.0000000001) = -23.0259.
          Alternatively, you could add some value, say 1, to all the data points; provided all of your original data elements are non-negative, you then won’t need to take log(0).
          Charles
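
The options mentioned in this thread (plain log, log after adding a constant, and rescaling the larger set) can be sketched in Python as follows. This is only an illustrative sketch: the function names `log_plain` and `log_shifted` and the sample values are hypothetical and do not come from the thread or from Real Statistics. Note that a data value of 1 has a logarithm of 0, which is an ordinary data point; only a data value of 0 (or less) has no logarithm.

```python
import math


def log_plain(values):
    """Natural log of each value; only valid when every value is > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log is undefined for zero or negative values")
    return [math.log(v) for v in values]


def log_shifted(values, shift=1.0):
    """Natural log of (value + shift); only valid when value + shift > 0."""
    if any(v + shift <= 0 for v in values):
        raise ValueError("value + shift must be positive")
    return [math.log(v + shift) for v in values]


# Hypothetical measurements, chosen only to mimic the ranges in the question.
small_set = [1, 1, 250, 10000]
large_set = [2.0e9, 5.5e9, 1.0e9]

# A value of 1 maps to 0.0, which is a legitimate transformed value.
print(log_plain(small_set))

# Adding 1 first keeps the transform defined even if a raw value is 0.
print(log_shifted([0, 1, 250, 10000]))

# Rescaling the large set by 10^5 instead of taking logs.
print([v / 1e5 for v in large_set])
```

Whichever transform is chosen, the same transform should be applied to every value in a set before running ANOVA on it, so that the comparison between groups stays consistent.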

  2. Sir, I’m a beginner at data analysis in Real Statistics. Can you help me out by explaining what “rows of coefficient” and “interaction of coefficient” mean in the two factor ANOVA dialog box?

  3. In a general linear model ANOVA in Minitab, what does the message “categorical variable with more than 1 distinct values required” in the results mean?

    • Hello,
      I don’t use Minitab, but it probably means that the values in the sample for a categorical variable can’t all be the same. This is true of any variable, not just categorical variables. If all the values for a variable are the same, you should just drop that variable from the model.
      Charles

  4. Respected sir, my study compares a telecommunication firm before and after privatization. I have data for five years pre-privatization and five years post-privatization. Can I use ANOVA for my thesis? Please guide me, sir.
    Thank you

  5. Dear Sir,

    I have been using this tool and it is very helpful and convenient for use. Thank you for creating the tool and this forum.

    I work on soil. In an experiment I applied one soil modifier at 3 application rates, alone and in combination with chemical fertiliser, giving 8 treatments including the control, each replicated three times. Subsequently I collected plant tissue samples from each plot on 2-3 different occasions. I analysed the plant tissue nutrient content (macro and micronutrients).
    Now I can visually see that,
    1) Results are not statistically significant; however, for 2 nutrients, the level in plant tissue was generally higher in treatments that included the soil modifier that I had used. Out of 8 treatments, 6 had this soil modifier and 2 didn’t. In the 6 treatments, there is in general an elevated content of the 2 nutrients compared to the other 2 treatments.
    2) The plant tissue concentration of the 2 nutrients correlates with yield. Treatments in which a high yield was obtained also had a high nutrient content at the two sampling points.

    Query: Is there a way to analyse the data so that I can draw statistical conclusions from it?
    Thanking you,

    • Hello Rohan,
      I don’t have enough information to say for sure, but you ran some statistical test to determine that some “Results are not statistically significant”. What test did you use? Usually this means that none of the follow-up tests will be statistically significant either.
      Charles

      • The experimental design was a randomised block design. I used the RBD ANOVA tool from Real Statistics.
        Also, if we have to provide a citation for this tool, do you prefer any particular format for the complete citation?
        Thanking you,
        Rohan

  6. Hello Sir,
    I am studying the mass yield of char produced from biomass. The key factors affecting the mass yield are the temperature and the reaction time of the process. There are 3 temperatures and 3 residence times, and each temperature is combined with each residence time. For example, the mass yield at 200 C and 30 minutes is 45%. So I have a total of 9 mass yields, each corresponding to one temperature and one residence time value. I want to perform an ANOVA to determine which of temperature and residence time has the more significant effect on mass yield. Can you please guide me on which ANOVA method I should follow, as I am totally confused about what to use?

    I would really appreciate it if you could help me out here, as I really need it for my thesis writing.

    Thank you

    • Hello Dhara,
      If I understand correctly, you have two factors, Temperature and Reaction Time. I also understand that you have 9 samples, one for each combination of the 3 levels for Temperature and 3 levels for Reaction Time. In this case, generally you would use a Two Factor ANOVA without Replication. This is covered on the website.
      Charles
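
As a rough illustration of what a two factor ANOVA without replication computes for a 3 x 3 layout like the one described above, here is a minimal Python sketch using only the standard library. It is not the Real Statistics implementation; the function name and dictionary keys are hypothetical, and the table values are made up for illustration (only the 45% yield at 200 C and 30 minutes comes from the question).

```python
def two_factor_anova_without_replication(table):
    """Sums of squares and F statistics for an r x c table with one
    observation per cell (rows = levels of one factor, columns = the other)."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    # With one observation per cell the interaction cannot be separated
    # from error, so the remainder is used as the error term.
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "SS_rows": ss_rows,
        "SS_cols": ss_cols,
        "SS_error": ss_error,
        "df": (df_rows, df_cols, df_error),
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
    }


# Hypothetical mass yields (%): rows are 3 temperatures, columns are 3 times.
yields = [
    [52.0, 45.0, 41.0],
    [44.0, 38.0, 35.0],
    [36.0, 31.0, 27.0],
]
result = two_factor_anova_without_replication(yields)
print(result)
```

Each F statistic would then be compared against an F distribution with the matching degrees of freedom (here 2 and 4) to obtain a p-value. A larger F for one factor suggests, under the usual ANOVA assumptions, that this factor explains more of the variation in yield relative to the error term.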

  7. Dear Charles, thanks again for a new version of your wonderful tool.

    When using the Tukey (or GH) option in ANOVA (one or two factor) or in the follow-up tests, I get errors in some cells, specifically mean-crit, the lower and upper limits, and q-crit. The other cells are correct. Maybe a bug in the new functions?

    • Hello Jorge,
      Sorry to hear this. I don’t believe that I have made any changes to this capability in the latest release.
      In any case, if you send me an Excel file with your data and results, I will try to understand what has gone wrong.
      Charles

  8. Hello sir,
    What should I do if my research has 1 independent variable and 2 dependent variables? I have read in Creswell that data like mine call for a non-parametric test using Spearman’s rank correlation. When I discussed this with another lecturer, he advised me to use one-way ANOVA. My dependent variables are “vocabulary” and “enthusiasm”. My question is: can a rank-based test handle both variables in 1 test? This has been confusing me a lot.

    I am really looking forward to hearing from you.

    • Sorry Karina, but I would need more information about the hypothesis that you want to test and the nature of your data before I could answer your question. I don’t see how you can use ANOVA if you have two dependent variables; MANOVA is used instead, although I don’t know whether this is appropriate in your case.
      Charles

    • Joseph,
      You shouldn’t get a negative value. If you send me an Excel file with your data and the negative result for SS of residuals, I will try to figure out what is going wrong.
      Charles

  9. Hi Charles,
    Thank you so much for this software.
    I am having a challenge. When I try to run the one-way ANOVA, it says alpha must be a value between 0 and 0.5, but my alpha is at the default.

      • Hello Charles,

        I receive the same error message. And yes, you are correct: .05 is a value between 0 and 0.5, so it really shouldn’t be giving this message.

        Thank you for making this wonderful resource,

        Steven

        • Hello Steven,
          Glad that you like the Real Statistics resource.
          You might receive this error message when your system uses a comma instead of a period as the decimal symbol. You have the following choices in this case:
          1. Change the decimal symbol from comma to period. You need to do this in Windows and in Excel
          2. Click on the Config button on Real Statistics main dialog box (the one that comes up when you press Ctrl-m) and choose the Use Percentage option. Now instead of entering .05 (or 0,05) as the value for Alpha you enter 5 (meaning 5%).
          3. Enter 0 for the value of Alpha. In the output, change the cell containing Alpha from 0 to whatever value you want (using your system’s decimal symbol).
          Charles

    • Paul,
      If you mean the p-value for ANOVA, just use Real Statistics’ ANOVA data analysis tool. You can download the software for free and then follow the instructions on the website.
      Charles

  10. Hi Charles!

    I have an RCBD experiment testing 7 treatments with 3 replications each. I am trying to find out which treatment generates the highest yield. Is two-way ANOVA an appropriate test for this? If so, then can I use Tukey’s HSD test after it when significant differences are detected?

  11. Hi Charles,

    Can you explain to me why we get different results when we run the Shapiro-Wilk test in Excel using your calculations versus the Shapiro-Wilk test in SAS?

    Thanks!
    Kim

    • Kim,
      I don’t know how SAS calculates Shapiro-Wilk. What was the p-value you got from SAS and what was it in Excel? How big is your sample?
      Charles

  12. Hi Charles, thanks a lot for your website. I twice arrived at your website over a period of 2 years. Thought I might try asking you this question I’ve long had.

    What’s the difference between ANOVA and regression? I get the impression that regression analyses variance and thereby arrives at the line of regression. So isn’t that “using ANOVA to regress”? Aren’t ANOVA and regression really just the same thing: start with ANOVA, end with the line of regression?

    Would appreciate some clarification please. Thanks.
