Analysis of Variance (ANOVA) extends two-sample hypothesis testing for comparing means to more than two samples. The following topics are described in greater detail.
Topics
- One-way ANOVA
- Factorial ANOVA
- ANOVA with Random Factors and Nested Models
- Design of Experiments
- ANOVA with Repeated Measures
- Analysis of Covariance (ANCOVA)
Reference
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Dear Dr. Zaiontz,
I have 2 sets of data that will be compared to each other in charts or bars or graphs; the problem is one set has large numbers (say 10^9), and the other set has smaller numbers (like 1-10000).
If I convert the data to logarithms before any analysis, the first set would be all right, but the second set would become much smaller numbers, and there are also a couple of values equal to 1 in this set, whose logarithm is zero.
I would appreciate any help regarding this issue.
With gratitude,
Nafis
Dear Nafis,
Since the data are of different sizes, it is not clear what your objective is. Are you trying to determine whether the distribution of values in the two datasets are of the same shape?
Charles
Dear Dr. Zaiontz,
I’m trying to analyze each set separately using ANOVA and post hoc tests, and the results of both sets are going to be shown as 2 lines on one graph.
But since there are large numbers in the first set, I should probably plot the data as logarithms on my graph; my concern is that there are a couple of values equal to 1 among the replications in the second set, which will be zero after taking the logarithm!
What do you recommend for those 0 values? Can I just ignore zeros and analyze logarithmic data?
Thanks in advance,
Nafis
Nafis,
Since you say that “one set has large numbers (say 10^9), and the other set has smaller numbers (like 1-10000)”, you could just divide all the values in the first set by 100,000 (i.e. 10^9 / 10^4).
If you use a log scale, then as long as none of the data values is zero, you can simply use the log of each value, since it won’t be that negative, e.g. LN(0.0000000001) = -23.0259. Note that a data value of 1 is not a problem: LN(1) = 0 is a perfectly valid value.
Alternatively, you could add some value, say 1, to all the data points; then, provided all of your original data elements are non-negative, you won’t need to take log(0).
Charles
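As a rough illustration of the two options above, here is a minimal Python sketch. The sample values are made up for illustration and are not Nafis's data. It shows that a value of 1 simply maps to 0 on a log scale, and that the add-one shift is only needed if an actual 0 occurs.

```python
import math

# Hypothetical values in the range described (1 to 10,000); not real data.
small_set = [1, 1, 25, 340, 10000]

# Natural log: a value of 1 becomes 0.0, which is an ordinary data point.
logged = [math.log(x) for x in small_set]

# Shifted log, ln(1 + x): defined even when x is 0.
shifted = [math.log1p(x) for x in [0] + small_set]

print(logged[0])   # 0.0, i.e. ln(1)
print(shifted[0])  # 0.0, i.e. ln(1 + 0)
```

If the shift is used, the same shift should be applied to every value that is analyzed or plotted together, so that the values remain comparable.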
Sir, I’m a beginner at data analysis with Real Statistics. Can you help me by explaining what “rows of coefficient” and “interaction of coefficient” are in the two-factor ANOVA dialog box?
Rija,
I don’t see these two items on the dialog box. See the following webpage:
https://real-statistics.com/two-way-anova/real-statistics-support-for-two-factor-anova/
Please explain what you are referring to.
Charles
What does the message “categorical variable with more than 1 distinct values required” mean in the results of a general linear model ANOVA in Minitab?
Hello,
I don’t use Minitab, but it probably means that the values in the sample for a categorical variable can’t all be the same. This is true of any variable, not just categorical variables. If all the values for a variable are the same, you should just drop that variable from the model.
Charles
Respected sir, my study is based on the comparison of pre and post-privatization of a telecommunication firm. I take the data of five years pre and five years post-privatization so can I use ANOVA for my thesis? please guide sir.
thank you
What to do depends on what hypotheses you wish to test. In any case, please see the following webpage for how to proceed:
One-way Analysis of Variance
Charles
Dear Sir,
I have been using this tool and it is very helpful and convenient for use. Thank you for creating the tool and this forum.
I work on soil, in an experiment I applied one soil modifier at 3 application rates, alone and in combination with chemical fertiliser, giving 8 treatments including control, each replicated thrice. Subsequently I collected plant tissue samples at 2 – 3 different occasions, from each plot, each time. I analysed plant tissue nutrient content – macro and micronutrients.
Now I can visually see that,
1) Results are not statistically significant, however in case of 2 nutrients, the level in plant tissue was generally higher in treatments that included the soil modifier that I had used. So out of 8, 6 treatments had this soil modifier and 2 didn’t. In the 6 treatments, in general there is elevated content of 2 nutrients, compared to other 2.
2) I can visually see that the plant tissue concentrations of 2 nutrients correlate with yield. Treatments in which a high yield was obtained also had high nutrient content at the two sampling points.
Query – Is there a way to analyse the data to draw statistically valid conclusions?
Thanking you,
Hello Rohan,
I don’t have enough information to say for sure, but you ran some statistical test to determine that some “Results are not statistically significant”. What test did you use? Usually this means that none of the follow-up tests will be statistically significant either.
Charles
The design of experiment was Randomised block design. I used RBD ANOVA – tool from real stats.
Also, if we have to provide a citation for this tool, do you prefer any particular format for the complete citation?
Thanking you,
Rohan
Hi Rohan,
I guess this is the same situation that you sent me an email about. I just sent you a response.
The recommended form for a citation is described at: Citation
Charles
Hello Sir,
I am studying the mass yield of char produced from biomass. The key factors affecting the mass yield are the temperature and the reaction time of the process. There are 3 temperatures and 3 residence times, each combined with the other. For example, the mass yield at 200 C and 30 minutes is 45%. So I have a total of 9 mass yields, each corresponding to a temperature and a residence time value. I want to perform an ANOVA to determine which of temperature and residence time has the more significant effect on mass yield. Can you please guide me as to which ANOVA method I should follow, as I am totally confused about what to use?
I will really appreciate if you can help me out here as I really need it for my thesis writing.
Thank you
Hello Dhara,
If I understand correctly, you have two factors, Temperature and Reaction Time. I also understand that you have 9 samples, one for each combination of the 3 levels for Temperature and 3 levels for Reaction Time. In this case, generally you would use a Two Factor ANOVA without Replication. This is covered on the website.
Charles
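For readers who want to see what a two-factor ANOVA without replication computes, here is a minimal Python sketch. The 3 x 3 table of yields is hypothetical (only the 45% entry echoes the question), and the sketch is not the Real Statistics implementation. With one observation per cell there is no separate within-cell error, so the interaction sum of squares serves as the error term.

```python
# Hypothetical mass yields (%): rows = 3 temperatures, columns = 3 residence times.
# Only the 45.0 entry echoes the question; all other values are made up.
yields = [
    [45.0, 42.0, 40.0],
    [38.0, 35.0, 33.0],
    [30.0, 28.0, 25.0],
]

r, c = len(yields), len(yields[0])
grand = sum(sum(row) for row in yields) / (r * c)
row_means = [sum(row) / c for row in yields]
col_means = [sum(yields[i][j] for i in range(r)) / r for j in range(c)]

# Sums of squares for the two factors, the total, and the residual (error).
ss_rows = c * sum((m - grand) ** 2 for m in row_means)
ss_cols = r * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((x - grand) ** 2 for row in yields for x in row)
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols  # (r - 1)(c - 1) = 4 for a 3 x 3 table

# F ratios: each factor's mean square divided by the error mean square.
ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error  # Temperature
f_cols = (ss_cols / df_cols) / ms_error  # Residence time

print(f_rows, f_cols)
```

Each F ratio would then be compared against the F(2, 4) critical value (about 6.94 at alpha = .05) to judge whether that factor has a significant effect.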
Me Again,
I am using version 6.2 on win 10, 64 bits and Excel 365
Dear Charles, thanks again for a new version of your wonderful tool.
When using Tukey (or GH) option in anova (one or two factor) or follow up, I got errors on some cells, specifically on mean-crit, lower and upper limits, q critic. Other cells are correct. Maybe a bug on new functions?
Hello Jorge,
Sorry to hear this. I don’t believe that I have made any changes to this capability in the latest release.
In any case, if you send me an Excel file with your data and results, I will try to understand what has gone wrong.
Charles
Dear Charles,
Thanks for your reply. I will send an Excel file by e-mail.
Thanks
Jorge
Hello sir,
What if my research has one independent variable and two dependent variables? I have read in Creswell that in this case a non-parametric test using the Spearman rank correlation is supposed to be used. When I discussed this with another lecturer, he advised me to use one-way ANOVA. My dependent variables are “vocabulary” and “enthusiasm”. My question is: can a rank-based test cover both variables in one test? This is what confuses me a lot.
I am really looking forward to hearing from you.
Sorry Karina, but I would need more information about the hypothesis that you want to test and the nature of your data before I could answer your question. I don’t see how you can use ANOVA if you have two dependent variables; MANOVA is used instead, although I don’t know whether this is appropriate in your case.
Charles
Hi Charles,
Thanks for the tool.
Why could I be getting negative values for SS of residuals?
JG
Joseph,
You shouldn’t get a negative value. If you send me an Excel file with your data and the negative result for SS of residuals, I will try to figure out what is going wrong.
Charles
Hi Charles
Thank you so much for this software
I am having a challenge. When I try to run the one-way ANOVA, it says alpha must be a value between 0 and 0.5, but my alpha is at the default.
Vusie,
What do you mean by your alpha is at default? Do you mean .05? This is a value between 0 and .5.
Charles
Hello Charles,
I receive the same error message. And yes, you are correct: .05 is a value between 0 and 0.5, so it really shouldn’t be giving this message. :/
Thank you for making this wonderful resource,
Steven
Hello Steven,
Glad that you like the Real Statistics resource.
You might receive this error message when your system uses a comma instead of a period as the decimal symbol. You have the following choices in this case:
1. Change the decimal symbol from comma to period. You need to do this in Windows and in Excel
2. Click on the Config button on Real Statistics main dialog box (the one that comes up when you press Ctrl-m) and choose the Use Percentage option. Now instead of entering .05 (or 0,05) as the value for Alpha you enter 5 (meaning 5%).
3. Enter 0 for the value of Alpha. In the output, change the cell containing Alpha from 0 to whatever value you want (using your system’s decimal symbol).
Charles
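To illustrate why the decimal symbol matters, here is a small Python sketch. The function parse_alpha is hypothetical and only mimics the kind of range check described above; it is an assumption and does not show how Real Statistics actually reads the Alpha field.

```python
def parse_alpha(text: str) -> float:
    """Parse an alpha level typed with either '.' or ',' as the decimal symbol.

    Hypothetical helper for illustration only.
    """
    value = float(text.strip().replace(",", "."))
    if not 0 < value < 0.5:
        raise ValueError("alpha must be a value between 0 and 0.5")
    return value


print(parse_alpha("0,05"))  # 0.05
print(parse_alpha(".05"))   # 0.05
```

Without the comma-to-period replacement, float("0,05") raises an error in Python, which is similar in spirit to a valid-looking entry being rejected.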
how do i calculate p value using real statistics
Paul,
If you mean the p-value for ANOVA, just use Real Statistics’ ANOVA data analysis tool. You can download the software for free and then follow the instructions on the website.
Charles
how do you calculate f ratios for interactions?
See Two Factor ANOVA with Replications
Charles
Hi Charles!
I have an RCBD experiment testing 7 treatments with 3 replications each. I am trying to find out which treatment generates the highest yield. Is two-way ANOVA an appropriate test for this? If so, then can I use Tukey’s HSD test after it when significant differences are detected?
Hi L.A.
I will be addressing these sorts of problems in the next release of the Real Statistics Resource Pack.
Charles
Hello Sir,
Have you updated the version to enable analysis of RCBD experiments?
I have 8 treatments and 3 replication (blocks). Followed by Tukey’s test for comparison of means.
Kindly help.
Hello Rohan,
Yes, the Real Statistics software supports RCBD. See
https://real-statistics.com/design-of-experiments/completely-randomized-design/randomized-complete-block-design/
Charles
Hi Charles,
Can you explain to me why we get different results when we run the Shapiro-Wilk test in Excel using your calculations versus the Shapiro-Wilk test in SAS?
Thanks!
Kim
Kim,
I don’t know how SAS calculates Shapiro-Wilk. What was the p-value you got from SAS and what was it in Excel? How big is your sample?
Charles
Hi Charles, thanks a lot for your website. I twice arrived at your website over a period of 2 years. Thought I might try asking you this question I’ve long had.
What’s the difference between ANOVA and regression? I get the impression regression analyses variance, and thereby reaches the line of regression. So isn’t that “Use ANOVA to regress”? Thanks. So isn’t ANOVA and regression really just the same thing, start with ANOVA, end with line of regression.
Would appreciate some clarification please. Thanks.
George,
In a very real sense they are the same thing. You can perform ANOVA via regression (using dummy variables) as described in the webpages https://real-statistics.com/multiple-regression/anova-using-regression/ and https://real-statistics.com/multiple-regression/unbalanced-factorial-anova/.
To carry out regression you use an F test to compare two variances, which is essentially an ANOVA. See Figure 3 of the webpage https://real-statistics.com/multiple-regression/multiple-regression-analysis/.
Charles
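The equivalence can be checked numerically. The following Python sketch uses made-up data for three groups and computes the F statistic twice: once with the classical one-way ANOVA sums of squares and once from a least-squares regression on dummy variables. The data, the choice of group A as the reference level, and the small solve helper are assumptions for illustration only, not code from the Real Statistics software.

```python
# Hypothetical data: three groups of four observations each (made up).
groups = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 6.0, 7.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

y = [v for vals in groups.values() for v in vals]
n, k = len(y), len(groups)
grand = sum(y) / n

# --- Classical one-way ANOVA ---
ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
ss_within = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# --- Regression on dummy variables (group A is the reference level) ---
names = list(groups)
X = [[1.0] + [1.0 if g == other else 0.0 for other in names[1:]]
     for g in names for _ in groups[g]]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Normal equations (X'X) beta = X'y give the least-squares coefficients.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
ss_reg = sum((f - grand) ** 2 for f in fitted)
ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_reg / (p - 1)) / (ss_res / (n - p))

print(f_anova, f_regression)  # the two F statistics agree up to rounding
```

With this coding, the intercept is the mean of the reference group and each dummy coefficient is the difference between a group mean and the reference mean, so the fitted values are just the group means. This is why the regression and ANOVA sums of squares coincide.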