Introduction
In Two Factor ANOVA without Replication, we consider the analysis where there is only one sample item for each combination of factor A and B levels. On this webpage, we extend this analysis to the case where there are multiple samples for each such combination. Thus, in addition to the main effects corresponding to A and B, we now study the interactions between A and B, which is the main reason for performing this type of analysis.
We will restrict ourselves to the case where all the samples are equal in size (balanced model). In Unbalanced Factorial ANOVA we show how to perform the analysis where the samples are not equal (unbalanced model) via regression.
ANOVA with replication should not be confused with ANOVA with repeated measures, which is described in ANOVA with Repeated Measures.
Example introduced
As usual, we start with an example. We then provide some background information and then complete the analysis for the example.
Example 1: Repeat the analysis from Example 1 of Two Factor ANOVA without Replication, but this time with the data shown in Figure 1 where each combination of blend and crop has a sample of size 5.
Figure 1 – Data for Example 1
Structural Model
Definition 1: We extend the structural model of Definition 1 of Two Factor ANOVA without Replication as follows.
In Definition 1 of Two Factor ANOVA without Replication the r × c table contains the entries {xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}. We extend these tables to contain entries {Xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}, where Xij is a sample for level i of factor A and level j of factor B. Here Xij = {xijk: 1 ≤ k ≤ nij}. For now, we assume that the nij are all equal to m.
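We use terms such as x̄i (or x̄i.) as an abbreviation for the mean of {xijk: 1 ≤ j ≤ c, 1 ≤ k ≤ m}. We also use terms such as x̄j (or x̄.j) as an abbreviation for the mean of {xijk: 1 ≤ i ≤ r, 1 ≤ k ≤ m}. Similarly, x̄ij is the mean of the sample Xij and x̄ is the mean of all rcm observations. The corresponding population means are μi, μj, μij and μ.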
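As in Definition 1 of Two Factor ANOVA without Replication, we define the effects αi and βj where
αi = μi – μ     βj = μj – μ
Similarly, we define ai and bj where
ai = x̄i – x̄     bj = x̄j – x̄
We use δij for the effect of level i of factor A with level j of factor B, i.e. the interaction of level i of factor A and level j of factor B. Thus, δij = μij – μi – μj + μ. Similarly, we have
dij = x̄ij – x̄i – x̄j + x̄
Finally, we can represent each element in the sample as
xijk = μ + αi + βj + δij + εijk
where εijk denotes the error (or unexplained) amount. As before we have the sample version
xijk = x̄ + ai + bj + dij + eijk
where eijk is the counterpart to εijk in the sample (equivalently, eijk = xijk – x̄ij). Note that
Σi ai = 0     Σj bj = 0     Σi dij = 0 (for each j)     Σj dij = 0 (for each i)     Σk eijk = 0 (for each i, j)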
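As a minimal sketch of this decomposition (Python, standard library only), the code below uses a small made-up balanced data set with r = 2, c = 3 and m = 3. The numbers are purely illustrative and are not the data of Example 1. It computes ai, bj, dij and eijk from the sample means, checks that every observation is rebuilt as x̄ + ai + bj + dij + eijk, and lets you confirm that the effects sum to zero.

```python
from statistics import mean

# Made-up balanced data: x[i][j] is the sample X_ij (m values) for
# level i of factor A and level j of factor B.
x = [
    [[12, 14, 13], [20, 22, 21], [15, 15, 18]],
    [[10, 11, 12], [18, 17, 19], [25, 24, 26]],
]
r, c, m = len(x), len(x[0]), len(x[0][0])

grand = mean(v for row in x for cell in row for v in cell)                  # x̄
row_mean = [mean(v for cell in x[i] for v in cell) for i in range(r)]       # x̄i
col_mean = [mean(v for i in range(r) for v in x[i][j]) for j in range(c)]   # x̄j
cell_mean = [[mean(x[i][j]) for j in range(c)] for i in range(r)]           # x̄ij

a = [row_mean[i] - grand for i in range(r)]                                 # ai
b = [col_mean[j] - grand for j in range(c)]                                 # bj
d = [[cell_mean[i][j] - row_mean[i] - col_mean[j] + grand for j in range(c)]
     for i in range(r)]                                                     # dij
e = [[[v - cell_mean[i][j] for v in x[i][j]] for j in range(c)]
     for i in range(r)]                                                     # eijk

# Every observation is recovered as x̄ + ai + bj + dij + eijk
for i in range(r):
    for j in range(c):
        for k in range(m):
            rebuilt = grand + a[i] + b[j] + d[i][j] + e[i][j][k]
            assert abs(rebuilt - x[i][j][k]) < 1e-9

print(sum(a), sum(b))  # both zero up to floating-point rounding
```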
Null Hypotheses
As in Definition 1 of Two Factor ANOVA without Replication, the null hypotheses for the main effects are:
H0: μ1. = μ2. = … = μr. (Factor A)
H0: μ.1 = μ.2 = … = μ.c (Factor B)
These are equivalent to:
H0: αi = 0 for all i (Factor A)
H0: βj = 0 for all j (Factor B)
In addition, there is a null hypothesis for the effects due to the interaction between factors A and B.
H0: δij = 0 for all i, j
More about the structural model
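Definition 2: Using the terminology of Definition 1, define
SST = Σi Σj Σk (xijk – x̄)²
SSA = cm Σi (x̄i – x̄)²
SSB = rm Σj (x̄j – x̄)²
SSAB = m Σi Σj (x̄ij – x̄i – x̄j + x̄)²
SSW = Σi Σj Σk (xijk – x̄ij)²
We can also define the following entities:
dfT = rcm – 1     dfA = r – 1     dfB = c – 1     dfAB = (r – 1)(c – 1)     dfW = rc(m – 1)
MSA = SSA/dfA     MSB = SSB/dfB     MSAB = SSAB/dfAB     MSW = SSW/dfW
Since the within groups terms are used as the error terms in our model, we also use the following symbols:
SSE = SSW     dfE = dfW     MSE = MSW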
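The following is a minimal Python sketch (standard library only) of these sums of squares, degrees of freedom and mean squares for the balanced model, written directly from the standard formulas SSA = cm Σ(x̄i – x̄)², SSB = rm Σ(x̄j – x̄)², SSAB = m ΣΣ(x̄ij – x̄i – x̄j + x̄)² and SSW = ΣΣΣ(xijk – x̄ij)². The function name two_factor_ss and the input layout (a nested list x[i][j] of m observations per cell) are assumptions made for illustration, and the usage data is made up.

```python
from statistics import mean


def two_factor_ss(x):
    """Balanced two factor ANOVA with replication.
    x[i][j] is the list of m observations for level i of A and level j of B."""
    r, c, m = len(x), len(x[0]), len(x[0][0])
    if any(len(cell) != m for row in x for cell in row):
        raise ValueError("unbalanced data: use the regression approach")
    grand = mean(v for row in x for cell in row for v in cell)
    row_mean = [mean(v for cell in x[i] for v in cell) for i in range(r)]
    col_mean = [mean(v for i in range(r) for v in x[i][j]) for j in range(c)]
    cell_mean = [[mean(x[i][j]) for j in range(c)] for i in range(r)]

    ss = {
        "T": sum((v - grand) ** 2 for row in x for cell in row for v in cell),
        "A": c * m * sum((xi - grand) ** 2 for xi in row_mean),
        "B": r * m * sum((xj - grand) ** 2 for xj in col_mean),
        "AB": m * sum((cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                      for i in range(r) for j in range(c)),
        "W": sum((v - cell_mean[i][j]) ** 2
                 for i in range(r) for j in range(c) for v in x[i][j]),
    }
    df = {"T": r * c * m - 1, "A": r - 1, "B": c - 1,
          "AB": (r - 1) * (c - 1), "W": r * c * (m - 1)}
    # MSW doubles as MSE, the error term used in the F tests
    ms = {k: ss[k] / df[k] for k in ("A", "B", "AB", "W")}
    return ss, df, ms


# Usage with made-up data (r = 2, c = 3, m = 3)
data = [[[12, 14, 13], [20, 22, 21], [15, 15, 18]],
        [[10, 11, 12], [18, 17, 19], [25, 24, 26]]]
ss, df, ms = two_factor_ss(data)
print(ss, df, ms)
```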
Properties
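Property 1:
SST = SSA + SSB + SSAB + SSW
dfT = dfA + dfB + dfAB + dfW
Proof: Clearly
xijk – x̄ = (x̄i – x̄) + (x̄j – x̄) + (x̄ij – x̄i – x̄j + x̄) + (xijk – x̄ij)
If we square both sides of the equation, sum over i, j, and k, and then simplify (with various terms equal to zero as in the proof of Property 2 of Basic Concepts for ANOVA), we get the first result. For the second,
dfA + dfB + dfAB + dfW = (r – 1) + (c – 1) + (r – 1)(c – 1) + rc(m – 1) = rcm – 1 = dfT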
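Property 2: Note that the between-group terms are as for the one-way ANOVA, namely
SSBet = m Σi Σj (x̄ij – x̄)²     dfBet = rc – 1
SST = SSBet + SSW     dfT = dfBet + dfW
The proof is similar to the proof of Property 1. It also follows that
SSBet = SSA + SSB + SSAB     dfBet = dfA + dfB + dfAB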
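To make the degrees-of-freedom identities concrete, here is a short arithmetic check in Python for Example 1, using r = 3 blends, c = 4 crops and m = 5 observations per cell (the values given for Example 1 in the comments below). It only checks the df parts of the identities, not the sums of squares.

```python
# Example 1: r = 3 blends (factor A), c = 4 crops (factor B), m = 5 per cell
r, c, m = 3, 4, 5

df_A = r - 1
df_B = c - 1
df_AB = (r - 1) * (c - 1)
df_W = r * c * (m - 1)
df_Bet = r * c - 1
df_T = r * c * m - 1

assert df_A + df_B + df_AB + df_W == df_T   # df part of Property 1
assert df_A + df_B + df_AB == df_Bet        # between-groups df (cf. Property 2)
print(df_A, df_B, df_AB, df_W, df_Bet, df_T)  # 2 3 6 48 11 59
```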
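Property 3: If a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed and with all σij² equal (to σ², say), then
SSW/σ² ~ χ²(dfW)
If, in addition, all μi are equal then SSA/σ² ~ χ²(dfA); if all μj are equal then SSB/σ² ~ χ²(dfB); and if all δij = 0 then SSAB/σ² ~ χ²(dfAB).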
Proof: The proof is similar to that of Property 1 of Basic Concepts for ANOVA.
Property 4: Suppose a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed.
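If all μi are equal and all σij² are equal then
MSA/MSW ~ F(dfA, dfW)
If all μj are equal and all σij² are equal then
MSB/MSW ~ F(dfB, dfW)
Also, under certain circumstances (namely when all δij = 0 and all σij² are equal),
MSAB/MSW ~ F(dfAB, dfW)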
Proof: The result follows from Property 3 and Property 1 of F Distribution.
Property 5:
Statistical Tests
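We use the following tests:
Factor A: FA = MSA/MSE, which has an F(dfA, dfE) distribution under the Factor A null hypothesis
Factor B: FB = MSB/MSE, which has an F(dfB, dfE) distribution under the Factor B null hypothesis
Interaction: FAB = MSAB/MSE, which has an F(dfAB, dfE) distribution under the interaction null hypothesis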
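In Excel the p-value of each test is the right tail of the F distribution, i.e. F.DIST.RT(F, df1, df2). As a hedged, standard-library-only sketch of the same computation in Python, the code below evaluates the F right tail through the regularized incomplete beta function, P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), using the usual continued-fraction (modified Lentz) evaluation. The function names and the mean squares in the usage example are made up for illustration; a statistics library would normally be used instead.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        aa = k * (b - k) * x / ((qam + k2) * (a + k2))            # even step
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        aa = -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))   # odd step
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Right-tail probability P(F > f) for F(df1, df2), as Excel's F.DIST.RT."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_tests(ms, df):
    """F = MS_effect / MS_W and its p-value for A, B and the AB interaction."""
    results = {}
    for effect in ("A", "B", "AB"):
        f = ms[effect] / ms["W"]
        results[effect] = (f, f_sf(f, df[effect], df["W"]))
    return results


# Usage with made-up mean squares and the df of Example 1 (r = 3, c = 4, m = 5)
df = {"A": 2, "B": 3, "AB": 6, "W": 48}
ms = {"A": 100.0, "B": 40.0, "AB": 30.0, "W": 20.0}
results = f_tests(ms, df)
print(results)
```

For df1 = 2 the right tail has the closed form (1 + 2f/df2)^(-df2/2), which is a convenient check on the implementation.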
Assumptions
The assumptions for Two Factor ANOVA are similar to those for One Factor ANOVA, namely
- All samples are drawn from normally distributed populations
- The samples are drawn from populations that have a common variance
- All samples are drawn independently from each other
- Within each sample, the observations are sampled randomly and independently of each other
By sample, here we mean each combination of levels from the two factors. We also want to make sure there are no outliers that can distort the results of the test. See ANOVA Assumptions for how we check these assumptions using the Real Statistics Resource Pack.
Example continued
We now return to Example 1 and show how to conduct the required analysis using Excel’s Anova: Two-factor With Replication data analysis tool.
Example 1 (continued): The summary output from the data analysis tool is given on the right side of Figure 2, with the sample data repeated on the left side of the figure.
Figure 2 – Summary output of ANOVA data analysis for Example 1
The top part of Figure 3 contains the rest of the output from the data analysis tool. We’ll explain the bottom part momentarily.
Figure 3 – ANOVA analysis for Example 1
We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so we don’t have sufficient evidence (at the .05 significance level) to conclude that the effectiveness of the fertilizer differs for the different crops.
Since the p-value (blends) = .00025 < .05 = α, we reject the Factor A null hypothesis and conclude that the blends are statistically different.
Interaction Plots
We also see that the p-value (interactions) = .0456 < .05 = α, and so conclude that there is a significant interaction between crop and blend. We can look more carefully at the interactions by plotting the means for each combination of blend and crop (see Figure 4). Lines that are roughly parallel indicate a lack of interaction, while lines that are not roughly parallel indicate interaction.
From the first chart we can see that Blend Y has quite a different pattern from the other blends: the line for Blend Y trends down towards Soy and up towards Rice, exactly the opposite of Blends X and Z. We also see that Blend X trends up towards Soy much more abruptly than Blend Z.
Figure 4 – Interaction plots for Example 1
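The "parallel lines" rule can be stated exactly: in a balanced design the lines of cell means are parallel precisely when every sample interaction effect dij = x̄ij – x̄i – x̄j + x̄ is zero. The minimal Python sketch below computes the dij from a table of cell means; the two tables of cell means are hypothetical, not the Example 1 means.

```python
def interaction_effects(cell_means):
    """cell_means[i][j] is the mean of the sample for level i of A and level j of B.
    Returns d_ij = x̄ij - x̄i - x̄j + x̄, assuming a balanced design (so row and
    column means are simple averages of the cell means)."""
    r, c = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (r * c)
    row_avg = [sum(row) / c for row in cell_means]
    col_avg = [sum(cell_means[i][j] for i in range(r)) / r for j in range(c)]
    return [[cell_means[i][j] - row_avg[i] - col_avg[j] + grand for j in range(c)]
            for i in range(r)]


# Hypothetical cell means: one line per level of A, one point per level of B
parallel = [[10, 12, 15], [14, 16, 19]]   # second line is the first shifted up by 4
crossing = [[10, 12, 15], [16, 12, 9]]    # lines slope in opposite directions

d_par = interaction_effects(parallel)
d_cross = interaction_effects(crossing)
print(max(abs(v) for row in d_par for v in row))    # zero up to rounding
print(max(abs(v) for row in d_cross for v in row))
```

A significant interaction test says, roughly, that the dij computed from the data are larger than chance variation would explain.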
Worksheet Functions
Although the analysis in Figures 2 and 3 was produced automatically by Excel’s data analysis tool, the same result can be produced using Excel formulas, just as we were able to do for Example 1 of Two Factor ANOVA without Replication. In fact, all the entries in the ANOVA table in Figure 3 can be calculated using the tables constructed in the bottom part of Figure 3 in exactly the same way as was done in Example 1 of Two Factor ANOVA without Replication.
The only new calculation is that of the error term SSW. To calculate it, we must first construct the table of the sums of squared deviations of each blend/crop sample from its own mean. This table appears in cells J38:N41 of Figure 3. E.g. the entry for SSWheat,BrandX (in cell K39) is =DEVSQ(B5:B9). SSW is then calculated as the sum of all the terms in the table, namely =SUM(K39:N41).
Alternatively, we can use Property 2 to calculate SSBet and then use the fact that SSW = SST – SSBet. To calculate SSBet, we first construct the table of the means of the samples for each combination of the levels of factors A and B (range J43:N46 of Figure 3). SSBet is then calculated using the formula =DEVSQ(K44:N46)*H5. For Example 1, SSBet = 18420.5, and so SSW = SST – SSBet = 39640.9 – 18420.5 = 21220.4.
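The two worksheet routes to SSW can be mirrored in a few lines of Python. This is a sketch under stated assumptions: devsq imitates what Excel's DEVSQ returns, the cell layout x[i][j] and the data are made up, and the multiplication by m stands in for the *H5 factor in the worksheet formula (judging from that formula, H5 holds the common sample size m).

```python
def devsq(values):
    """Sum of squared deviations from the sample mean (what Excel's DEVSQ returns)."""
    values = list(values)
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def ssw_and_ssbet(x):
    """x[i][j] holds the m observations for cell (i, j) of a balanced design."""
    m = len(x[0][0])
    cells = [cell for row in x for cell in row]
    sst = devsq(v for cell in cells for v in cell)
    ssw = sum(devsq(cell) for cell in cells)              # like =SUM(K39:N41)
    ssbet = m * devsq(sum(cell) / m for cell in cells)    # like =DEVSQ(K44:N46)*H5
    return sst, ssw, ssbet


data = [[[12, 14, 13], [20, 22, 21], [15, 15, 18]],
        [[10, 11, 12], [18, 17, 19], [25, 24, 26]]]
sst, ssw, ssbet = ssw_and_ssbet(data)
# Both routes to SSW agree: directly, and as SST - SSBet
print(ssw, sst - ssbet)

# The Example 1 figures quoted in the text
print(round(39640.9 - 18420.5, 1))   # 21220.4
```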
Example using row formatting
Example 2: Repeat the analysis for the data in Example 1 by using the presentation of the data given in the table on the left of Figure 5.
Figure 5 – Alternative presentation of data in Example 1
Excel’s ANOVA data analysis tools don’t support data in this format, and so we must create the ANOVA table (i.e. the output found in Figure 3) using formulas. This is straightforward, although tedious, with the result presented in Figure 6. As usual, the hardest part is the calculation of the SS terms, which are shown on the right side of the worksheet in Figure 6.
Figure 6 – ANOVA output for Example 2
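Since the layout of Figure 5 is not reproduced here, the following Python sketch assumes that the row format means one observation per row, stored as a (factor A level, factor B level, value) triple. The function name anova_from_rows and the toy data are hypothetical. It accumulates the same group means that the worksheet formulas use and obtains SSAB from Property 1 as SST – SSA – SSB – SSW, which is only valid for balanced data.

```python
from collections import defaultdict


def anova_from_rows(rows):
    """rows: iterable of (a_level, b_level, value) triples, one observation per row.
    Assumes a balanced design; returns SS and df for A, B, AB, W and T."""
    rows = list(rows)
    by_a, by_b, by_ab = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b, v in rows:
        by_a[a].append(v)
        by_b[b].append(v)
        by_ab[(a, b)].append(v)

    r, c, n = len(by_a), len(by_b), len(rows)
    m = n // (r * c)
    if len(by_ab) != r * c or any(len(g) != m for g in by_ab.values()):
        raise ValueError("unbalanced design: use the regression approach instead")

    def mean(xs):
        return sum(xs) / len(xs)

    grand = mean([v for _, _, v in rows])
    ss_t = sum((v - grand) ** 2 for _, _, v in rows)
    ss_a = sum(len(g) * (mean(g) - grand) ** 2 for g in by_a.values())
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in by_b.values())
    ss_w = sum(sum((v - mean(g)) ** 2 for v in g) for g in by_ab.values())
    ss_ab = ss_t - ss_a - ss_b - ss_w          # Property 1 (balanced data only)
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "W": ss_w, "T": ss_t}
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1),
          "W": r * c * (m - 1), "T": n - 1}
    return ss, df


# Toy data: purely additive effects, 2 identical replicates per cell
effects_a = {"X": 0, "Y": 2}
effects_b = {"p": 0, "q": 4, "s": 8}
rows = [(a, b, ea + eb) for a, ea in effects_a.items()
        for b, eb in effects_b.items() for _ in range(2)]
ss, df = anova_from_rows(rows)
print(ss, df)   # no interaction and no error, so SS_AB and SS_W are zero
```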
When the assumptions are not met
In general, when the assumptions are violated, transformations and non-parametric (rank) tests are not very useful for two-way ANOVA. We can instead abandon the omnibus test and apply the various planned and unplanned tests described in Planned Comparisons for ANOVA and Unplanned Comparisons for ANOVA by treating the two-way ANOVA as a one-way ANOVA.
In particular, when the variances are not equal, we can apply Welch’s correction for contrasts. We can also use the Scheirer-Ray-Hare test or Aligned Rank Transform (ART) ANOVA.
I have analyzed five different cultivars of apples for TPC, TFC, yield, etc. It is basically a comparative study. I took 3 random samples of each cultivar and analyzed each sample 3 times (3 replications). My first question is which ANOVA should be applied. Second, when entering the data in Excel, which of the patterns below should I use?
1. sample 1: rep1 rep2 rep3, OR
2. sample 1: mean of the 3 replicates
Sorry, but I don’t understand what these patterns mean.
Does your data not correspond to the example shown on this webpage?
Charles
I have a set of data. I have calculated some soil properties of 7 types of tree areas. Each type of tree is replicated 3 times, so there are 21 plots in total. From each plot I have to take 4 soil samples from 4 consecutive depths, so the total data for one property is 84 values. I want to know how to enter the data in a randomized block factorial design for an ANOVA calculation in Excel. Please help.
Hello Aditya,
What hypothesis (or hypotheses) do you want to test?
Charles
Hi Charles,
I have 4 groups of animals (below). The animals are either normal (T+) or abnormal (T-) for a T gene. Subgroups of the T- and T+ groups underwent either a control procedure or surgery. We then measured how the presence/absence of the T gene changes the effect of surgery on a serum factor.
I.e., we are assessing the effects of the T gene and surgery as two independent factors on one dependent variable (the serum factor).
I am using two way ANOVA with replication to compare the groups and to find the interaction between the T gene and surgery. Am I correct?
Also, can two way ANOVA be done with unequal group sizes (Ns)?
Thank you very much for any feedback
Hesham
Control Surgery
T- 0.16 0.153
0.17 0.156
0.15 0.166
0.15 0.145
0.16 0.162
0.162 0.155
0.158 0.15
T+ 0.15 0.164
0.154 0.146
0.161 0.166
0.161 0.153
0.149 0.156
0.152 0.146
0.152 0.15
Hello Hesham,
Sounds like a reasonable approach. You can use ANOVA even with unequal group sizes, but you need to use the regression approach as described at
Unbalanced Factorial ANOVA
Charles
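As an illustration of the approach Charles suggests, here is a minimal Python sketch (standard library only) that applies the balanced two factor formulas to Hesham's data above. Every cell in that table happens to contain 7 values, so the balanced formulas apply. The code stops at the F statistics; each effect has 1 df and the error has 24 df, so the p-values could then be obtained in Excel with F.DIST.RT(F, 1, 24). No results are quoted here.

```python
from statistics import mean

# Hesham's data from the comment above: (gene, treatment) -> 7 values
cells = {
    ("T-", "Control"): [0.16, 0.17, 0.15, 0.15, 0.16, 0.162, 0.158],
    ("T-", "Surgery"): [0.153, 0.156, 0.166, 0.145, 0.162, 0.155, 0.15],
    ("T+", "Control"): [0.15, 0.154, 0.161, 0.161, 0.149, 0.152, 0.152],
    ("T+", "Surgery"): [0.164, 0.146, 0.166, 0.153, 0.156, 0.146, 0.15],
}
genes = ["T-", "T+"]
treatments = ["Control", "Surgery"]
r, c = len(genes), len(treatments)
m = len(cells[("T-", "Control")])
assert all(len(v) == m for v in cells.values())   # balanced design

all_vals = [v for vals in cells.values() for v in vals]
grand = mean(all_vals)
gene_mean = {g: mean(v for t in treatments for v in cells[(g, t)]) for g in genes}
treat_mean = {t: mean(v for g in genes for v in cells[(g, t)]) for t in treatments}
cell_mean = {k: mean(v) for k, v in cells.items()}

ss_gene = c * m * sum((gene_mean[g] - grand) ** 2 for g in genes)
ss_treat = r * m * sum((treat_mean[t] - grand) ** 2 for t in treatments)
ss_inter = m * sum((cell_mean[(g, t)] - gene_mean[g] - treat_mean[t] + grand) ** 2
                   for g in genes for t in treatments)
ss_within = sum((v - cell_mean[k]) ** 2 for k, vals in cells.items() for v in vals)

df_within = r * c * (m - 1)          # 24
ms_within = ss_within / df_within
# Gene, treatment and interaction each have 1 df here, so MS = SS
f_gene, f_treat, f_inter = (s / ms_within for s in (ss_gene, ss_treat, ss_inter))
print(f_gene, f_treat, f_inter)
```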
Charles
In the ANOVA with replication, I have seen the F values calculated in a couple of different ways.
Fcd = MScd/MSinter
Fmd = MSmd/MSinter
Finter = MSinter/MSres
or
Fcd = MScd/MSres
Fmd = MSmd/MSres
Finter = MSinter/MSres
The last calculation is the one used in Excel. Would you know when to use one calculation versus the other?
With further digging, it seems that the difference could be related to whether or not the blend and crop are random or fixed effects.
I guess the question now is when should you assume random effects and when should you assume fixed effects.
Hello Ian,
See https://real-statistics.com/anova-random-nested-factors/
especially the webpage on Two Factor ANOVA with random effects.
Charles
Charles
I am making surface measurements of a flat rectangular sheet. The measurements are made with something that looks like a CNC table, where the head moves in one direction (x) across the sheet and takes measurements a fixed distance apart. The head then indexes in the y direction and scans back in the opposite direction, taking a second row of measurements. I end up with a 2-dimensional array of data. In the normal case, I just do a 2-way ANOVA and get an indication of whether the variability in each direction is significant and, if it is, what the expected std dev is.
Looking at how the measurement is made, it is possible that the measurements taken in one direction are slightly different from those taken in the other, so I thought to treat neighbouring pairs of points as replicates. This seems to be useful for identifying a systematic measurement error. My question is: when trying to estimate the variance, do I use a fixed, random or mixed formulation?
Wish you the very best during this holiday season!
Ian,
I don’t know the answer to your question. Perhaps someone else in the community can answer.
Happy Holidays to you as well.
Charles
Hi, thank you for this wonderful website,
Can you explain why, in two way ANOVA with replication, the sum over i and j of eijk is equal to zero?
Aziz,
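These formulas capture the fact that the errors are random. More precisely, since eijk = xijk – x̄ij, the residuals in each cell sum to zero (Σk eijk = 0), and so their sum over all i, j and k is also zero.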
Charles
Near the end of this anova analysis you wrote the following:
We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so conclude (with 95% confidence) that there are no significant differences between the effectiveness of the fertilizer for the different crops.
I suggest a slightly different version of formulating the conclusion:
….and so conclude that we cannot reject the null hypothesis with 95% confidence.
It just says that we cannot state “we’re 95% sure that there is a difference between crops”.
The two main effects are row and column position. I can collect data so that I have replicates. For some reason, I am having a hard time understanding the meaning of the interaction term in a physical sense. If the interaction term is significant, what does that imply about what the underlying data looks like? I appreciate you bearing with me on this.
Ian,
I understand that you are measuring something about plants grown in a rectangular grid. Since you are considering ANOVA with replication, I presume that you have a number of such grids (the replication). Suppose, for example, that each grid is 10 x 5 (and so each grid contains 50 small square plots) and that you are measuring the height of the plants. If you find a significant result for the Interaction factor, this would mean that the differences in plant height between rows are not the same for every column (and vice versa), i.e. the height depends on the specific (row x column) position on the grid in a way that is not explained by the row and column effects separately.
Charles
Charles your pages have been very helpful. I am struggling to interpret the interaction term in a 2-way with replication ANOVA.
For a hypothetical problem, picture a rectangular garden that is divided up into 10 rows and 5 columns. I plant 2 seeds in each square of the garden and at the end of the experiment, I measure the height of each of the plants. I am looking to see if position in the garden impacts the height of the plants.
I am having trouble coming up with a physical explanation for the interaction term when the interaction term is significant.
I’d appreciate any comments you might have.
Ian,
I am assuming that your two factors are Row and Column and that you got a significant result for the interaction (Row x Column). It seems like you need to do some follow-up analysis to see where the mean differences are located. E.g., perhaps the corners of the 10 x 5 plot are where the least (or most) growth is located.
Charles
Thanks Charles. Your assumptions are correct. I guess the interaction would become obvious in a graph like Figure 4.
If we look at Figure 4, where there is a clear interaction, how do you describe the interaction in terms of the blocking that has been done (grains and fertilizers)? Or can no statements be made?
Here is a slightly different scenario. Rather than 10 x 5, if I specifically account for the position of the 2 plants in each cell, resulting in a 20×5 grid (rather than a 10×5 grid with 2 plants in each cell), should I treat this as a 20×5 2-way ANOVA with no replication or a 10×5 with replication?
I really appreciate the discussion on this.
Or should I treat it as a 3-way ANOVA with blocks for row, column and position in cell?
Ian,
At least in the way I tend to use the words, there is no blocking in the standard two factor ANOVA. The approaches that you may be referencing, based on blocking, are described in another part of the website. See the following: https://real-statistics.com/design-of-experiments/
Charles
Charles
I see my mistake. Let’s go back to
“Rather than 10 x 5, if I specifically account for the position of the 2 plants in each cell, resulting in a 20×5 grid (rather than a 10×5 grid with 2 plants in each cell), should I treat this as a 20×5 2-way ANOVA with no replication or a 10×5 with replication?”
In this case, if the interaction term were significant, what would be the physical meaning?
Take Care
Ian,
Let’s go back to the beginning. What hypothesis are you trying to test?
Charles
Dear Charles,
I’m currently interpreting my thesis results, entitled: Acceptability, physico-chemical and nutritional property of a mixed tropical fruit puree. In my study, I have 2 independent variables, namely formulation and pasteurization conditions. At the same time, I have several dependent variables: sensory evaluation results, and the physico-chemical and nutritional properties of my product. I had 2 sampling periods, and during each sampling period I gathered 50 respondents for the sensory evaluation test of my product and one bottle of each sample for the physico-chemical and nutritional analysis. After gathering the data, my friend told me to use 2-way ANOVA with replication as my statistical tool, and I followed her advice. After calculating everything using MS Excel, it turned out that some of the findings had a significant difference. I was wondering if I’m using the correct statistical tool. Moreover, back in my 3rd year, I was taught to conduct a post hoc test, specifically DNMRT, whenever findings turned out to have a significant difference. My other concern is: is it possible to use DNMRT as a post hoc test after 2-way ANOVA with replication? If yes, can you provide me with the necessary steps to do the DNMRT? Please, it will really help me a lot because I’m really struggling.
Thank you for your time.
In my study, I have 2 independent variables, namely formulation and pasteurization conditions. At the same time, I have several dependent variables: sensory evaluation results, and the physico-chemical and nutritional properties of my product. I was told to use 2-way ANOVA with replication as my statistical tool. I followed her instruction, and after calculating the results, my findings turned out to have a significant difference. I’m curious and at the same time confused: should I proceed to Duncan’s multiple range test or not? And if I wish to conduct DNMRT, how should I do it after the 2-way ANOVA with replication? Can you provide me with the necessary steps to conduct the DNMRT? Please, it will really help me a lot. Thanks
Clary,
There are many options for the follow-up tests after a significant ANOVA. Which test to use depends on the situation (equal sample sizes, similar variances, etc.). In most cases, Tukey HSD is a reasonable choice. See the following webpages for details:
Unplanned Comparisons
Planned Comparisons
Charles
Charles
Great set of tutorials. I really appreciated browsing through the explanations and examples. I was analyzing a large 2-way dataset without replication (100, 600). When using Excel, the results (SS, MS and F) were different from what I calculated manually. I am confident in the results that I calculated and that round-off errors were minimized. Have you seen this type of error in Excel before?
Ian,
It sounds like either your manual calculations are incorrect or the Excel results are incorrect. I have not seen an error in this Excel calculation before, at least in versions of Excel after Excel 2007. Have you tried using the Real Statistics software to see whether you get the same results?
Charles
Sir,
Could you discuss the credibility of the interpretations and conclusions after using two way ANOVA? And is there anything we should be concerned about, for example, the violation of the normality assumption?
Trang,
1. This is discussed on the referenced webpage (see the examples).
2. The assumptions are described on the following webpage:
https://real-statistics.com/two-way-anova/testing-two-factor-anova-assumptions/
Charles
Sir,
Now I have a case needed to solve here:
Suppose that a local chapter of sales professionals in the greater San Francisco area conducted a survey of its membership to study the relationship, if any, between the years of experience and salary for individuals employed in inside and outside sales positions. On the survey, respondents were asked to specify one of three levels of years of experience: low (1-10 years), medium (11-20 years), and high (21 or more years). The objective of this study is to test for any significant interaction between Position and Experience and to test for any significant differences in salary due to position and years of experience.
I wonder about the null hypotheses.
There are 3 sets of hypotheses, aren’t there?
H01: There are no differences in the mean salaries of salespersons at different levels of years of experience.
H02: There are no differences in the mean salaries of salespersons in different positions.
H03: There is no significant interaction between position and experience.
Trang,
This looks correct.
Charles
Dear Sir,
I am confused by following statements below Figure 3:
“Figure 3 – ANOVA analysis for Example 1
We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor A null hypothesis, and so conclude (with 95% confidence) that there are no significant differences between the effectiveness of the fertilizer for the different crops.”
In the sentence “Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor A null hypothesis,” I believe that, instead of Factor A, it should be Factor B.
Similarly, in the sentence “Since the p-value (blends) = .00025 < .05 = α, we reject the Factor B null hypothesis,” I believe that, instead of Factor B, it should be Factor A.
Whatever I believe is incorrect, please explain the conclusions.
Thank You,
Vijay Rathod
Vijay,
Thanks for bringing this error to my attention. I have just corrected the webpage by interchanging A with B.
I appreciate your help in making the website more accurate.
Charles
Dear Sir,
In my mail of July 25, 2017, the last sentence should have begun with “If”. The sentence should have been “If whatever I believe is incorrect, please explain the conclusions.” I am sorry for any inconvenience it may have caused. I was wondering whether the mail had become meaningless. It is heartening to see that you understood it as it was intended. I am learning statistics with the help of your website.
Regards,
Vijay Rathod.
Sir, what type of model is the above example of two way ANOVA with replication (fertilizer vs. crop)? Is it LSD, a split type model, etc.?
And can you suggest a reference for the above example?
Manoj,
I created the example. It is not based on a real study. The numbers are made up by me. The purpose of the example is to show you how to perform two factor Anova with replication.
Charles
Hi,
I have an experiment with two factors and five levels and am told that it should be replicated four times. How can I go about it?
Niddah,
The referenced webpage describes this very situation.
Charles
I am Manoj Meena.
I have used a factorial RBD, with one factor at 3 levels and the second factor at 5 levels, for a total of 15 treatments with 3 replications. Please send the analysis process in Excel.
Manoj,
This process can be accomplished using Excel’s Two Factor Anova with Replication data analysis tool as described on the website. You can also use Real Statistics Two Factor Anova data analysis tool.
Charles
Hi Dr,
Please explain what m and c mean in the formula for SSA, and what r means in the formula for SSB.
Please give these values using the example you used.
Thanks
Hi DR,
When the number of observations is equal for all cells (columns and rows), my result is the same as yours. But when the number of observations is different, for example column 1 has 15 values, column 2 has 13 values and column 3 has 14 values, my results are different from yours.
Where is the problem?
Thanks for this great work.
Abdekader,
With unbalanced models, you need to use a different approach. See the following webpage:
https://real-statistics.com/multiple-regression/unbalanced-factorial-anova/
Charles
r = # of levels in the row factor, which for Example 1 is Fertilizer with r = 3
c = # of levels in the column factor, which for Example 1 is Crop with c = 4
m = # of replications which for Example 1 is m = 5
Charles
Hi DR,
Many thanks for your help.
Now I need the formulas for Two Factor ANOVA with unequal replications.
Dear Dr. Charles,
Thank you so much for providing such great resources here!
I have some problems with my data analysis, could you please help me?
Three pathologists reviewed 80 slides via 3 different systems, and the time taken for each review was recorded in seconds. The slides they reviewed are the same; in other words, each slide was reviewed by each pathologist via each system, giving 9 results for each slide. Now I need to know if there are any differences in time taken between the different systems, i.e. whether any system takes significantly less time to complete a review.
I tried Two Factor ANOVA with Replication according to your above instructions, and got 3 p-values (for pathologists, systems and interaction) much less than 0.05. Now what I am wondering is as follows:
1. Did I choose the appropriate analysis for my data?
2. If I still need to know which two systems are different, what should I do next?
3. How do I explain the interaction? I plotted two line charts, but still don’t know how to interpret them.
Thank you very much!
Best regards
Susan
Susan,
If I understand the situation correctly, you have 80 subjects (i.e. the slides) that are being evaluated by 3 different pathologists, each using the same 3 methods. This seems similar to the problem described on the following webpage:
https://real-statistics.com/anova-repeated-measures/two-within-subjects-factors/
The approach used is ANOVA with repeated measures. See the above webpage also for your questions 2 and 3.
Charles
Dear Dr. Charles,
Thank you very, very much for your reply and instruction! Hope I can finally handle the data well.
Best regards
Susan
Hi Charles, this is a great teaching tool. I just switched to Excel from SPSS for teaching my stats classes because of your add-in, and so far it’s great. I have noticed a peculiar behavior in one of the factorial calculations, and I was wondering if you prefer this kind of question posted here or sent privately?
Andrew,
Glad to see that you are using the Real Statistics add-in for teaching purposes. This was one of my goals when developing the software.
Generally, it is best to ask questions here (as a comment). If you need to include a spreadsheet, you can send it via an email at the address shown on Contact Us.
Charles
Hi Dr. Charles, I’m now trying to analyze my thesis results. My study is about the control of diseases of eggplant grown in the open field and in a greenhouse; these two types of cultivation form my main plot. My subplot includes six treatments, including the control, with four replications, arranged in an RCB layout. I looked into similar theses with the same experimental design as mine. Their analysis is similar to your ANOVA in Figure 6 except that it has one more source of variation, the replication. I’m now confused about which ANOVA to use. Would it be best to use a split-plot ANOVA or an RCB layout ANOVA? Or is my study a special case which needs a different analysis?
Melvin,
If you are using a split plot design, then I suggest that you use the tools described on the following webpage:
https://real-statistics.com/design-of-experiments/split-plot-design/
Charles
Hello Charles, Thank you for this great site.
I have a question about how best to analyze my data (ANOVA) for the whole experiment instead of as independent data sets. Below is an example of what my data may look like.
Crop X Crop X
Product application:Treatment 1 Product application: Treatment 2
Plant # Leaf 1 Leaf 2 Leaf 3 Plant # Leaf 1 Leaf 2 Leaf 3
1 70 85 50 7 65 75 60
2 71 86 51 8 66 76 61
3 72 87 52 9 67 77 62
4 73 88 53 10 68 78 63
5 74 89 54 11 69 79 64
6 75 90 55 12 70 80 65
The data for each leaf is taken at a different time point; for example, Leaf 1 data may be taken at day 18 only and Leaf 2 data at day 27 only, because at the time of the single application the leaves are at different developmental stages and therefore need time to grow. Also, I can’t average a single plant’s measurements over all the leaves because they can vary greatly between the leaves of a single plant, although not between plants (this has to do with the developmental stages). I am currently comparing the means of Treatment 1 and Treatment 2 (one-way ANOVA) separately for Leaf 1, Leaf 2 and Leaf 3, giving three independent data sets. I would like to analyze the experiment as a whole to see the effect on the plant as a whole, but I am not sure what would be the best way to do that.
Douglas,
Sorry, but I don’t understand your scenario well enough to give any advice.
Perhaps you can use Two Factor ANOVA or Split-plot ANOVA. Both are described on the website and are included in the Real Statistics software.
Charles
Hi Charles.
Great website and thanks for answering queries here.
My question is whether or not this type of ANOVA would be appropriate for a randomised complete block trial?
The standard for a RCBT seems to be very similar to your example above but also includes degrees of freedom in the replication.
Thanks, Ash.
Ash,
The approach does indeed use a randomized complete block design, taking sphericity into account.
I didn’t understand your comment about “degrees of freedom in the replication”.
Charles
I have some raw data from a RCBD trial and have been asked to check the results of a third party who ran analysis on it.
The trial had three replications which were run concurrently with each other. Testing two products at 4 different rates of application, to see if their effect was statistically different.
Their method of analysis seems to have considered the degrees of freedom in replication, R.
The table below shows the form in which their results were presented. I followed your method and did not consider degrees of freedom for R, which yielded different results, notably DFerror = 14 below versus 16 with your method.
Am I applying an incorrect method?
DF SS MS F P(F) LSD
Total 23
R 2
A 1
B 3
AB 3
ERROR 14
Ash,
I am not able to comment without additional information. If you send me an Excel file with your data and the results you obtained from R (please indicate which R capability you are using) and Excel, I will try to figure out what is going on. You can send this information to my email address listed at Contact Us.
Charles
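For reference, the degrees of freedom in Ash’s table can be reproduced with a few lines of Python arithmetic, assuming the third party’s model contained a replication (block) term R in addition to A, B and AB, while the model on this page has no block term. Under that assumption, the 2 df for R are simply taken out of the within-cell error, which accounts for 16 versus 14.

```python
a_levels, b_levels, reps = 2, 4, 3        # 2 products x 4 rates x 3 replications
n = a_levels * b_levels * reps            # 24 plots
df_total = n - 1                          # 23
df_reps = reps - 1                        # 2, the "R" row (blocks)
df_a, df_b = a_levels - 1, b_levels - 1   # 1 and 3
df_ab = df_a * df_b                       # 3

# Error df with the replications fitted as blocks (RCBD-style model)
df_error_blocked = df_total - df_reps - df_a - df_b - df_ab   # 14
# Within-cell error df of the two factor model on this page: rc(m - 1)
df_error_unblocked = a_levels * b_levels * (reps - 1)         # 16
print(df_error_blocked, df_error_unblocked)
```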
Dear Dr Charles
I am studying the difference in X in 5 different nuclei of the brain (a1, a2, a3, a4, a5) at different times (control/pre/post). I have several animals in each group (3 control, 3 pre and 3 post). I know that I have to do a two way ANOVA, but if I do the same experiment in the same animal, the measurement is really different in almost all the nuclei, and I don’t trust taking the mean. So I wonder if there is something I can do to avoid taking the mean.
Thank you for your time.
Estrella
Moreover, I would like to know how to avoid taking the mean across the controls, pre or post animals, because I want to compare them.
I don’t understand your comment about the mean.
Charles
Dear Charles,
I’m preparing for my Business Statistics exam coming up next week, and one of the practice questions was:
Explain why, when a test is being done to check whether there is a significant interaction between two treatments, replications are needed.
I don’t really understand this question, because the way I see it, replication means that we have more than one observation in each cell, and you can still check for a significant interaction without replication. Plus, this question is only worth 4 points out of 50, so I don’t think they expect a very detailed answer.
Anyway, it would be very kind of you if you could help me out with this!
Have a great day!
Dear Roxane,
A partial answer is that in the case where there is no replication, the interaction is considered to be the error term. See
Two Factor ANOVA without Replication
Charles
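The degrees of freedom make this concrete. With r levels of factor A, c levels of factor B and m observations per cell, the within-cell (error) term has rc(m − 1) degrees of freedom, which is 0 when m = 1, so there is nothing left against which to test the interaction. A small Python sketch (the function name is illustrative):

```python
def two_factor_df(r, c, m):
    """Degrees of freedom for a balanced r x c two-factor ANOVA with m observations per cell."""
    return {
        "A": r - 1,
        "B": c - 1,
        "AB": (r - 1) * (c - 1),
        "Within": r * c * (m - 1),  # pooled within-cell (error) df
        "Total": r * c * m - 1,
    }


for m in (1, 5):
    print(f"m = {m}:", two_factor_df(3, 4, m))
```

For the 3 blends × 4 crops of Example 1 with m = 5, this gives 48 within-cell degrees of freedom; with m = 1 it gives 0, and the (r − 1)(c − 1) interaction degrees of freedom become the error degrees of freedom of the without-replication model.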
What would this problem look like if it were done as a 5-step hypothesis test?
Sorry Alex, but I don’t know which problem you are referring to nor what 5 step hypothesis you are referring to.
Charles
Dear Dr Charles,
I have a scenario in which I have a spreadsheet with 8 columns. These contain 7 independent variables, including discrete variables (for example, Sale Week “Yes/No”) and continuous variables (such as temperature, which is unique for each week at each store). The last column is a “Sales” column, which shows the total sales for a specific store (1 of 6) in a specific week (1 of 6). I am tasked with finding the factors that affect sales. Obviously there are multiple factors that could affect it (such as temperature, whether it is a sale week, the store size, etc.), so I need to test this. Can I use multiple ANOVA tests, or would this increase the risk of a type I error?
Please let me know if you need more information regarding the actual dataset; I tried to summarise the data briefly. However, I should note that I have been specifically asked to use ANOVA and/or t-tests to analyse the data.
Regards,
Chris
Why don’t you use regression instead?
Charles
I agree that regression would be more suitable, however, for the task I have been specifically asked to use ANOVA (or t-tests) to detect which factors affect sales.
When I did a one-way ANOVA on temperature (I split the continuous data into low/med/high temperature), there were significant differences in average sales between the temperature groups, but that wouldn’t technically mean that temperature had an effect on sales (because there are other independent factors in the data), would it? I’d have to find out whether temperature had an interaction effect with another variable, but I’m not sure how to approach that.
Thanks for your help.
Chris
Chris,
You said that you have 8 columns, which I understood represents 7 independent variables and the dependent variable Sales. You seem to have data for these variables for different stores in different weeks.
For argument’s sake, suppose you want to look at the interaction between the temperature (low/medium/high) and some other variable, say training level (high/low). Further suppose that your sample consists of 60 stores and that for each of the 6 combinations of temperature and training there were exactly 10 stores. You could use a two-way ANOVA model (with replication) with temperature and training factors to model the interaction between temperature and training.
If you have data for 4 weeks, you can perform the above analysis for any of the four weeks or the average of the four weeks.
If in the above scenario the number of elements in each of the interaction cells is not equal (in the above, each of the 6 cells has 10 stores, for a total of 60), you would need to use an unbalanced ANOVA model.
I hope this helps you.
Charles
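For a balanced design like the one Charles describes (every temperature × training cell has the same number of stores), the two-way ANOVA with replication can be sketched in plain Python as below. This is a minimal illustration, not the Real Statistics implementation; the function name and the 3-stores-per-cell data are hypothetical (Charles’s example would have 10 values in each cell). Each F value is MS/MS_W; the corresponding p-value can be obtained in Excel with F.DIST.RT(F, df, df_W).

```python
def two_way_anova_replication(cells):
    """Balanced two-factor ANOVA with replication.

    cells[i][j] holds the m observations for level i of factor A and
    level j of factor B. Returns {source: (SS, df, F)}; F is None for
    the Within and Total rows.
    """
    r, c, m = len(cells), len(cells[0]), len(cells[0][0])
    if m < 2 or any(len(cells[i][j]) != m for i in range(r) for j in range(c)):
        raise ValueError("needs equal cell sizes of at least 2 (balanced model)")
    n = r * c * m
    grand = sum(x for row in cells for cell in row for x in cell) / n
    cell_mean = [[sum(cell) / m for cell in row] for row in cells]
    a_mean = [sum(cell_mean[i]) / c for i in range(r)]
    b_mean = [sum(cell_mean[i][j] for i in range(r)) / r for j in range(c)]

    ss_a = c * m * sum((x - grand) ** 2 for x in a_mean)
    ss_b = r * m * sum((x - grand) ** 2 for x in b_mean)
    ss_ab = m * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(r) for j in range(c))
    ss_w = sum((x - cell_mean[i][j]) ** 2
               for i in range(r) for j in range(c) for x in cells[i][j])
    ss_t = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)

    df_w = r * c * (m - 1)
    ms_w = ss_w / df_w
    out = {}
    for name, ss, df in (("A", ss_a, r - 1), ("B", ss_b, c - 1),
                         ("AB", ss_ab, (r - 1) * (c - 1))):
        out[name] = (ss, df, (ss / df) / ms_w)
    out["Within"] = (ss_w, df_w, None)
    out["Total"] = (ss_t, n - 1, None)
    return out


# Hypothetical data: temperature (low/med/high) x training (high/low), 3 stores per cell.
cells = [
    [[5, 6, 7], [8, 9, 7]],
    [[6, 7, 8], [10, 11, 12]],
    [[7, 8, 6], [12, 13, 14]],
]
for source, (ss, df, f) in two_way_anova_replication(cells).items():
    print(source, round(ss, 3), df, "" if f is None else round(f, 3))
```

The function deliberately rejects unequal cell sizes: that is the unbalanced case, which calls for the regression-based approach rather than these formulas.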
Hi, could you please help me with the sum of squares part? I did the steps as you have above, but I’m not getting the right answer for my question. Also, could you please explain how to get the p-value?
Thanks
Ashley,
I suggest that first you make sure that you understand how to calculate the sum of squares and p-value in the one-way ANOVA case. The process is similar, but a little easier to understand. See
One-way ANOVA
You can also go to the Examples Workbook Part 2 to look at the formulas on the spreadsheets used in calculating the sum of squares and p-values. See
Examples Workbooks
Charles
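As a concrete illustration of the one-way calculation referred to above: SS_B measures how far the group means are from the grand mean, SS_W how far each observation is from its own group mean, F = (SS_B/df_B)/(SS_W/df_W), and the p-value is the right-tail probability of the F(df_B, df_W) distribution (F.DIST.RT in Excel, or scipy.stats.f.sf in Python). The sketch below stays dependency-free by computing that tail through the regularized incomplete beta function with the standard continued-fraction expansion (the approach popularized by Numerical Recipes); the helper names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2 = 2 * k
        # Even and odd coefficients of the continued fraction at step k.
        for aa in (k * (b - k) * x / ((qam + k2) * (a + k2)),
                   -(a + k) * (qab + k) * x / ((a + k2) * (qap + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Right-tail probability P(F > f) for an F(d1, d2) distribution."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_w = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    f = (ss_b / df_b) / (ss_w / df_w)
    return {"SS_B": ss_b, "SS_W": ss_w, "df_B": df_b, "df_W": df_w,
            "F": f, "p": f_sf(f, df_b, df_w)}


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For these toy groups SS_B = 54 and SS_W = 6, so F = 27 with 2 and 6 degrees of freedom. Because F(2, d2) has the closed-form tail (1 + 2F/d2)^(−d2/2), the p-value is exactly 10^(−3) = 0.001, which makes a handy check against Excel’s F.DIST.RT(27, 2, 6).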
Can you help me with a statistical analysis?
Jomar,
Please be more specific about the type of help that you need.
Charles
Hi!
I’m just starting to learn stats.
I need to prove that resolution affects time. What method/test should I use? Thanks!
Sorry Lois, but you haven’t provided enough information for me to be able to give you an answer.
Charles
Hi Charles,
I am looking for some statistical assistance. I have three factors (NI, MOL & CO), and each factor has 3 levels (2.5 WT%, 5 WT% and 7.5 WT%). I am conducting experiments using an L27 orthogonal array. What type of ANOVA can I use to find the influence of each factor and also the influence of the combinations (NI*MOL, NI*CO, MOL*CO and NI*MOL*CO)?
Thank you so much!
Srikant,
You probably want three factor ANOVA. See the webpage Three Factor ANOVA.
Charles
Dear Dr Charles,
I have been greatly helped by Real Statistics.
may I ask…
based on Figure 4 – Interaction plots for Example 1;
“From the first chart we can see that Brand X has a quite a different pattern from the other brands (especially regarding Soy). Although less dramatic, Brand Y is also different from Brand Z (especially since the line for Brand Y is trending up towards Soy, but trending down towards Rice, exactly the opposite of Brand Z).”
Maybe you mean “Blend”, not “Brand”?
Also, in these words: “Brand Y is trending up towards Soy, but trending down towards Rice”,
it looks like a typo to me: “trending down” should be “trending up”, and vice versa.
thank you
Dear Jhon77,
Thanks for catching these errors. I have just reworded the paragraph in error on the website. I really appreciate your finding these problems and your help in making the website better for the growing community of people who are using and depending on the site.
Charles
Hi, I’m working with one parameter, which is the protein content of 100 wheat genotypes cultivated in three growing seasons (season 1: 70 genotypes; season 2: 15 genotypes; season 3: 15 genotypes), with 14 genotypes common to all 3 seasons. I did a combined analysis with genotype as a fixed factor and crop year as a random factor. The results showed that genotype had the major impact, followed by the G*CY interaction and finally crop year. What do you think?
Fatma,
It sounds like a reasonable approach, although I don’t have enough information to give you a definitive answer.
Charles
Sir, I am an M.Sc. Hons student. I analyzed my data using a two-factor design (two-way ANOVA). I have 5 fertilizers, 5 species of sorghum and two replicates. One of my seniors told me that I should never use fewer than three replicates in a two-factor design. Sir, please tell me what I should do.
Shoaib,
I don’t know of any such rule that you need at least 3 replicates. With such a small sample, the statistical power of your test will be very low.
Charles
I am applying the Real Statistics add-in to my data. I have two factors and two replications. One factor has four levels and the other has two. When I apply two-way ANOVA, I get rows and columns, but I do not get the interactions. Please help me out with this.
Please send me an Excel file with your data and calculations and I will try to figure out what is going on. You can get my email address on the webpage
Contact Us
Charles
Hi Charles,
I am looking for some statistical assistance. I have three groups (2 gene knockdowns and 1 negative control),where the assumption is there is no difference among them. Each time I have run the experiment, 30 technical replicates have been used for each group. I have run the experiment three times, giving me three biological replicates. I am wondering whether I should run an ANOVA with a two-tailed post-hoc Dunnett test against the negative control with repeated measures or replication?
Thank you so much!
Elizabeth,
If the three trials are based on the same 30 subjects per group, then it looks like you should use repeated measures (this will also be with replications). If the trials are on different subjects then depending on other details of the experiment you can simply run one-way ANOVA with 90 replicates per group.
Charles
Hi Charles,
Thank you for your response. The trials for each group are on a different set of 30 subjects. So I assume I should be using one-way ANOVA with 90 replicates, as you mentioned, but I am wondering whether that will give the test too much power and overstate statistical significance.
Thank you!
Elizabeth
Elizabeth,
I don’t understand why you have this concern. The sample doesn’t seem that large.
Charles
Please help: I would like you to differentiate thoroughly between two-way ANOVA with and without replication. Thanks in anticipation.
Olarewaju,
In Two-way ANOVA there are two factors, which I will call factors A and B. Suppose factor A has m levels (also called groups or treatments) and factor B has n levels. Thus there are m x n combinations of levels from the two factors. These are the interactions between the two factors.
In Two-way ANOVA without replication, the sample for each of the m x n combinations consists of just one element.
In Two-way ANOVA with replication, the sample for each of the m x n combinations consists of two or more elements.
Charles
sir,
I am a student trying to complete my thesis, and I am stuck on which method I should use. I have 4 different treatments:
T1: culm cutting in raised nursery bed
T2: branch cutting in raised bed
T3: Culm cutting in flat bed
T4: branch cutting in flat bed
Could you please recommend? Thank you
It really depends on what you are trying to test. If you are trying to determine whether the treatments yield the same or different results, then you can use one-way ANOVA with the 4 treatments listed. If you also want to study the interaction effects, then use two factor ANOVA where one factor is the cutting type (branch vs. culm) and the other factor is the bed type (raised vs. flat).
Charles
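With equal sample sizes, the two analyses described above are directly related: the between-treatments sum of squares from the one-way ANOVA (3 df) splits exactly into a cutting-type term, a bed-type term and a cutting × bed interaction term (1 df each). The Python sketch below assumes the T1 to T4 layout from the question and uses made-up numbers; the function name is hypothetical.

```python
def two_by_two_from_treatments(t1, t2, t3, t4):
    """Split the one-way between-treatments SS for T1..T4 into 2 x 2 factorial parts.

    Assumed layout: T1 = culm/raised, T2 = branch/raised,
    T3 = culm/flat, T4 = branch/flat; all samples the same size n.
    """
    samples = (t1, t2, t3, t4)
    n = len(t1)
    if any(len(s) != n for s in samples):
        raise ValueError("sketch assumes equal sample sizes")
    m1, m2, m3, m4 = (sum(s) / n for s in samples)
    grand = (m1 + m2 + m3 + m4) / 4
    ss_treatments = n * sum((mu - grand) ** 2 for mu in (m1, m2, m3, m4))  # 3 df

    cut_diff = (m1 + m3) / 2 - (m2 + m4) / 2  # culm minus branch, averaged over beds
    bed_diff = (m1 + m2) / 2 - (m3 + m4) / 2  # raised minus flat, averaged over cuttings
    interaction = (m1 - m2) - (m3 - m4)       # change in culm-branch gap between bed types
    return {
        "treatments (1-way)": ss_treatments,
        "cutting": n * cut_diff ** 2,               # 1 df
        "bed": n * bed_diff ** 2,                   # 1 df
        "cutting x bed": n * interaction ** 2 / 4,  # 1 df
    }


# Hypothetical scores, 3 cuttings per treatment (made-up numbers).
parts = two_by_two_from_treatments([10, 12, 11], [8, 9, 8], [7, 7, 8], [6, 8, 7])
for name, ss in parts.items():
    print(f"{name:20s} SS = {ss:.3f}")
```

The interaction term is large when the gap between culm and branch cuttings differs between raised and flat beds, which is the question the one-way analysis cannot answer on its own.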
Hi Dr Charles, can I ask,
I am writing a small manual for students, but I am having trouble with this ANOVA in particular.
Because it is for students of education, I will present an example from this field. Can you please tell me whether I understand it correctly?
(Fictitious example) I have two measurements, before the course and after the course, of
knowledge of: A) social, B) ontogenetic and C) clinical psychology.
From ANOVA with replication I should find out:
p1: differences in knowledge before and after the course (rows)
p2: differences in knowledge between subjects (columns)
p3: interaction? This is the second part of my misunderstanding.
Is this example right for this test? What does the interaction tell me about these factors?
Thank you very much!
Rob,
Based on my understanding of the scenario, you probably want to use ANOVA with repeated measures and not ANOVA with replication. It looks like a mixed model with factor A repeated measures and factor B not repeated measures. You can learn more about this type of model at the webpage
Mixed Repeated Measures ANOVA.
Charles
Dear Dr Charles
I am studying whether a new method estimates the same score as the old method. I have 60 participants who are tested with both methods at 5 time points. Can I use Two Factor ANOVA with replication to determine whether the methods give different results?
Thank you
Best regards
Rolf,
No, you need to use ANOVA with repeated measures. In fact, you need the mixed version of the test, with one between-subjects factor and one within-subjects factor. This is described on the webpage
Charles
Thanks Dr Charles. Yes, your interpretation is correct.
Dear Dr Charles
I am studying the effect of a treatment, say X. I have three samples in each group, and in each sample I obtained three readings before and after treatment X. My question is: is ANOVA with replication the right technique for answering this? If yes, then I should follow the instructions on this page. Below is data similar to what I have.
thanks
No treatment
Sample 1: 55 54 65; Sample 2: 33 44 43; Sample 3: 22 33 43
With treatment
Sample 1: 56 52 61; Sample 2: 33 45 41; Sample 3: 33 34 41
Dear Mohammad,
Based on the data that you have presented, I understand the following about Sample 1. Please let me know whether this is correct.
There are three subjects in Sample 1. Subject 1 got a score of 55 before treatment and a score of 56 after treatment. Subject 2 got a score of 54 before treatment and a score of 52 after treatment. Subject 3 got a score of 65 before treatment and a score of 61 after treatment.
If this is the correct way to interpret the data (and presumably the interpretations for the other two samples are similar), then a two factor ANOVA with replication is not the correct test. Instead you need a two factor mixed ANOVA where one of the factors is repeated measures. This is described on the following webpage:
https://real-statistics.com/anova-repeated-measures/one-between-subjects-factor-and-one-within-subjects-factor/
Charles
I am studying the impact of emotional intelligence on teaching, with variables such as sex, education, age and management type.
So how can I use this for correlating E.I. with teaching?
It sounds like you want to use regression or correlation, but this is not completely clear from your question. Please look at the following webpage for more information
Multiple Regression
Charles
Hi, I want to investigate sex differences and education level on test anxiety among students. My variables are as follows :
Independent variable 1 – Sex difference (male or female)
Independent variable 2- education level (grade 2 or grade 3 students)
Dependent variable – Test anxiety reported by the students.
Is this suitable for a 2-way ANOVA? If yes, when entering the data, should I input the score of each student on the Test Anxiety Inventory (TAI), or the number of students who reported test anxiety? I got confused about how I can key in each datum for up to 261 students who participated. Thanks.
If test anxiety takes on a continuous set of values (or can be approximated by such values, e.g. 1, 2, 3, 4, 5, 6, 7), then this is indeed a 2 x 2 ANOVA, where you insert the test anxiety values in the four cells. Since 261 is not divisible by 4, this will be an unbalanced model. This can be solved by regression (see Unbalanced ANOVA).
You insert the scores of each student in the cells. A possible source of confusion is with the chi-square test where you insert sums.
Charles
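A sketch of the data layout described above: each student contributes their own TAI score to the cell for their sex and grade, and the cell sizes are then compared to see whether the design is balanced. The record format, function name and sample values below are hypothetical.

```python
from collections import defaultdict


def build_cells(records):
    """Group one record per student into the cells of a 2 x 2 design.

    records: iterable of (sex, grade, score) tuples, one per student.
    Returns (cells, sizes, balanced).
    """
    cells = defaultdict(list)
    for sex, grade, score in records:
        cells[(sex, grade)].append(score)  # each student's own score goes in its cell
    sizes = {key: len(scores) for key, scores in cells.items()}
    balanced = len(set(sizes.values())) == 1  # only checks cells that actually occur
    return dict(cells), sizes, balanced


# Hypothetical records (not real data): (sex, grade, TAI score).
records = [("M", 2, 41), ("F", 2, 47), ("M", 3, 38), ("F", 3, 52), ("F", 2, 44)]
cells, sizes, balanced = build_cells(records)
print(sizes, "balanced" if balanced else "unbalanced")
```

With 261 students the four cells cannot all be the same size, so this check will always report an unbalanced design, which is why the regression approach is needed. The sketch only inspects cells that actually occur; a completely empty cell would need a separate check.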
I never knew Excel was that useful. I only used it for entering data and always thought that I would have to get better software for my analysis. You’ve really made sense of it for me, and I will not underrate Excel from now on.
I will be conducting feeding trials on village chickens using a locally formulated layer diet, with commercial layer feed as the control. The trial is to compare egg production from the two diets. I intend to use ANOVA to analyse the data, and there will be 4 replications. What is the appropriate design for such trials, and which analysis method would be the correct one?
Joe,
What are the replications? By 4 replications do you mean 4 days, 4 chickens or something else?
Charles
Hi
I have a trial comparing 7 fertiliser treatments on a maize crop with 4 replicates (randomised block). I have 2 components to analyze: total cob wt and mean cob wt.
Which ANOVA programme do I use?
Mike,
If, for example, the replicates are four varieties of maize, then you have a two-factor ANOVA design with a fixed fertilizer factor and a fixed maize variety factor. This is based on the fact that the mean cob wt can be calculated from the total cob wt. If not, then you would have two dependent variables, and so you would need to use two factor MANOVA (note that Real Statistics software only supports one factor MANOVA at present).
Also this assumes that there are only 4 varieties of maize under study; if these 4 varieties were randomly chosen say from 100 possible varieties, then this would be a random factor and not a fixed factor.
Charles
Thanks for that.
The maize crop is the same variety across the trial.
So what I have is, 7 fertilizer treatments replicated in 4 blocks.
So a randomised block design with 7 fertiliser treatments.
7 × 4 = 28 plots, 7 plots per replicate
So what programme do I use?
cheers Mike
Mike,
If I understand correctly, you have 28 plots of land and 7 fixed fertilizer treatments. For each fertilizer you randomly select four plots of land and apply that fertilizer (each plot of land gets one fertilizer). If all you are interested in is the fertilizer, then you can simply use one-way ANOVA to compare the fertilizers. This assumes that all the plots of land are interchangeable (i.e. have similar characteristics). Please let me know whether this is your situation before I make any further suggestions.
Charles
Hi,
Yes, that’s correct.
The only thing I would add is that the replicates are in blocks, so it’s a randomised block design.
So Block 1 has 7 fertilizers trts randomised (all rep 1). Then block 2 has the next 7 trts (rep 2)…etc., etc
Does that make any difference ? Still use a one way anova?
Mike
With randomized block design you should use two factor ANOVA without replication. This is precisely the data analysis tool supplied by Excel.
Charles
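A randomized block layout like this one, with 7 fertilizers, 4 blocks and one plot per combination, can be sketched as follows; it mirrors what Excel’s Anova: Two-Factor Without Replication tool reports, with fertilizer as the row factor and block as the column factor. The cob weights below are invented for illustration, and the function name is hypothetical.

```python
def rcbd_anova(y):
    """Two-factor ANOVA without replication for a randomized block design.

    y[i][j] is the single plot value for treatment i in block j.
    Returns {source: (SS, df, F)}; F is None for the Error and Total rows.
    """
    t, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (t * b)
    trt_mean = [sum(row) / b for row in y]
    blk_mean = [sum(y[i][j] for i in range(t)) / t for j in range(b)]
    ss_total = sum((x - grand) ** 2 for row in y for x in row)
    ss_trt = b * sum((mu - grand) ** 2 for mu in trt_mean)
    ss_blk = t * sum((mu - grand) ** 2 for mu in blk_mean)
    ss_err = ss_total - ss_trt - ss_blk  # residual = treatment x block term
    df_trt, df_blk, df_err = t - 1, b - 1, (t - 1) * (b - 1)
    ms_err = ss_err / df_err
    return {
        "Fertilizer": (ss_trt, df_trt, (ss_trt / df_trt) / ms_err),
        "Block": (ss_blk, df_blk, (ss_blk / df_blk) / ms_err),
        "Error": (ss_err, df_err, None),
        "Total": (ss_total, t * b - 1, None),
    }


# Hypothetical total cob weights: 7 fertilizers (rows) x 4 blocks (columns).
y = [
    [5.1, 5.4, 4.9, 5.3],
    [5.8, 6.1, 5.5, 6.0],
    [6.3, 6.2, 6.0, 6.6],
    [5.0, 5.6, 5.2, 5.1],
    [6.9, 7.1, 6.5, 7.0],
    [5.5, 5.9, 5.4, 5.8],
    [6.1, 6.4, 6.2, 6.1],
]
for source, (ss, df, f) in rcbd_anova(y).items():
    print(source, round(ss, 4), df, "" if f is None else round(f, 2))
```

The error term has (7 − 1)(4 − 1) = 18 degrees of freedom; it is the fertilizer × block residual, which is why no separate interaction can be tested without replication.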
Hi,
I was wondering if you could help. I’m looking to run a 2×2 ANOVA, time (pre/post) × group (intervention/control), for 4 variables (various dependent variables). Is this considered a 2×2 ANOVA, 2-factor without replication? I want to see whether my variables changed between the groups over pre/post-intervention. Thank you!
If you only had one dependent variable, then this sounds like 2 x 2 ANOVA with one fixed factor (group) and one repeated measures factor (time).
If there is no (or little) correlation between the dependent variables, you can run four separate ANOVAs. The more likely situation is that there is correlation, in which case you will likely want to use MANOVA.
Charles
When m = 1, “with replication” reduces to “without replication”. The SS_AB term becomes identical to SS_E, and SS_W goes to zero, as it should.
Conceptually, shouldn’t the interaction between A and B exist regardless of the value of m? Why is SS_AB called an interaction, with SS_W as the error term, when m > 1, while SS_E (what SS_AB reduces to when m = 1) is called an error term? Should the interaction between A and B be considered error only when m = 1?
If the error term represents the “unexplained amount”, is the interaction term “explained” when m>1, but becomes “unexplained” when m=1?
Thank you so much!
Heather,
The SS terms measure variability. In the without replication case, since there is only one data element at each intersection of A and B levels, there is no within-cell variability, and so SS_W = 0. The error term is SS_E = SS_T – (SS_A + SS_B), which turns out to have the same formula as the SS_AB term in ANOVA with replication. However, since there is no within-cell variability to serve as a separate error term, the interaction cannot be distinguished from random error, and so in the without replication case this quantity is treated as the error term rather than as the SS of the interaction between A and B.
Charles
Thank you Charles. I was confused by the x_ij_bar notation in the SS_E term for the “w/o replication” case in the table under definition 2. It should be just x_ij, without the bar. It was obvious in the proof below.
The Excel ANOVA table seems to label the terms arbitrarily, your website helps a lot in clarifying it. Thank you!
Heather
Heather,
Thanks for catching this error. I have made a note on the webpage that this needs to be corrected.
Charles
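The reduction described in this exchange can be checked numerically: applying the with-replication formulas to data with one value per cell gives SS_W = 0 and an SS_AB equal to the without-replication error term SS_E = SS_T − SS_A − SS_B. A minimal Python sketch with arbitrary made-up values (the function name is illustrative):

```python
def m1_reduction(x):
    """Apply the with-replication formulas to data with m = 1 observation per cell.

    x[i][j] is the single value for level i of A and level j of B.
    Returns (SS_AB, SS_E, SS_W), where SS_E = SS_T - SS_A - SS_B is the
    without-replication error term.
    """
    r, c = len(x), len(x[0])
    cells = [[[v] for v in row] for row in x]  # each cell is a sample of size 1
    grand = sum(map(sum, x)) / (r * c)
    a_mean = [sum(row) / c for row in x]
    b_mean = [sum(x[i][j] for i in range(r)) / r for j in range(c)]
    ss_t = sum((v - grand) ** 2 for row in x for v in row)
    ss_a = c * sum((mu - grand) ** 2 for mu in a_mean)
    ss_b = r * sum((mu - grand) ** 2 for mu in b_mean)
    # Interaction formula with m = 1: the cell mean is just the single value.
    ss_ab = sum((x[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                for i in range(r) for j in range(c))
    # Within-cell SS: deviations from each cell's own mean, all zero when m = 1.
    ss_w = sum((v - sum(cell) / len(cell)) ** 2
               for row in cells for cell in row for v in cell)
    ss_e = ss_t - ss_a - ss_b
    return ss_ab, ss_e, ss_w


print(m1_reduction([[4, 7, 5, 6], [3, 8, 6, 9], [5, 6, 7, 4]]))
```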
Sir,
In this example we have three independent factors (Blends X, Y and Z) and four dependent continuous variables (rice, soy, wheat, corn) which we analyze with ANOVA. Will analyzing this data with MANOVA make any sense? If yes, what will be the difference?
Roman,
Actually, in this problem Blend and Crop are categorical independent variables and Yield is the dependent variable (for a two factor ANOVA design). The problem can be re-analyzed, as you have described, whereby Blends X, Y and Z are independent variables and rice, soy, wheat and corn are dependent variables, suitable for MANOVA.
In this case you need to make sure that the assumptions for MANOVA hold, and then you can interpret the results as described in https://real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/. See especially Example 1 of https://real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-basic-concepts/.
For your information, I am currently reviewing the ANOVA portion of the website. Some of the next things that I plan to add are webpages on mixed ANOVA models and on how to perform repeated measures designs with MANOVA.
Charles