Two Factor ANOVA with Replication

Introduction

In Two Factor ANOVA without Replication, we consider the analysis where there is only one sample item for each combination of factor A and B levels. On this webpage, we extend this analysis to the case where there are multiple samples for each such combination. Thus, in addition to the main effects corresponding to A and B, we now study the interactions between A and B, which is the main reason for performing this type of analysis.

We will restrict ourselves to the case where all the samples are equal in size (balanced model). In Unbalanced Factorial ANOVA we show how to perform the analysis where the samples are not equal (unbalanced model) via regression. 

You should not confuse ANOVA with replication with ANOVA with repeated measures as described in ANOVA with Repeated Measures.

Example introduced

As usual, we start with an example. We then provide some background information and then complete the analysis for the example.

Example 1: Repeat the analysis from Example 1 of Two Factor ANOVA without Replication, but this time with the data shown in Figure 1 where each combination of blend and crop has a sample of size 5.

Data ANOVA without replication

Figure 1 – Data for Example 1

Structural Model

Definition 1: We extend the structural model of Definition 1 of Two Factor ANOVA without Replication as follows.

In Definition 1 of Two Factor ANOVA without Replication the r × c table contains the entries {xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}. We extend these tables to contain entries {Xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c},  where Xij is a sample for level i of factor A and level j of factor B. Here Xij = {xijk: 1 ≤ k ≤ nij}. For now, we assume the nij are all equal of size m.

We use terms such as x̄i (or x̄i.) as an abbreviation for the mean of {xijk: 1 ≤ j ≤ c, 1 ≤ k ≤ m}. We also use terms such as x̄j (or x̄.j) as an abbreviation for the mean of {xijk: 1 ≤ i ≤ r, 1 ≤ k ≤ m}.

As in Definition 1 of Two Factor ANOVA without Replication, we define the effects αi and βj where

image1360

Similarly, we define ai and bj where

image1363

We use δij for the effect of level i of factor A with level j of factor B, i.e. the interaction of level i of factor A and level j of factor B. Thus, δij = μij – μi – μj + μ. Similarly, we have

image1366

It is easy to show that
image1367

Finally, we can represent each element in the sample as

image1368

where εijk denotes the error (or unexplained) amount. As before we have the sample version

image1370

where eijk is the counterpart to εijk in the sample. Note that

image1373

and so
image1374

Also,
image1375
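
The displayed equations in this section appear here only as image placeholders. The following is a hedged reconstruction from the surrounding text, not a copy of the original images. It assumes the usual dot notation for marginal means and uses d_ij as an assumed name for the sample counterpart of δij.

```latex
\begin{align*}
\alpha_i &= \mu_{i\cdot} - \mu, \qquad \beta_j = \mu_{\cdot j} - \mu, \qquad
\delta_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu \\
a_i &= \bar{x}_{i\cdot} - \bar{x}, \qquad b_j = \bar{x}_{\cdot j} - \bar{x}, \qquad
d_{ij} = \bar{x}_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x} \\
x_{ijk} &= \mu + \alpha_i + \beta_j + \delta_{ij} + \varepsilon_{ijk} \\
x_{ijk} &= \bar{x} + a_i + b_j + d_{ij} + e_{ijk} \\
e_{ijk} &= x_{ijk} - \bar{x}_{ij}
\quad\Longrightarrow\quad \sum_{k=1}^{m} e_{ijk} = 0 \ \text{ for every } i, j
\end{align*}
```

The last line follows by substituting the definitions of a_i, b_j and d_ij into the sample version: the terms collapse to the cell mean, so each residual is a deviation from its own cell mean and the residuals in a cell sum to zero.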

Null Hypotheses

As in Definition 1 of Two Factor ANOVA without Replication, the null hypotheses for the main effects are:

H0:  μ1. = μ2. = … = μr. (Factor A)

H0:  μ.1 = μ.2 = … = μ.c (Factor B)

These are equivalent to:

H0: αi = 0 for all i (Factor A)

H0: βj = 0 for all j (Factor B)

In addition, there is a null hypothesis for the effects due to the interaction between factors A and B.

H0: δij = 0 for all i, j

More about the structural model

Definition 2: Using the terminology of Definition 1, define

ANOVA with replication formulas

We can also define the following entities:

image5062

Since the within groups terms are used as the error terms in our model, we also use the following symbols:

image1391
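
Since the formulas of Definition 2 appear here only as images, the following Python sketch shows the standard sums of squares for a balanced two-factor layout, on the assumption that these match the definitions above. The function name `two_factor_ss` and the data are made up for illustration.

```python
from statistics import mean

def two_factor_ss(cells):
    """cells[i][j] holds the m observations for level i of factor A
    and level j of factor B (balanced design assumed)."""
    r, c, m = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    # With equal cell sizes, marginal means are plain averages of cell means.
    row_mean = [mean(cell_mean[i]) for i in range(r)]
    col_mean = [mean(cell_mean[i][j] for i in range(r)) for j in range(c)]

    ss_a = c * m * sum((row_mean[i] - grand) ** 2 for i in range(r))
    ss_b = r * m * sum((col_mean[j] - grand) ** 2 for j in range(c))
    ss_ab = m * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    ss_w = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(r) for j in range(c) for x in cells[i][j]
    )
    ss_t = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1),
          "W": r * c * (m - 1), "T": r * c * m - 1}
    return {"SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab,
            "SSW": ss_w, "SST": ss_t, "df": df}

# Made-up data: r = 2 levels of A, c = 3 levels of B, m = 2 observations per cell.
cells = [
    [[10, 12], [14, 16], [9, 11]],
    [[20, 18], [15, 17], [22, 24]],
]
result = two_factor_ss(cells)
print(result)
```

In this sketch the four components add up to SST, which is the decomposition that Property 1 refers to.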

Properties

Property 1:

image1392

image1393

Proof: Clearly

image1394

If we square both sides of the equation, sum over i, j, and k, and then simplify (with various terms equal to zero as in the proof of Property 2 of Basic Concepts for ANOVA), we get the first result. For the second,

image1396

Property 2: Note that the between-group terms are as for the one-way ANOVA, namely

image1397

The proof is similar to the proof of Property 1. It also follows that

image1398

image1399

Property 3: If a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed and with all σj² (or σi²) equal, then

image1401 image1402

Proof: The proof is similar to that of Property 1 of Basic Concepts for ANOVA.

Theorem 1: Suppose a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed.

If all μi are equal and all σi² are equal then

image1403

If all μj are equal and all σj² are equal then

image1072

Also, under certain circumstances,

image1404

Proof: The result follows from Property 3 and Theorem 1 of  F Distribution.

Property 4:

image1405 image1406

Statistical Tests

We use the following tests:

ANOVA with replication tests
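
The table of tests is an image here, so the following sketch assumes the usual fixed-effects form, in which each mean square is divided by the within-groups mean square MSW (the form Excel's tool uses). The function name `anova_table` and the SS values are hypothetical and are not taken from Example 1.

```python
def anova_table(ss, r, c, m):
    """Build df, MS and F for a balanced two-factor ANOVA with replication.

    ss is a dict with keys "A", "B", "AB", "W" holding the sums of squares;
    r and c are the numbers of levels of factors A and B, m is the cell size.
    """
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1), "W": r * c * (m - 1)}
    ms = {key: ss[key] / df[key] for key in df}
    # Fixed-effects model: every effect is tested against MSW.
    f = {key: ms[key] / ms["W"] for key in ("A", "B", "AB")}
    return df, ms, f

# Hypothetical sums of squares for a 3 x 4 design with 5 observations per cell.
ss = {"A": 120.0, "B": 90.0, "AB": 60.0, "W": 240.0}
df, ms, f = anova_table(ss, r=3, c=4, m=5)
print(df, ms, f)
```

Each F value would then be compared with the F distribution on the matching degrees of freedom, e.g. via Excel's F.DIST.RT function, to obtain a p-value.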

Assumptions

The assumptions for Two Factor ANOVA are similar to those for One Factor ANOVA, namely

  • All samples are drawn from normally distributed populations
  • The samples are drawn from populations that have a common variance
  • All samples are drawn independently from each other
  • Within each sample, the observations are sampled randomly and independently of each other

By sample, here we mean each combination of levels from the two factors.  We also want to make sure there are no outliers that can distort the results of the test. See ANOVA Assumptions for how we check these assumptions using the Real Statistics Resource Pack.

Example continued

We now return to Example 1 and show how to conduct the required analysis using Excel’s Anova: Two-factor With Replication data analysis tool.

Example 1 (continued): The summary output from the data analysis tool is given on the right side of Figure 2, with the sample data repeated on the left side of the figure.

ANOVA replication Excel tool

Figure 2 – Summary output of ANOVA data analysis for Example 1

The top part of Figure 3 contains the rest of the output from the data analysis tool. We’ll explain the bottom part momentarily.

ANOVA replication Excel analysis

Figure 3 – ANOVA analysis for Example 1

We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so, at the 5% significance level, we don’t have sufficient evidence of a difference in the effectiveness of the fertilizer for the different crops.

Since the p-value (blends) = .00025 < .05 = α, we reject the Factor A null hypothesis and conclude that the blends are statistically different.

Interaction Plots

We also see that the p-value (interactions) = .0456 < .05 = α, and so conclude that there is a significant interaction between crop and blend. We can look more carefully at the interaction by plotting the cell means for the levels of the two factors (see Figure 4). Lines that are roughly parallel indicate a lack of interaction, while lines that are not roughly parallel indicate interaction.

From the first chart we can see that Blend Y has quite a different pattern from the other blends, especially since the line for Blend Y is trending down towards Soy and up towards Rice, exactly the opposite of Blends X and Z. We also see that Blend X is trending up towards Soy much more abruptly than Blend Z.

Interaction ANOVA plot Excel

Figure 4 – Interaction plots for Example 1
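
The link between parallel lines and lack of interaction can be checked numerically. In the following sketch, the function `interaction_effects` and both tables of cell means are made up for illustration and are not the data of Example 1. It computes the sample interaction effects (cell mean minus row mean minus column mean plus grand mean), which are all zero exactly when the lines in an interaction plot are parallel.

```python
from statistics import mean

def interaction_effects(cell_means):
    """Return cell mean - row mean - column mean + grand mean for each cell
    of a table of cell means (equal cell sizes assumed)."""
    r, c = len(cell_means), len(cell_means[0])
    grand = mean(v for row in cell_means for v in row)
    row_mean = [mean(row) for row in cell_means]
    col_mean = [mean(cell_means[i][j] for i in range(r)) for j in range(c)]
    return [
        [cell_means[i][j] - row_mean[i] - col_mean[j] + grand for j in range(c)]
        for i in range(r)
    ]

# Parallel lines: the second row is the first row shifted up by 5.
parallel = [[10, 20, 30], [15, 25, 35]]
# Crossing lines: the second row trends in the opposite direction.
crossing = [[10, 20, 30], [30, 20, 10]]

d_parallel = interaction_effects(parallel)
d_crossing = interaction_effects(crossing)
print(d_parallel)
print(d_crossing)
```

Whether non-zero effects in real data are statistically significant is still decided by the interaction F test, not by the plot alone.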

Worksheet Functions

Although the analysis in Figures 2 and 3 was produced automatically by Excel’s data analysis tool, the same result can be produced using Excel formulas, just as we were able to do for Example 1 of Two Factor ANOVA without Replication. In fact, all the entries in the ANOVA table in Figure 3 can be calculated using the tables constructed in the bottom part of Figure 3 in exactly the same way as was done in Example 1 of Two Factor ANOVA without Replication.

The only new element is the calculation of the error term SSW. To calculate it we must first construct the table of the squared deviations of the observations in each cell from their cell mean. This table appears in cells J38:N41 of Figure 3. E.g. the entry for SSWheat,BrandX (in cell K39) is =DEVSQ(B5:B9). SSW is then calculated as the sum of all the terms in the table, namely =SUM(K39:N41).

Alternatively, we can use Property 2 to calculate SSBet and then use the fact that SSW = SST – SSBet. To calculate SSBet we first construct the table of the means of the various interactions of factors A and B (range J43:N46 of Figure 3), as described below. SSBet is now calculated using the formula =DEVSQ(K44:N46)*H5. For Example 1, SSBet = 18420.5, and so SSW = SST – SSBet = 39640.9 – 18420.5 = 21220.4.
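
The two routes to SSW can be sketched in Python as follows. The `devsq` helper mimics Excel's DEVSQ (the sum of squared deviations from the mean). The SST and SSBet figures are the ones quoted above for Example 1, while the small data set is made up. The sketch assumes that cell H5 in the worksheet holds the common cell size m.

```python
def devsq(values):
    """Sum of squared deviations from the mean, like Excel's DEVSQ."""
    values = list(values)
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values)

# Figures quoted in the text for Example 1.
ss_t, ss_bet = 39640.9, 18420.5
ss_w = ss_t - ss_bet  # approximately 21220.4

# Made-up balanced data: 4 cells with m = 2 observations each.
cells = [[10, 12], [14, 18], [9, 11], [20, 26]]
m = 2
all_obs = [x for cell in cells for x in cell]
cell_means = [sum(cell) / len(cell) for cell in cells]

ss_w_direct = sum(devsq(cell) for cell in cells)          # sum of per-cell DEVSQ
ss_w_indirect = devsq(all_obs) - m * devsq(cell_means)    # SST - SSBet
print(round(ss_w, 1), ss_w_direct, ss_w_indirect)
```

Both routes give the same SSW for balanced data, so the choice between them is mainly a matter of which intermediate table is more convenient to build.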

Example using row formatting

Example 2: Repeat the analysis for the data in Example 1 by using the presentation of the data given in the table on the left of Figure 5.

Alternative presentation ANOVA data

Figure 5 – Alternative presentation of data in Example 1

Excel’s ANOVA data analysis tools don’t support data in this format, and so we must proceed to create the ANOVA table (i.e. the output found in Figure 3) using the formulas. This is straightforward, although tedious, with the result presented in Figure 6. As usual, the hardest part is the calculations for the SS terms, which are shown on the right side of the worksheet in Figure 6.

Two factor ANOVA replication

Figure 6 – ANOVA output for Example 2
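
As a rough illustration of working from row-formatted data, the sketch below assumes that the layout has one observation per row together with its two factor labels (the figure itself is not reproduced here, so this is an assumption). The records and the helper names `group_cells` and `is_balanced` are made up.

```python
from collections import defaultdict

def group_cells(records):
    """Group (factor A level, factor B level, value) rows into cells."""
    cells = defaultdict(list)
    for a_level, b_level, value in records:
        cells[(a_level, b_level)].append(value)
    return dict(cells)

def is_balanced(cells):
    """True when every cell has the same number of observations."""
    return len({len(values) for values in cells.values()}) == 1

# Made-up records in row format: (blend, crop, yield).
records = [
    ("X", "Wheat", 10), ("X", "Wheat", 12),
    ("X", "Corn", 14), ("X", "Corn", 16),
    ("Y", "Wheat", 20), ("Y", "Wheat", 18),
    ("Y", "Corn", 15), ("Y", "Corn", 17),
]
cells = group_cells(records)
balanced = is_balanced(cells)

# Within-groups sum of squares: squared deviations from each cell's own mean.
ss_w = sum(
    sum((v - sum(vals) / len(vals)) ** 2 for v in vals)
    for vals in cells.values()
)
print(cells, balanced, ss_w)
```

Once the rows are grouped into cells, the remaining SS terms can be computed from the cell, row and column means in the same way as for the tabular layout.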

When the assumptions are not met

In general, when the assumptions are violated, transformations and non-parametric (rank) tests are not very useful for two-way ANOVA. We can instead abandon the omnibus test and apply the various planned and unplanned tests described in Planned Comparisons for ANOVA and Unplanned Comparisons for ANOVA by treating the two-way ANOVA as a one-way ANOVA.

In particular, when the variances are not equal we can apply Welch’s correction for contrasts. We can also use the Scheirer-Ray-Hare test or Aligned Rank Transform (ART) ANOVA.

Reference

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

169 thoughts on “Two Factor ANOVA with Replication”

  1. I have analyzed five different cultivars of apples for TPC, TFC, yield, etc., which is basically a comparative study. So I took 3 random samples of each cultivar and analyzed each sample 3 times (3 replications). Now my question is which type of ANOVA should be applied. Second, when putting the data in Excel, which of the patterns below should I use?
    1. sample1; rep1 rep2 rep3 OR
    2. sample 1; mean of 3 replicate

  2. I have a set of data. I have calculated some soil properties for 7 types of tree areas. Each type of tree is repeated 3 times, so there are 21 plots in total. From each plot I have to take 4 soil samples from 4 consecutive depths, so the total data for one aspect is 84. I want to know how to put the data in a randomized block factorial design for ANOVA calculation in Excel. Please help.

  3. Hi Charles,
    I have 4 groups of animals (below). Two groups are either normal (T+) or abnormal (T-) for a T gene. A subgroup of T- and T+ groups underwent control or surgery. We then measured the effect of presence/absence of T gene on surgery effect on some serum factor.
    I.e. We are assessing the effects of T gene and surgery as two independent factors on one dependent factor (serum factor).
    I am using two way ANOVA with replication for comparison and to find interaction between T gene and surgery. Am I correct?
    Also, can two way ANOVA be done on unequal Ns of groups?
    Thank you very much for any feedback
    Hesham

            Control  Surgery
    T-      0.16     0.153
            0.17     0.156
            0.15     0.166
            0.15     0.145
            0.16     0.162
            0.162    0.155
            0.158    0.15
    T+      0.15     0.164
            0.154    0.146
            0.161    0.166
            0.161    0.153
            0.149    0.156
            0.152    0.146
            0.152    0.15

  4. Charles
    In the ANOVA with replication, I have seen the F values calculated in a couple of different ways.

    Fcd = MScd/MSinter
    Fmd = MSmd/MSinter
    Finter = MSinter/MSres
    or
    Fcd = MScd/MSres
    Fmd = MSmd/MSres
    Finter = MSinter/MSres

    The last calculation is the one used in Excel. Would you know when to use one calculation versus the other?

  5. Charles
    I am making surface measurements of a flat rectangular sheet. The measurements are made with something that looks like a CNC table where the head moves in one direction (x) across the sheet and takes measurements a fixed distance apart. The head then indexes in the Y direction and scans back in the opposite direction, taking a second row of measurements. I end up with a 2-dimensional array of data. In the normal case, I just do a 2-way ANOVA and get an indication of whether the variability in each direction is significant and, if it is, what the expected std dev is.

    Looking at how the measurement is made, it is possible that the measurements taken in one direction are slightly different than in the other, so I thought to treat neighbouring pairs of points as replicates. This seems to be useful for identifying a systematic measurement error. A question is, when trying to estimate variance, do I use a fixed, random or mixed formulation?
    Wish you the very best during this holiday season!

  6. Hi, thank you for this wonderful website.
    Can you explain why, in two-way ANOVA with replication, the sum over i and the sum over j of eijk is equal to zero?

  7. Near the end of this anova analysis you wrote the following:
    We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so conclude (with 95% confidence) that there are no significant differences between the effectiveness of the fertilizer for the different crops.

    I suggest a slightly different version of formulating the conclusion:
    ….and so conclude that we cannot reject the null hypothesis with 95% confidence.

    It just says that we cannot state “we’re sure 95% that there is a difference between crops”

  8. The two main effects are row and column position. I can collect data so that I have replicates. For some reason, I am having a hard time understanding the meaning of the interactive in a physical sense. If the interactive term is significant, what does that imply wrt what underlying data looks like? Appreciate you bearing with me on this.

    • Ian,
      I understand that you are measuring something about plants grown in a rectangular grid. Since you are considering ANOVA with replication, I presume that you have a number of such grids (the replication). Suppose, for example, that each grid is 10 x 5 (and so each grid contains 50 small square plots) and that you are measuring the height of the plants. If you find a significant result for the Interaction factor, this would mean that there are significant differences in the height of the plants depending on their position (row x column) on the grid.
      Charles

  9. Charles your pages have been very helpful. I am struggling to interpret the interaction term in a 2-way with replication ANOVA.

    For a hypothetical problem, picture a rectangular garden that is divided up into 10 rows and 5 columns. I plant 2 seeds in each square of the garden and at the end of the experiment, I measure the height of each of the plants. I am looking to see if position in the garden impacts the height of the plants.

    I am having trouble coming up with a physical explanation for the interaction term when the interaction term is significant.
    Appreciate any comments you might have.

    • Ian,
      I am assuming that your two factors are Row and Column and that you got a significant result for the interaction (Row x Column). It seems like you need to do some follow-up analysis to see where the mean differences are located. E.g. perhaps the corners of the 10 x 5 plot are where the least (or most) growth is located.
      Charles

      • Thanks Charles. Your assumptions are correct. I guess the interaction would become obvious in a graph like Figure 4.

        If we look at Figure 4, where there is a clear interaction, how do you describe the interaction in terms of the blocking that has been done (grains and fertilizers). Or can no statements be made?

        Here is a slightly different scenario. Rather than 10 x 5, if I specifically account for the position of the 2 plants in each cell resulting in a 20×5 grid (rather than a 10×5 grid with 2 plants in each cell). Should I treat this as a 20×5 2-way with no replication or a 10×5 with replication.
        I really appreciate the discussion on this.

          • Charles
            I see my mistake. Let’s go back to
            “Rather than 10 x 5, if I specifically account for the position of the 2 plants in each cell resulting in a 20×5 grid (rather than a 10×5 grid with 2 plants in each cell). Should I treat this as a 20×5 2-way with no replication or a 10×5 with replication.”

            In this case if the interaction term was significant what would be the physical meaning?
            Take Care

  10. Dear Charles,

    I’m currently interpreting my thesis results entitled: Acceptability, physico-chemical and nutritional property of a mixed tropical fruit puree. In my study, I have 2 independent variables, namely formulation and pasteurization conditions. At the same time, I have several dependent variables: sensory evaluation results, and physico-chemical and nutritional properties of my product. I had 2 sampling periods and during each sampling period, I gathered 50 respondents for the sensory evaluation test of my product and one bottle for each sample for the physico-chemical and nutritional analysis. After gathering data, my friend told me to use 2-way ANOVA with replication as my statistical tool, and I followed her advice. After calculating everything using MS Excel, it turned out that some of the findings had a significant difference. I was wondering if I’m using the correct statistical tool. Moreover, back in my 3rd year, I was taught to conduct a post hoc test, specifically DNMRT, whenever findings turned out to have a significant difference. Now my other concern is: is it possible to use DNMRT as a post hoc test after using 2-way ANOVA with replication? If yes, can you provide me with the necessary steps to do the DNMRT? Please, it will really help me a lot because I’m really struggling.

    Thank you for you for your time.

  11. In my study, I have 2 independent variables, namely formulation and pasteurization conditions. At the same time, I have several dependent variables: sensory evaluation results, and physico-chemical and nutritional properties of my product. I was told to use 2-way ANOVA with replication as my statistical tool. I followed her instruction and after calculating the results, my findings turned out to have a significant difference. I’m just curious and at the same time confused: should I proceed to Duncan’s multiple range test or not? And if I wish to proceed to conduct DNMRT, how should I do it using the 2-way ANOVA with replication? Can you provide me the necessary steps to conduct the DNMRT? Please, it will really help me a lot. Thanks

  12. Charles
    Great set of tutorials. Really appreciated browsing through the explanations and examples. I was doing a large 2-way no replication dataset (100, 600). When using Excel, the results (SS, MS and F) were different than what I am calculating manually. I am confident in the results that I have calculated and that round off errors were minimized. Have you seen this type of error in Excel before?

    • Ian,
      It sounds like either your manual calculations are incorrect or the Excel results are incorrect. I have not seen an error in this Excel calculation before, at least in versions of Excel after Excel 2007. Have you tried using the Real Statistics software to see whether you get the same results?
      Charles

  13. Sir,
    Could you discuss the credibility of the interpretations and conclusions after using two way ANOVA? and Is there anything we should be concerned about? for example, the violation of normality assumption.

      • Sir,
        Now I have a case needed to solve here:
        Suppose that a local chapter of sales professionals in the greater San Francisco area conducted a
        survey of its membership to study the relationship, if any, between the years of experience and
        salary for individuals employed in inside and outside sales positions. On the survey, respondents
        were asked to specify one of three levels of years of experience: low (1-10 years), medium (11-
        20 years), and high (21 or more years). The objective of this study is to test for any significant
        interaction between Position and Experience and to test for any significant differences in salary
        due to position and years of experience
        I wonder about the null hypotheses.
        There are 3 sets of hypotheses, are not there?
        H01: There is no differences in the mean salaries of sale person lying in different levels of years of experience.
        H02: There is no differences in the mean salaries of sale person lying in different levels of position.
        H03: There is no significant interaction between position and experience.

  14. Dear Sir,
    I am confused by following statements below Figure 3:
    “Figure 3 – ANOVA analysis for Example 1

    We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor A null hypothesis, and so conclude (with 95% confidence) that there are no significant differences between the effectiveness of the fertilizer for the different crops.

    Since the p-value (blends) = .00025 .05 = α, we can’t reject the Factor A null hypothesis,” I believe, instead of Factor A, it should be Factor B.
    Similarly, in sentence “Since the p-value (blends) = .00025 < .05 = α, we reject the Factor B null hypothesis" I believe, instead of Factor B, it should be Factor A.

    Whatever I believe is incorrect, please explain the conclusions.

    Thank You,
    Vijay Rathod

    • Vijay,
      Thanks for bringing this error to my attention. I have just corrected the webpage by interchanging A with B.
      I appreciate your help in making the website more accurate.
      Charles

      • Dear Sir,
        In my mail of July 25, 2017, last sentence should have begun with” If ”. Sentence should have been “If whatever I believe is incorrect, please explain the conclusions.” I am sorry for whatever inconvenience it may have caused. I was wondering whether the mail has become meaningless. It is heartening to see, you understood it as it was intended. I am learning statistics with the help of your website.
        Regards,
        Vijay Rathod.

  15. Sir, which type of model is the example shown above of two-way ANOVA with replication of fertilizer vs crop: is it LSD, a split type model, etc.?
    And can you suggest a reference for the above example?

    • Manoj,
      I created the example. It is not based on a real study. The numbers are made up by me. The purpose of the example is to show you how to perform two factor Anova with replication.
      Charles

  16. hie
    i have an experiment with two factors and five levels and am told that it should be replicated four times. how can i go about it?

  17. I AM MANOJ MEENA
    i have used Factorial RBD, with one factor at 3 level and factor two at five levels , total treatment 15 with 3 replication please send the analysis process in excel

    • Manoj,
      This process can be accomplished using Excel’s Two Factor Anova with Replication data analysis tool as described on the website. You can also use Real Statistics Two Factor Anova data analysis tool.
      Charles

  18. Hi Dr,
    Please explain what m and c mean in the formula for SSA, and what r means in the formula for SSB.
    Please give these values using the example you used.
    Thanks

  19. Dear Dr. Charles,

    Thank you so much to provide us so great sources here!

    I have some problems with my data analysis, could you please help me?

    There are 3 pathologists who reviewed 80 slides via 3 different systems, and the time taken for each review was recorded in seconds. The slides they reviewed are the same. In other words, each slide was reviewed by each pathologist via each system, so 9 results were obtained for each slide. Now I need to know if there are any differences in time taken between the different systems, i.e. whether any system takes significantly less time to complete a review.

    I tried Two Factor ANOVA with Replication according to your above instructions, and got 3 p-values (for pathologists, systems and interaction) much less than 0.05. Now what I am wondering is as follows:
    1. Did I choose the appropriate analysis for my data?
    2. If I still need to know which two systems are different, what should I do further?
    3. How to explain the interaction? I plotted two line charts, but still don’t know how to interpret them.

    Thank you very much!

    Best regards

    Susan

  20. Hi Charles, this is a great teaching tool. I just switched to Excel from SPSS for teaching my stats classes because of your add-in, and so far it’s great. I have noticed a peculiar behavior in one of the factorial calculations and I was wondering if you prefer this kind of question posted here or sent privately?

    • Andrew,
      Glad to see that you are using the Real Statistics add-in for teaching purposes. This was one of my goals when developing the software.
      Generally, it is best to ask questions here (as a comment). If you need to include a spreadsheet, you can send it via an email at the address shown on Contact Us.
      Charles

  21. Hi Dr. Charles, I’m now trying to analyze my thesis results. My study is about the control of diseases of eggplant grown in open field and in greenhouse, which is my main plot, the two types of cultivation. My subplot includes six treatments including the control, with four replications, arranged in RCB layout. I looked into similar theses with the same experimental design as mine. It is similar to your ANOVA in Figure 6 except that it has one more source of variation, the replication. I’m confused now about which ANOVA to use. Will it be best to use split-plot ANOVA, or RCB layout ANOVA? Or is my study a special case which needs a different analysis?

  22. Hello Charles, Thank you for this great site.
    I have a question about how best to analyze my data (ANOVA) for a whole experiment instead of independent data sets. Below is an example of what my data may look like.

    Crop X, Product application: Treatment 1    Crop X, Product application: Treatment 2
    Plant #  Leaf 1  Leaf 2  Leaf 3             Plant #  Leaf 1  Leaf 2  Leaf 3
    1        70      85      50                 7        65      75      60
    2        71      86      51                 8        66      76      61
    3        72      87      52                 9        67      77      62
    4        73      88      53                 10       68      78      63
    5        74      89      54                 11       69      79      64
    6        75      90      55                 12       70      80      65

    The data for each leaf is taken at different time points. For example, Leaf 1 data may be taken at day 18 only and Leaf 2 at 27 days only, because at the time of the single application the leaves are at different developmental stages and therefore need time to grow. Also, I can’t take an average of a single plant’s measurements over all the leaves because they can vary greatly between leaves of a single plant, though not between the plants (this has to do with developmental stages). I am currently comparing means (one-way ANOVA) of Treatment 1 and Treatment 2 for Leaf 1, and likewise for Leaf 2 and Leaf 3, giving three independent data sets. I would like to compare the experiment as a whole to see the effect on the plant as a whole, but I am not sure what would be the best way to do that.

    • Douglas,
      Sorry, but I don’t understand your scenario well enough to give any advice.
      Perhaps you can use Two Factor ANOVA or Split-plot ANOVA. Both are described on the website and are included in the Real Statistics software.
      Charles

  23. Hi Charles.
    Great website and thanks for answering queries here.
    My question is whether or not this type of ANOVA would be appropriate for a randomised complete block trial?
    The standard for a RCBT seems to be very similar to your example above but also includes degrees of freedom in the replication.

    Thanks, Ash.

    • Ash,
      The approach does indeed use a randomized complete block design taking sphericity into account.
      I didn’t understand your comment about “degrees of freedom in the replication”.
      Charles

      • I have some raw data from a RCBD trial and have been asked to check the results of a third party who ran analysis on it.
        The trial had three replications which were run concurrently with each other. Testing two products at 4 different rates of application, to see if their effect was statistically different.
        Their method of analysis seems to have considered the degrees of freedom in replication, R.
        The table below shows the form that their results were presented in. I followed your method and did not consider degrees of freedom for R which yielded different results, notably DFerror = 14 below and 16 in your method.
        Am I applying an incorrect method?

        Source  DF  SS  MS  F  P(F)  LSD
        Total   23
        R       2
        A       1
        B       3
        AB      3
        ERROR   14

        • Ash,
          I am not able to comment without additional information. If you send me an Excel file with your data and the results you obtained from R (please indicate which R capability you are using) and Excel, I will try to figure out what is going on. You can send this information to my email address listed at Contact Us.
          Charles

  24. Dear Dr Charles
    I am studying the difference of X in 5 different nuclei of the brain (a1, a2, a3, a4, a5) in different time (control/pre/post). I have some animals of each group (3 controls, 3pre and 3 post). I know that I have to do a Two way ANOVA, but, If I do the same experiment in the same animal the measure is really different in almost all the nuclei and I don’t trust in doing the mean. So, I wonder if there is something I can do to avoid to do the mean.

    Thank you for your time.
    Estrella

  25. Dear Charles,

    I’m preparing for my Business Statistics exam coming up next week, and one of the practice questions was:
    Explain why, when a test is being done to check whether there is a significant interaction between two treatments, replications are needed.

    I don’t really understand this question, because the way I see it, replications are the fact that we have more than one observation in each cell, and you can still check for significant interaction without replication occurring… Plus this question is only worth 4 points out of 50, so I don’t think a very detailed answer is expected.
    Anyways, it would be very kind of you if you could help me out with this!
    Have a great day!

    • Sorry Alex, but I don’t know which problem you are referring to nor what 5 step hypothesis you are referring to.
      Charles

  26. Dear Dr Charles,

    I have a scenario in which I have a spreadsheet with 8 columns. Across these 8 columns are 7 independent variables, including discrete variables (for example I have Sale Week “Yes/No”) and continuous variables (such as temperature, which is unique for each week at each store). The last column is a “Sales” column which shows the total sales for a specific store (1 of 6) in a specific week (1 of 6). I am tasked with finding the factors that affect sales. Obviously there are multiple factors that could affect it (such as temperature, whether it is a sale week, the store size, etc.), so I need to test this. Can I use multiple ANOVA tests? Would this be at risk of a type 1 error?

    Please let me know if you need more information regarding the actual dataset, I tried to summarise the data briefly. However I should note that I have been specifically asked to use ANOVA and/or t-tests to analyse the data.

    Regards,

    Chris

      • I agree that regression would be more suitable, however, for the task I have been specifically asked to use ANOVA (or t-tests) to detect which factors affect sales.

        When I did a one-way ANOVA on temperature (I split the continuous data into low/med/high temperature), there were significant differences in average sales between the temperature groups, but that wouldn’t technically mean temperature had an effect on sales (because there are other independent factors in the data), would it? I’d have to find out whether temperature had an interaction effect with another variable, but I’m not sure how to approach that.

        Thanks for your help.

        Chris

        • Chris,

          You said that you have 8 columns, which I understood represents 7 independent variables and the dependent variable Sales. You seem to have data for these variables for different stores in different weeks.

          For argument’s sake, suppose you want to look at the interaction between the temperature (low/medium/high) and some other variable, say training level (high/low). Further suppose that your sample consists of 60 stores and that for each of the 6 combinations of temperature and training there were exactly 10 stores. You could use a two-way ANOVA model (with replication) with temperature and training factors to model the interaction between temperature and training.

          If you have data for 4 weeks, you can perform the above analysis for any of the four weeks or the average of the four weeks.

          If in the above scenario the number of elements in each of the interactions is not equal (in the above, each of the 6 combinations has 10 elements, for a total of 60), you would need to use an unbalanced ANOVA model.

          I hope this helps you.

          Charles
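
The balanced two-way layout described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales figures are randomly generated placeholders, the function name `two_way_anova` is hypothetical, and the design is taken to be balanced (3 temperature levels x 2 training levels x 10 stores per combination). It computes sums of squares, degrees of freedom and F statistics only; a p-value would additionally require an F-distribution function such as Excel's F.DIST.RT.

```python
import random
from statistics import mean

def two_way_anova(cells):
    """Balanced two-factor ANOVA with replication (hypothetical helper).

    cells[i][j] holds the observations for level i of factor A
    and level j of factor B.
    """
    r, c, m = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(cell) != m for row in cells for cell in row):
        raise ValueError("cells differ in size: the design is unbalanced")
    if m < 2:
        raise ValueError("need at least 2 observations per cell")

    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "A": c * m * sum((a - grand) ** 2 for a in a_means),
        "B": r * m * sum((b - grand) ** 2 for b in b_means),
        "AB": m * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                      for i in range(r) for j in range(c)),
        "W": sum((x - cell_means[i][j]) ** 2
                 for i in range(r) for j in range(c) for x in cells[i][j]),
    }
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1), "W": r * c * (m - 1)}
    ms_w = ss["W"] / df["W"]
    f = {k: (ss[k] / df[k]) / ms_w for k in ("A", "B", "AB")}
    return ss, df, f

# Placeholder data (an assumption): 3 temperature levels x 2 training levels,
# 10 stores per combination, 60 stores in total.
random.seed(1)
sales = [[[random.gauss(100 + 5 * i + 3 * j, 10) for _ in range(10)]
          for j in range(2)] for i in range(3)]

ss, df, f = two_way_anova(sales)
for k in ("A", "B", "AB"):
    print(f"{k}: SS={ss[k]:.2f} df={df[k]} F={f[k]:.2f}")
print(f"W: SS={ss['W']:.2f} df={df['W']}")
```

The function refuses unequal cell sizes on purpose, since the formulas above only hold for the balanced case.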

  27. Hi, could you please help me with the sum of squares part? I followed the steps above, but I’m not getting the right answer for my question. Also, could you please explain how to get the p-value?

    Thanks

    • Ashley,

      I suggest that first you make sure that you understand how to calculate the sum of squares and p-value in the one-way ANOVA case. The process is similar, but a little easier to understand. See
      One-way ANOVA

      You can also go to the Examples Workbook Part 2 to look at the formulas on the spreadsheets used in calculating the sum of squares and p-values. See
      Examples Workbooks

      Charles

  28. Hi!
    I’m just starting to learn stats.
    I need to show that resolution affects time. What method/test should I use? Thanks!

  29. Hi Charles,
    I am looking for some statistical assistance. I have three factors (NI, MOL and CO), each with 3 levels (2.5 wt%, 5 wt% and 7.5 wt%). I am conducting experiments using an L27 orthogonal array. What type of ANOVA can I use to find the influence of each factor and of each combination (NI*MOL, NI*CO, MOL*CO and NI*MOL*CO)?
    Thank you so much!

  30. Dear Dr Charles,
    I have been greatly helped by Real Statistics.
    May I ask about the following,
    from the text under Figure 4 – Interaction plots for Example 1:
    “From the first chart we can see that Brand X has a quite a different pattern from the other brands (especially regarding Soy). Although less dramatic, Brand Y is also different from Brand Z (especially since the line for Brand Y is trending up towards Soy, but trending down towards Rice, exactly the opposite of Brand Z).”

    Maybe you mean “Blend”, not “Brand”?
    Also, in these words: “Brand Y is trending up towards Soy, but trending down towards Rice”.
    This looks like a typo to me: “trending down” should be “trending up”, and vice versa.

    thank you

    • Dear Jhon77,
      Thanks for catching these errors. I have just reworded the paragraph in error on the website. I really appreciate your finding these problems and your help in making the website better for the growing community of people who are using and depending on the site.
      Charles

  31. Hi, I’m working with one parameter, the protein content of 100 wheat genotypes cultivated in three growing seasons (season 1: 70 genotypes; season 2: 15 genotypes; season 3: 15 genotypes), with 14 genotypes common to the 3 seasons. I did a combined analysis with genotype as a fixed factor and crop year as a random factor. The results showed that genotype had the largest impact, then the G*CY interaction and finally crop year. What do you think?

  32. Sir, I am an M.Sc. Hons student. I analyzed my data using a two-factor factorial design (two-way ANOVA). I have 5 fertilizers, 5 species of sorghum and two replicates. One of my seniors told me that I should never use fewer than three replicates in a two-factor factorial design. Sir, please tell me what I should do.

    • Shoaib,
      I don’t know of any such rule that you need at least 3 replicates. With such a small sample, the statistical power of your test will be very low.
      Charles

  33. I am applying the Real Statistics add-in to my data. I have two factors and two replications. One factor has four levels and the other has two. When I apply two-way ANOVA, I get columns and rows but not the interactions. Please help me out in this regard.

  34. Hi Charles,
    I am looking for some statistical assistance. I have three groups (2 gene knockdowns and 1 negative control), where the assumption is that there is no difference among them. Each time I have run the experiment, 30 technical replicates have been used for each group. I have run the experiment three times, giving me three biological replicates. I am wondering whether I should run an ANOVA with a two-tailed post-hoc Dunnett test against the negative control with repeated measures or with replication?
    Thank you so much!

    • Elizabeth,
      If the three trials are based on the same 30 subjects per group, then it looks like you should use repeated measures (this will also be with replications). If the trials are on different subjects then depending on other details of the experiment you can simply run one-way ANOVA with 90 replicates per group.
      Charles

      • Hi Charles,

        Thank you for your response. The trials for each are on a different set of 30 subjects per group. So I assume I should be using the one-way ANOVA with 90 replicates as you mentioned, but I am wondering whether that will give the data too much power and overestimate statistical significance?
        Thank you!

        Elizabeth

  35. Please help: I would like you to exhaustively differentiate between two-way ANOVA with and without replication. Thanks in anticipation.

    • Olarewaju,
      In Two-way ANOVA there are two factors, which I will call factors A and B. Suppose factor A has m levels (also called groups or treatments) and factor B has n levels. Thus there are m x n combinations of levels from the two factors. These are the interactions between the two factors.
      In Two-way ANOVA without replication, the sample for each of the m x n consists of just one element.
      In Two-way ANOVA with replication, the sample for each of the m x n consists of two or more elements.
      Charles
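
One way to see the practical difference is through the degrees of freedom. The sketch below uses the notation of this reply (m levels of factor A, n levels of factor B) plus a third parameter k for the number of elements per combination; the function name `anova_df` and the parameter k are assumptions introduced for illustration, and the layout is assumed to be balanced.

```python
def anova_df(m, n, k):
    """Degrees of freedom for a balanced two-way layout (hypothetical helper).

    m, n: number of levels of factors A and B
    k:    number of elements in each of the m x n combinations
    """
    if min(m, n) < 2 or k < 1:
        raise ValueError("need at least 2 levels per factor and 1 element per cell")
    df = {"A": m - 1, "B": n - 1, "total": m * n * k - 1}
    if k == 1:
        # Without replication: whatever remains after A and B is the error term.
        df["error"] = (m - 1) * (n - 1)
    else:
        # With replication: interaction and within-cell error can be separated.
        df["AB"] = (m - 1) * (n - 1)
        df["within"] = m * n * (k - 1)
    return df

print(anova_df(3, 4, 1))  # {'A': 2, 'B': 3, 'total': 11, 'error': 6}
print(anova_df(3, 4, 5))  # {'A': 2, 'B': 3, 'total': 59, 'AB': 6, 'within': 48}
```

With k = 1 there are no degrees of freedom left for a within-cell term, which is why the interaction cannot be tested separately in that case.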

  36. sir,
    I am a student trying to complete my thesis. I am stuck on which method I should use. I have 4 different treatments:
    T1: culm cutting in raised nursery bed
    T2: branch cutting in raised bed
    T3: culm cutting in flat bed
    T4: branch cutting in flat bed
    Could you please recommend one? Thank you

    • It really depends on what you are trying to test. If you are trying to determine whether the treatments yield the same or different results, then you can use one-way ANOVA with the 4 treatments listed. If you also want to study the interaction effects, then use two factor ANOVA where one factor is the cutting type (branch vs. culm) and the other factor is bed type (raised vs. flat).
      Charles
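
The relationship between the two analyses can be checked numerically: in a balanced design, the between-treatment sum of squares of the one-way ANOVA splits into the two main effects plus the interaction of the 2 x 2 layout. The sketch below assumes 3 made-up measurements per treatment; the numbers and the helper `level_mean` are hypothetical.

```python
from statistics import mean

# Made-up measurements (an assumption), 3 per treatment
data = {
    ("culm", "raised"):   [12.0, 14.0, 13.0],  # T1
    ("branch", "raised"): [9.0, 10.0, 11.0],   # T2
    ("culm", "flat"):     [8.0, 9.0, 7.0],     # T3
    ("branch", "flat"):   [6.0, 5.0, 7.0],     # T4
}
m = 3  # observations per treatment
grand = mean(x for obs in data.values() for x in obs)

# One-way view: variation between the 4 treatment means
ss_between = sum(m * (mean(obs) - grand) ** 2 for obs in data.values())

def level_mean(position, level):
    """Mean of all observations at one level of one factor (0 = cutting, 1 = bed)."""
    return mean(x for key, obs in data.items() if key[position] == level for x in obs)

# Two-factor view: cutting type, bed type and their interaction
ss_cut = sum(2 * m * (level_mean(0, lv) - grand) ** 2 for lv in ("culm", "branch"))
ss_bed = sum(2 * m * (level_mean(1, lv) - grand) ** 2 for lv in ("raised", "flat"))
ss_inter = sum(
    m * (mean(obs) - level_mean(0, cut) - level_mean(1, bed) + grand) ** 2
    for (cut, bed), obs in data.items()
)

print(ss_between, ss_cut + ss_bed + ss_inter)
```

Both printed values agree, so the two-factor analysis does not use different information; it only partitions the treatment effect into interpretable pieces.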

  37. Hi Dr Charles, can I ask,
    I am writing a small manual for students, but I have trouble with this ANOVA in particular.
    Because it is for students of education, I will present an example from this field. Can you please tell me whether I understand it correctly?

    (fictitious example) I have two measurements: A) Before course and after course in:
    knowledge of: A) social B) ontogenetic and C) clinical psychology.

    From ANOVA with replication I should find out:
    p1: differences in knowledge before and after course (rows)
    p2: differences in knowledge between subjects (columns)
    p3: interaction? This is the second part of my confusion.

    Is this example right for this test? What does the interaction tell me about these factors?

    Thank you very much!

    • Rob,
      Based on my understanding of the scenario, you probably want to use ANOVA with repeated measures and not ANOVA with replication. It looks like a mixed model with factor A repeated measures and factor B not repeated measures. You can learn more about this type of model at the webpage
      Mixed Repeated Measures ANOVA.
      Charles

  38. Dear Dr Charles

    I am studying whether a new method estimates the same score as the old method. I have 60 participants who are tested with both methods at 5 time points. Can I use Two Factor ANOVA with replication to determine whether the methods give different results?

    Thank you
    Best regards

    • Rolf,
      No, you need to use ANOVA with repeated measures. In fact you need the mixed version of the test – one between and one within factor. This is described on the webpage

      Charles

  39. Dear Dr Charles
    I am studying the effect of a treatment, say X. I have three samples in each group, and in each sample I obtained three readings before and after treatment X. My question is: is ANOVA with replication the right technique to use? If yes, then I should follow the instructions on this page. Below is data similar to what I have.
    Thanks

    no treatment
    sample 1    sample 2    sample 3
    55 54 65    33 44 43    22 33 43

    with treatment
    sample 1    sample 2    sample 3
    56 52 61    33 45 41    33 34 41

    • Dear Mohammad,

      Based on the data that you have presented, I understand the following about Sample 1. Please let me know whether this is correct.

      There are three subjects in Sample 1. Subject 1 got a score of 55 before treatment and a score of 56 after treatment. Subject 2 got a score of 54 before treatment and a score of 52 after treatment. Subject 3 got a score of 65 before treatment and a score of 61 after treatment.

      If this is the correct way to interpret the data (and presumably the interpretation for the other two samples is similar), then a two factor ANOVA with replication is not the correct test. Instead you need a two factor mixed ANOVA where one of the factors is repeated measures. This is described on the following webpage:

      https://real-statistics.com/anova-repeated-measures/one-between-subjects-factor-and-one-within-subjects-factor/

      Charles

  40. I am studying the impact of emotional intelligence on teaching, with variables such as sex, education, age and management type.
    How can I use it for correlating E.I. with teaching?

  41. Hi, I want to investigate sex differences and education level on test anxiety among students. My variables are as follows :
    Independent variable 1 – Sex difference (male or female)
    Independent variable 2- education level (grade 2 or grade 3 students)
    Dependent variable – Test anxiety reported by the students.
    Is this suitable for a 2 way ANOVA? If yes, when putting in the data, should I input the score of each student on the Test Anxiety Inventory (TAI)? Or the sum of the students who reported test anxiety? I am confused about how to key in each datum for up to 261 students who participated. Thanks.

    • If test anxiety takes on a continuous set of values (or can be approximated by such values, e.g. 1, 2, 3, 4, 5, 6, 7), then this is indeed a 2 x 2 ANOVA, where you insert the test anxiety values in the four cells. Since 261 is not divisible by 4, this will be an unbalanced model. This can be solved by regression (see Unbalanced ANOVA).

      You insert the scores of each student in the cells. A possible source of confusion is with the chi-square test where you insert sums.

      Charles
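
As a rough illustration of what inserting the scores in the cells means, the sketch below groups one record per student into the four sex x grade cells and checks whether the cells are equal in size. The records and variable names are made up for the example and are not taken from the study.

```python
from collections import defaultdict

# Made-up records (an assumption): (sex, grade, TAI score), one per student
records = [
    ("male", 2, 41), ("male", 2, 38), ("male", 3, 45),
    ("female", 2, 50), ("female", 3, 47), ("female", 3, 52), ("female", 3, 44),
]

# Each cell of the 2 x 2 layout collects the individual scores, not their sum
cells = defaultdict(list)
for sex, grade, score in records:
    cells[(sex, grade)].append(score)

sizes = {key: len(scores) for key, scores in cells.items()}
balanced = len(set(sizes.values())) == 1

print(sizes)
print("balanced" if balanced else "unbalanced: use the regression approach")

# 261 students cannot be split into four equal cells
print(261 % 4)  # 1
```

The same grouping step applies to the full dataset; only the cell sizes change.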

  42. I never knew Excel was that useful… I only used it for entering data and always told myself that I would have to get better software for my analysis… You’ve really made sense of it for me. I will not be underrating Excel from now on…

  43. I will be conducting feeding trials on village chickens using a locally formulated layer diet, with commercial layer feed as the control. The trial is to compare egg production on the two diets. I intend to use ANOVA to analyse the data. There will be 4 replications. What is the appropriate design for such trials, and which analysis method would be the correct one?

  44. Hi
    I have a trial comparing 7 fertiliser treatments on a Maize crop with 4 replicates (randomised block). I have 2 components to analyze. Total cob wt, and mean cob wt.

    Which anova programme do I use?

    • Mike,
      If, for example, the replicates are, say, four varieties of maize, then you have a two-factor ANOVA design with a fixed fertilizer factor and a fixed maize variety factor. This is based on the fact that the mean cob wt can be calculated from the total cob wt. If not, then you would have two dependent variables and so you would need to use two factor MANOVA (note that Real Statistics software only supports one factor MANOVA at present).

      Also this assumes that there are only 4 varieties of maize under study; if these 4 varieties were randomly chosen say from 100 possible varieties, then this would be a random factor and not a fixed factor.

      Charles

      • Thanks for that.
        The maize crop is the same variety across the trial.
        So what I have is 7 fertilizer treatments replicated in 4 blocks.

        So a randomised block design with 7 fertiliser treatments.
        7*4 = 28 plots, 7 plots per replicate.
        So what programme do I use?
        cheers Mike

        • Mike,
          If I understand correctly, you have 28 plots of land and 7 fixed fertilizer treatments. For each fertilizer you randomly select four plots of land and apply that fertilizer (each plot of land gets one fertilizer). If all you are interested in is the fertilizer, then you can simply use one-way ANOVA to compare the fertilizers. This assumes that all the plots of land are interchangeable (i.e. have similar characteristics). Please let me know whether this is your situation before I make any further suggestions.
          Charles

          • Hi,
            Yes, that’s correct.
            The only thing I would add is that the replicates are in blocks, so it’s a randomised block design.
            So block 1 has the 7 fertilizer treatments randomised (all rep 1), then block 2 has the next 7 treatments (rep 2), etc.

            Does that make any difference? Should I still use a one-way ANOVA?
            Mike

          • With randomized block design you should use two factor ANOVA without replication. This is precisely the data analysis tool supplied by Excel.
            Charles
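
For the randomized block layout discussed in this thread (7 fertilizer treatments, 4 blocks, one plot per treatment in each block), the calculation can be sketched as follows. The yields are randomly generated placeholders and the function name `anova_without_replication` is hypothetical; the sketch stops at the F statistic for treatments rather than a p-value.

```python
import random
from statistics import mean

def anova_without_replication(table):
    """Two-factor ANOVA without replication (hypothetical helper).

    table[i][j] is the single observation for treatment i in block j.
    """
    r, c = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(c)]

    ss_rows = c * sum((a - grand) ** 2 for a in row_means)   # treatments
    ss_cols = r * sum((b - grand) ** 2 for b in col_means)   # blocks
    ss_err = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(r) for j in range(c))
    df_rows, df_cols, df_err = r - 1, c - 1, (r - 1) * (c - 1)
    f_rows = (ss_rows / df_rows) / (ss_err / df_err)
    return {"SS": (ss_rows, ss_cols, ss_err),
            "df": (df_rows, df_cols, df_err),
            "F_treatment": f_rows}

# Placeholder yields (an assumption): 7 treatments (rows) x 4 blocks (columns)
random.seed(7)
yields = [[random.gauss(5.0 + 0.3 * t + 0.5 * b, 0.4) for b in range(4)]
          for t in range(7)]

result = anova_without_replication(yields)
print(result["df"])  # (6, 3, 18)
print(result["F_treatment"])
```

Treating the block as the second factor removes block-to-block variation from the error term, which a one-way ANOVA on the same 28 plots would leave in.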

  45. Hi,
    I was wondering if you could help. I’m looking to run a 2×2 ANOVA: time (pre/post) x group (intervention/control) for 4 dependent variables. Is this considered a 2×2 ANOVA, two-factor without replication? I want to see if my variables changed between the groups from pre- to post-intervention. Thank you!

    • If you only had one dependent variable, then this sounds like 2 x 2 ANOVA with one fixed factor (group) and one repeated measures factor (time).
      If there is no (or little) correlation between the dependent variables, you can run four separate ANOVAs. The more likely situation is that there is correlation, in which case you will likely want to use MANOVA.
      Charles

  46. When m=1, “with replication” reduces to “without replication”. The SS_AB term becomes identical to SS_E, and SS_W goes to zero, as it should.

    Conceptually, shouldn’t the interaction between A and B exist regardless of the value of m? Why does one call SS_AB an interaction, with SS_W as the error term, when m>1, while SS_E (reduced from SS_AB when m=1) is called an error term? Should the interaction between A and B be considered an error only when m=1?

    If the error term represents the “unexplained amount”, is the interaction term “explained” when m>1, but becomes “unexplained” when m=1?

    Thank you so much!

    • Heather,
      The SS terms measure variability. In the without replication case, since there is only one data element in the intersection between A and B levels there is no variability and so SS_AB = 0. The error term is SS_W = SS_T – (SS_A + SS_B), which turns out to have the same formula as the SS_AB term in ANOVA with replication, but as I mentioned above since SS_W is not 0 it is not the SS of the interaction between A and B in the without replication case.
      Charles
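
The arithmetic behind this exchange can be checked on a small table. The sketch below assumes a made-up 2 x 3 table with one observation per cell; it shows that the within-cell sum of squares is zero and that the interaction formula with m = 1 gives the same number as SS_T – (SS_A + SS_B). It only verifies the identity and takes no position on which label the remaining term should carry.

```python
from statistics import mean

# Made-up 2 x 3 table (an assumption) with one observation per cell (m = 1)
x = [[4.0, 7.0, 6.0],
     [5.0, 9.0, 2.0]]
r, c = len(x), len(x[0])

grand = mean(v for row in x for v in row)
a = [mean(row) for row in x]                         # factor A (row) means
b = [mean(row[j] for row in x) for j in range(c)]    # factor B (column) means

ss_t = sum((v - grand) ** 2 for row in x for v in row)
ss_a = c * sum((ai - grand) ** 2 for ai in a)
ss_b = r * sum((bj - grand) ** 2 for bj in b)

# Each cell mean equals its single observation, so nothing varies within a cell
ss_within = sum((v - mean([v])) ** 2 for row in x for v in row)

# The interaction formula from the with-replication case, evaluated at m = 1
ss_formula = sum((x[i][j] - a[i] - b[j] + grand) ** 2
                 for i in range(r) for j in range(c))

print(ss_within)
print(ss_formula, ss_t - ss_a - ss_b)
```

Because the two quantities coincide when m = 1, the data alone cannot separate interaction from error without replication.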

      • Thank you Charles. I was confused by the x_ij_bar notation in the SS_E term for the “w/o replication” case in the table under definition 2. It should be just x_ij, without the bar. It was obvious in the proof below.

        The Excel ANOVA table seems to label the terms arbitrarily, your website helps a lot in clarifying it. Thank you!

        Heather

  47. Sir,

    In this example we have three independent factors (Blends X, Y and Z) and four dependent continuous variables (rice, soy, wheat, corn) which we analyze with ANOVA. Will analyzing this data with MANOVA make any sense? If yes, what will be the difference?

