Equivalence Testing (TOST)

Objective

The objective of a two-sample equivalence test is to determine whether the means of two populations are equivalent based on two independent samples from these populations; here “equivalent” means that the two means differ by no more than a small pre-defined amount. This margin of equivalence is determined by knowledge of the domain under study and represents the tolerance that is acceptable.

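The two one-sided tests (TOST) procedure is used to make this determination. Essentially, TOST reverses the roles of the null and alternative hypotheses in a two-sided t-test: the null hypothesis is now that the means differ by at least the margin, and the alternative hypothesis is that they are equivalent.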

Hypothesis Test

If θ represents the margin of equivalence, then we test the hypotheses:

         H0: μ2 – μ1 ≤ –θ or μ2 – μ1 ≥ θ

         H1: –θ < μ2 – μ1 < θ

This is done by conducting two one-sided t-tests at significance level α, each of which is based on a null hypothesis that is one of the two parts of the above null hypothesis. If the null hypotheses of both tests are rejected, then we conclude that the difference falls within the equivalence interval and can claim that the two population means are equivalent. The larger p-value of the two t-tests is used as the p-value of the TOST.
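
As a rough illustration of the two one-sided tests described above, the following Python sketch computes both p-values for two independent samples. It is not the Real Statistics implementation: it assumes equal variances (a pooled t-test), the names t_sf and tost_two_sample are made up for this example, the tail of the t distribution is approximated by numerical integration, and the data are hypothetical.

```python
import math
from statistics import mean, variance

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                      # integral of the density from 0 to |t|
    upper = 0.5 - area                 # P(T > |t|)
    return upper if t >= 0 else 1 - upper

def tost_two_sample(x1, x2, theta):
    """TOST for H1: -theta < mu2 - mu1 < theta (pooled variance assumed)."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x2) - mean(x1)
    p_lower = t_sf((diff + theta) / se, df)   # tests H0: mu2 - mu1 <= -theta
    p_upper = t_sf((theta - diff) / se, df)   # tests H0: mu2 - mu1 >= theta
    return p_lower, p_upper, max(p_lower, p_upper)

# Hypothetical measurements, for illustration only
a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
b = [10.2, 10.4, 10.1, 10.3, 10.5, 10.0]

p_lower, p_upper, p_tost = tost_two_sample(a, b, theta=0.5)
print(p_lower, p_upper, p_tost)
```

Equivalence is claimed at level α only when the last value returned, the larger of the two one-sided p-values, is below α.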

Another way of looking at this method is to conduct a two-sided t-test and determine the 1–2α confidence interval I (e.g. the 90% confidence interval when α = .05). If I lies completely within the interval (–θ, θ), then we conclude that the two means are equivalent.

We can also use two different limits for the margin of equivalence, an upper-value θU and a lower-value θL. In this case, θU replaces θ in the TOST method described above and θL replaces –θ.
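
The confidence interval version of the test, with separate lower and upper limits, might be sketched in Python as follows. This is only an illustration under stated assumptions: the difference in means, its standard error, and the degrees of freedom are taken as given, the helper names (t_sf, t_ppf, tost_by_ci) are invented for this example, and the numbers in the usage lines are hypothetical.

```python
import math

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - area * h / 3
    return upper if t >= 0 else 1 - upper

def t_ppf(p, df):
    """Quantile of Student's t for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1 - t_sf(hi, df) < p:        # widen the bracket until it contains p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_sf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def tost_by_ci(diff, se, df, theta_low, theta_high, alpha=0.05):
    """Return the 1 - 2*alpha CI and whether it lies inside (theta_low, theta_high)."""
    t_crit = t_ppf(1 - alpha, df)
    lo, hi = diff - t_crit * se, diff + t_crit * se
    return (lo, hi), theta_low < lo and hi < theta_high

# Hypothetical summary statistics, for illustration only
interval, equivalent = tost_by_ci(diff=1.0, se=2.0, df=22,
                                  theta_low=-3.0, theta_high=5.0)
print(interval, equivalent)
```

Note that the critical value uses 1 – α rather than 1 – α/2, which is what makes the interval a 1–2α interval.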

The TOST approach can also be used for a one-sample test (and similarly for a two dependent sample test). In this case, we test the equivalence between the mean of a single population and some hypothetical mean μ0.

         H0: μ – μ0 ≤ θL or μ – μ0 ≥ θU

         H1: θL < μ – μ0 < θU

This time we conduct two one-sample t-tests. Alternatively, if the 1–2α confidence interval for the two-tailed one-sample t-test lies completely within the interval (θL, θU) then we accept that the population mean and hypothetical mean are equivalent.
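
A minimal Python sketch of the one-sample version is shown below. For a dependent (paired) design, one possible approach is to apply it to the pairwise differences with μ0 = 0. The function names and the data are hypothetical, and the tail probability is again approximated numerically rather than taken from a statistics library.

```python
import math
from statistics import mean, stdev

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - area * h / 3
    return upper if t >= 0 else 1 - upper

def tost_one_sample(x, mu0, theta_low, theta_high):
    """TOST for H1: theta_low < mu - mu0 < theta_high."""
    n = len(x)
    df = n - 1
    se = stdev(x) / math.sqrt(n)
    diff = mean(x) - mu0
    p_lower = t_sf((diff - theta_low) / se, df)    # tests H0: mu - mu0 <= theta_low
    p_upper = t_sf((theta_high - diff) / se, df)   # tests H0: mu - mu0 >= theta_high
    return p_lower, p_upper, max(p_lower, p_upper)

# Hypothetical post-minus-pre differences from a paired design
d = [0.3, -0.2, 0.1, 0.4, -0.1, 0.2, 0.0, 0.1]

p_lower, p_upper, p_tost = tost_one_sample(d, mu0=0, theta_low=-0.5, theta_high=0.5)
print(p_lower, p_upper, p_tost)
```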

Example

Example 1: A company that markets a premium brand of fresh salmon has found a second supplier, but wants to make sure that the level of omega-3 is within 25 mg of their existing supplier’s for a 100-gram serving. Using the random sample of 12 servings from each supplier shown on the left side of Figure 1, determine whether the sources are equivalent based on alpha = .05.

The right side of Figure 1 shows the analysis for a two independent sample t-test. Since the 90% confidence interval (-9.17, 23.33) is completely contained in the interval (-25, 25), we conclude that the two sources are equivalent.

Note that if we test the two one-sided null hypotheses directly, we would obtain p-values of .036 and .0013. Since both are less than alpha = .05, we again conclude that the two sources are equivalent (with p-value = .036).

Note too that a two-sided t-test would yield a p-value = .46 and so we can also conclude that there is no significant difference between the two suppliers.
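
The p-values quoted above can be roughly cross-checked from the reported 90% confidence interval alone. The Python sketch below backs out the mean difference and standard error from the interval (-9.17, 23.33) and then recomputes the one-sided and two-sided p-values. It assumes a pooled t-test with 22 degrees of freedom and uses the tabulated critical value 1.717; the raw data in Figure 1 are not used, so the results should agree with .036, .0013 and .46 only up to rounding.

```python
import math

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - area * h / 3
    return upper if t >= 0 else 1 - upper

ci_low, ci_high = -9.17, 23.33   # 90% confidence interval reported in the text
theta = 25                       # margin of equivalence (mg)
df = 22                          # assumption: 12 + 12 - 2, i.e. a pooled t-test
t_crit = 1.717                   # tabulated 95th percentile of t with 22 df

diff = (ci_low + ci_high) / 2            # centre of the interval
se = (ci_high - ci_low) / 2 / t_crit     # half-width divided by critical value

p_near = t_sf((theta - diff) / se, df)   # bound closest to the observed difference
p_far = t_sf((diff + theta) / se, df)    # bound farthest from the observed difference
p_two_sided = 2 * t_sf(abs(diff) / se, df)

print(round(p_near, 3), round(p_far, 4), round(p_two_sided, 2))
```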

Figure 1 – TOST

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Lakens, D. (2017) Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502906/

Limentani, G. B., Ringo, M. C., Ye, F., Bergquist, M. L., McSorley, E. O. (2005) Beyond the t-test: Statistical equivalence testing. GlaxoSmithKline
https://kipdf.com/queue/-statistical-equivalence-testing_5ab5c06c1723dd329c642c9b.html

39 thoughts on “Equivalence Testing (TOST)”

  1. Hi Charles,

    I note that in your example test you calculate the t-statistic based on the absolute value of (M1-M2-Diff). Am I correct in thinking this was done so you could use the right handed t-dist function and consequently this approach becomes invalid if M1-M2 > Diff for the left handed test?

    Best

    • Hi Sean,
      If I am looking at the same thing that you are, I take the absolute value so that I can use the two-tailed t-value since T.DIST.2T(x,df) is only valid when x >= 0.
      Charles

  2. Hi Charles,
    Thanks for a great resource.
    I am testing equivalence between two devices, how do I incorporate precision error into the TOST? My criterion method has a greater Least Significant Change than the preset boundary, or do I just have to make the boundary bigger than the LSC? Thank you.

  3. Hey Charles,

    I want to do equivalence testing with a dependent sample (pre-post-design).
    I know Lakens (2017) suggests the Welch Test for independent samples, but which test can I use for dependent samples?

    Thank you

    Pele

    • Hello Pele,
      I am not familiar with equivalence testing for dependent samples. I will look into the paper that you referenced and try to get back to you in a few days.
      Charles

  4. I’ve used your website many times and always find it very helpful. Thanks for your work in creating and maintaining this resource!

    Your page on Equivalence Testing (TOST) does not contain your usual “Figure 2” showing the Excel formulas used to generate the output shown. Is it possible for that to be added, which would make applying the technique to my dataset easier?

    Thanks again!

    • Hi Dave,
      The figure contains the output from the Real Statistics t-Test data analysis tool. I have added a link to the webpage that explains this tool.
      I have also added the ability to download the Excel spreadsheet for the example, and so you can see all the formulas used.
      Charles

  5. What would the hypothetical mean be in the one-sample t-tests? I have looked thoroughly and it is not discussed in the given example.

      • Another question, I’ve run through your example successfully without much difficulty, but when I applied the methods to my data, the TOST indicated equivalence, while the two-sample two-sided t-test yielded a confidence interval way outside of the equivalence interval I decided on. I’m pretty sure the TOST is incorrect, as my two samples are pretty obviously not equivalent.

        Do you have any ideas as to what could’ve gone wrong? I could give you more specific info if it helps

  6. Hi Charles,

    Appreciate your article. I am getting acquainted with equivalence testing and your publication helped a lot by noting the different approaches using Excel. I wonder if you could shed some light on how to calculate Theta to run an equivalence test. I am using two softwares (QI Macros and MiniTab) and am struggling to figure out Theta. I am aware that it should be based on historical Standard Deviations or process stringency.

    • Armando,
      This depends on why you want to use an equivalence test in the first place. The theta value should be chosen based on something that you are trying to test. If not, why use equivalence testing at all?
      Charles

  7. Hi, Charles. Thank you very much for all your work. I am trying to perform a TOST test to verify that two instruments measure correctly. I downloaded the Real Statistics Add In to Excel but I can’t find the TOST test. In the Real Statistics window I see these tabs: Desc, Reg, Anova, Time S, Multivar, Corr and Misc. In what tab should I find it? What item of the menu should I look for? Also, could you recommend a link with theory and practical examples regarding the TOST test, so that I can gain more insight and experience? Again, thank you very much for your generous help and support. Have a great day.

  8. Hi Charles,
    thanks for your help. I’m trying to understand equivalence test approach better. Why can’t I leave hyp mean diff at 0?
    If so, can you tell me how to choose hyp mean diff?

    • Sonja,
      You can set hyp mean equal to zero if you use the approach where you compare the confidence interval with the equivalence interval.
      If you prefer to perform two one-tailed tests, then you will need to change the value of hyp mean. This value is set first to the left end of the equivalence interval and then to the right end of the equivalence interval.
      Charles

  9. Hi

    You say above: Note that if we test the two one-sided null hypotheses directly, we would obtain p-values of .036 and .0013. Since both are less than alpha = .05, we again conclude that the two sources are equivalent (with p-value = .036).

    How do you get .036 and .0013? Which test are you using in excel?

    Thanks

    • Hello Sonja,
      If you insert the value 25 in cell H5 of Figure 1 you will see that cell H13 changes to .036.
      If you insert the value -25 in cell H5, cell H13 changes to .0013.
      Charles

      • That’s perfect.
        I’m trying to do the equivalence test using the same test (T Test: Two Independent Samples) and I get following p values:
        One Tail: 7.02509E-05
        Two Tail: 0.000140502
        where hyp mean diff=0 and sig says yes.
        What is the conclusion there?
        Thanks,
        Sonja
