Equivalence Testing (TOST)

Objective

The objective of a two-sample equivalence test is to determine whether the means of two populations are equivalent, based on two independent samples from these populations. Here “equivalent” means that the two means differ by less than a small, pre-defined amount, called the margin of equivalence. This margin is determined by knowledge of the domain under study and represents the tolerance that is acceptable.

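The two one-sided t-tests (TOST) procedure is used to make this determination. Essentially, TOST reverses the roles of the null and alternative hypotheses of a two-sided t-test: the null hypothesis is that the means differ by at least the margin, and the alternative hypothesis is that they are equivalent.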

Hypothesis Test

If θ represents the margin of equivalence, then we test the hypotheses:

H₀: μ₂ – μ₁ ≤ –θ or μ₂ – μ₁ ≥ θ

H₁: –θ < μ₂ – μ₁ < θ

This is done by conducting two one-sided t-tests, each of which uses one part of the above null hypothesis as its own null hypothesis: H₀: μ₂ – μ₁ ≤ –θ for the first test and H₀: μ₂ – μ₁ ≥ θ for the second. If both null hypotheses are rejected, then the difference lies within the equivalence interval and you can claim that the two population means are equivalent. The larger of the two p-values is used as the p-value of the TOST.
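The two one-sided tests can be sketched in Python from summary statistics: the observed difference in means, its standard error and the degrees of freedom. The helper names `t_cdf` and `tost` are illustrative only (they are not part of Real Statistics). To avoid third-party packages, `t_cdf` uses the standard closed-form series for the t distribution with an integer number of degrees of freedom, which assumes the pooled-variance test (df = n₁ + n₂ – 2) rather than Welch's fractional df. Separate lower and upper bounds are accepted, so the symmetric case is simply `theta_l = -theta`, `theta_u = theta`.

```python
import math


def t_cdf(t: float, df: int) -> float:
    """P(T <= t) for Student's t with a positive integer number of degrees of freedom.

    Uses the closed-form finite series that exists for integer df,
    so no third-party packages are needed.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    theta = math.atan(t / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df % 2 == 0:
        # Even df: A = sin(theta) * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = s * total
    else:
        # Odd df: A = (2/pi) * (theta + sin*cos*(1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...))
        total = 0.0
        if df > 1:
            term, total = 1.0, 1.0
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c2
                total += term
        a = 2.0 / math.pi * (theta + s * c * total)
    # A = P(-|t| < T < |t|), carrying the sign of t, so the CDF is (1 + A) / 2
    return 0.5 + 0.5 * a


def tost(diff, se, df, theta_l, theta_u, alpha=0.05):
    """Two one-sided t-tests for H0: delta <= theta_l or delta >= theta_u.

    diff: observed difference in means; se: its standard error;
    df: integer degrees of freedom.
    Returns (p_lower, p_upper, p_tost, equivalent).
    """
    # Test 1: H0: delta <= theta_l vs H1: delta > theta_l (right-tail p-value)
    p_lower = t_cdf(-(diff - theta_l) / se, df)
    # Test 2: H0: delta >= theta_u vs H1: delta < theta_u (left-tail p-value)
    p_upper = t_cdf((diff - theta_u) / se, df)
    # The TOST p-value is the larger of the two one-sided p-values
    p_tost = max(p_lower, p_upper)
    return p_lower, p_upper, p_tost, p_tost < alpha
```

For a pooled two-sample test, `se` is s_p·√(1/n₁ + 1/n₂) and `df` is n₁ + n₂ – 2; a Welch version would need a t CDF that accepts non-integer df, such as `scipy.stats.t.cdf`. Plugging in the difference and standard error implied by the 90% interval in Example 1 (7.08 and about 9.46, assuming df = 22) gives one-sided p-values of about .0013 and .036, in line with the values quoted there.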

Another way of looking at this method is to conduct a two-sided t-test and determine the 1–2α confidence interval I for μ₂ – μ₁ (e.g. the 90% confidence interval when α = .05). If I lies completely within the interval (–θ, θ), then we accept that the two means are equivalent.
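Why the two approaches agree can be seen directly. Writing d̂ for the observed difference in sample means, SE for its standard error and t* = t_{1–α, df} for the upper-α critical value (notation introduced here for this sketch), each one-sided test rejects at level α exactly when the corresponding end of the interval d̂ ± t*·SE lies inside the margin. Since a two-sided interval at level 1 – 2α uses the quantile t_{1–2α/2} = t_{1–α}, that interval is precisely the 1–2α confidence interval:

```latex
\begin{aligned}
&\text{Reject } H_{01}: \mu_2-\mu_1 \le -\theta
  \iff \frac{\hat d + \theta}{SE} > t^{*}
  \iff \hat d - t^{*}\,SE > -\theta \\
&\text{Reject } H_{02}: \mu_2-\mu_1 \ge \theta
  \iff \frac{\hat d - \theta}{SE} < -t^{*}
  \iff \hat d + t^{*}\,SE < \theta \\
&\text{Reject both}
  \iff \left(\hat d - t^{*}\,SE,\; \hat d + t^{*}\,SE\right) \subset (-\theta,\ \theta)
\end{aligned}
```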

We can also use two different limits for the margin of equivalence: an upper value θ_U and a lower value θ_L. In this case, θ_U replaces θ and θ_L replaces –θ in the TOST method described above.

The TOST approach can also be used for a one-sample test (and similarly for a two dependent sample test). In this case, we test the equivalence between the mean of a single population and some hypothetical mean μ₀.

H₀: μ – μ₀ ≤ θ_L or μ – μ₀ ≥ θ_U

H₁: θ_L < μ – μ₀ < θ_U

This time we conduct two one-sided one-sample t-tests, one for each part of H₀. Alternatively, if the 1–2α confidence interval for μ – μ₀ from the two-tailed one-sample t-test lies completely within the interval (θ_L, θ_U), then we accept that the population mean and the hypothetical mean are equivalent.
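A minimal sketch of the one-sample case, computing both the two one-sided t statistics and the 1–2α confidence interval so that the two decision rules can be compared. `one_sample_tost` is a hypothetical helper, and the critical value `t_crit` is assumed to be supplied by the caller (for example from Excel's T.INV(1 – α, n – 1)) rather than computed here.

```python
import math
import statistics


def one_sample_tost(data, mu0, theta_l, theta_u, t_crit):
    """One-sample TOST for H0: mu - mu0 <= theta_l or mu - mu0 >= theta_u.

    t_crit is the (1 - alpha) quantile of t with n - 1 df, supplied by the
    caller (e.g. Excel's T.INV(1 - alpha, n - 1)); it is not computed here.
    Returns (ci, tost_equivalent, ci_equivalent).
    """
    n = len(data)
    d = statistics.mean(data) - mu0             # estimate of mu - mu0
    se = statistics.stdev(data) / math.sqrt(n)  # sample sd / sqrt(n)
    # Two one-sided t statistics
    t_l = (d - theta_l) / se  # reject H0: mu - mu0 <= theta_l if t_l > t_crit
    t_u = (d - theta_u) / se  # reject H0: mu - mu0 >= theta_u if t_u < -t_crit
    tost_equivalent = t_l > t_crit and t_u < -t_crit
    # Same decision via the two-sided 1 - 2*alpha confidence interval for mu - mu0
    ci = (d - t_crit * se, d + t_crit * se)
    ci_equivalent = theta_l < ci[0] and ci[1] < theta_u
    return ci, tost_equivalent, ci_equivalent
```

For example, `one_sample_tost([1, 2, 3, 4, 5], 3, -2, 2, 2.131847)` (2.131847 being T.INV(0.95, 4)) gives a 90% interval of about (–1.507, 1.507) and reports equivalence under both rules, while bounds of ±1 do not. For two dependent samples, the same function can be applied to the paired differences with `mu0 = 0`.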

Example

Example 1: A company that markets a premium brand of fresh salmon has found a second supplier, but wants to make sure that the level of omega-3 in a 100-gram serving is within 25 mg of that of its existing supplier. Using the random samples of 12 servings from each supplier shown on the left side of Figure 1, determine whether the two sources are equivalent based on alpha = .05.

The right side of Figure 1 shows the output of a two independent samples t-test, run with alpha = .1 so that the 90% (= 1–2α) confidence interval is reported. Since the 90% confidence interval (-9.17, 23.33) is completely contained in the equivalence interval (-25, 25), we conclude that the two sources are equivalent.

Note that if we test the two one-sided null hypotheses directly, we would obtain p-values of .036 and .0013. Since both are less than alpha = .05, we again conclude that the two sources are equivalent (with p-value = .036).
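As a rough check, these p-values can be recovered from the reported 90% interval alone. This assumes (it is not stated above) that the interval comes from the pooled-variance t-test with df = 12 + 12 – 2 = 22, for which t_{.95, 22} ≈ 1.7171; d̂ denotes the difference in sample means as reported in Figure 1:

```latex
\begin{aligned}
\hat d &= \tfrac{-9.17 + 23.33}{2} = 7.08, &
SE &= \tfrac{23.33 - (-9.17)}{2 \times 1.7171} = \tfrac{16.25}{1.7171} \approx 9.463 \\
t_U &= \tfrac{7.08 - 25}{9.463} \approx -1.894, &
p_U &= P(T_{22} \le t_U) \approx .036 \\
t_L &= \tfrac{7.08 - (-25)}{9.463} \approx 3.390, &
p_L &= P(T_{22} \ge t_L) \approx .0013
\end{aligned}
```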

Figure 1 – TOST

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Lakens, D. (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502906/

Limentani, G. B., Ringo, M. C., Ye, F., Bergquist, M. L., McSorley, E. O. (2005). Beyond the t-test: Statistical equivalence testing. GlaxoSmithKline.
https://kipdf.com/queue/-statistical-equivalence-testing_5ab5c06c1723dd329c642c9b.html

44 thoughts on “Equivalence Testing (TOST)”

Dimitrios

June 9, 2026 at 6:50 am

Hello Charles,

Thank you for this great tool!
I am not sure I am following the documentation for this test correctly.

I need to do an equivalence test for two 7-replicate sets of samples (biodiversity metrics from two sites), at the 20% interval (this is to assess compliance with a resource consent, so the number of samples and interval limits are beyond my control).

If I want to follow the two one-sided t-tests’ approach, should the hypothesis mean difference be (±0.2 * (mean of reference site))? Then, if either of the two tests does not reject Ho, they are not equivalent (?)

And if I want to compare the confidence interval with the equivalence interval? Do I assess the lower and upper values against my (±0.2 * (mean of reference site)) outside of your TOST-spreadsheet’s analysis output? Or can I incorporate my 20% somehow?

Thank you very much for your help!
Steve

July 5, 2025 at 11:25 pm

Would this apply to non-normal datasets? Is there a non-parametric version?
- Charles
  
  July 7, 2025 at 1:29 pm
  
  Hi Steve,
  There is a non-parametric version using the Mann-Whitney test. See, for example,
  https://aaroncaldwell.us/TOSTERpkg/reference/wilcox_TOST.html
  Charles
Bruno Mario Cesana

January 26, 2025 at 2:02 am

Dear Charles,
I think that this sentence, “Note too that a two-sided t-test would yield a p-value = .46 and so we can also conclude that there is no significant difference between the two suppliers.”, is not pertinent to equivalence testing.
The fact that there is no statistically significant difference between two groups does not mean that they are equivalent. My suggestion is to delete it or to stress the difference between equivalence testing and a standard test comparing two means.
Best
Bruno
- Charles
  
  February 26, 2025 at 9:22 am
  
  Dear Bruno,
  Thank you for your comment.
  The referenced statement was meant to be a side comment, but I agree that it is confusing and is not pertinent to the equivalence topic, and so I have removed it.
  Thanks again for catching this.
  Charles
Sean Murphy

August 26, 2024 at 3:39 pm

Hi Charles,

I note that in your example test you calculate the t-statistic based on the absolute value of (M1-M2-Diff). Am I correct in thinking this was done so you could use the right handed t-dist function and consequently this approach becomes invalid if M1-M2 > Diff for the left handed test?

Best
- Charles
  
  September 3, 2024 at 9:19 pm
  
  Hi Sean,
  If I am looking at the same thing that you are, I take the absolute value so that I can use the two-tailed t-value since T.DIST.2T(x,df) is only valid when x >= 0.
  Charles
Chris

August 24, 2023 at 2:57 am

Hi Charles,
Thanks for a great resource.
I am testing equivalence between two devices. How do I incorporate precision error into the TOST? My criterion method has a larger Least Significant Change (LSC) than the preset boundary. Do I just have to make the boundary bigger than the LSC? Thank you.
- Charles
  
  August 24, 2023 at 10:23 am
  
  Hi Chris,
  Sorry, but I don’t know how to incorporate precision error into TOST.
  Charles
  - Chris
    
    August 25, 2023 at 1:17 am
    
    Thanks Charles
Pele

January 27, 2023 at 4:34 pm

Hey Charles,

I want to do equivalence testing with a dependent sample (pre-post-design).
I know Lakens (2017) suggests the Welch Test for independent samples, but which test can I use for dependent samples?

Thank you

Pele
- Charles
  
  January 31, 2023 at 4:03 pm
  
  Hello Pele,
  I am not familiar with equivalence testing for dependent variables. I will look into the paper that you referenced and try to get back to you in a few days.
  Charles
Dave

June 15, 2022 at 4:44 pm

I’ve used your website many times and always find it very helpful. Thanks for your work in creating and maintaining this resource!

Your page on Equivalence Testing (TOST) does not contain your usual “Figure 2” showing the Excel formulas used to generate the output shown. Is it possible for that to be added, which would make applying the technique to my dataset easier?

Thanks again!
- Charles
  
  June 16, 2022 at 10:43 pm
  
  Hi Dave,
  The figure contains the output from the Real Statistics t-Test data analysis tool. I have added a link to the webpage that explains this tool.
  I have also added the ability to download the Excel spreadsheet for the example, and so you can see all the formulas used.
  Charles
Isabel Anna Loader

August 18, 2021 at 3:25 pm

What would the hypothetical mean be in the one-sample t-tests? I have looked thoroughly and it is not discussed in the given example.
- Charles
  
  August 18, 2021 at 9:55 pm
  
  Are you asking about the one-sample t-test or about TOST?
  Charles
  - Isabel Anna Loader
    
    August 19, 2021 at 4:27 am
    
    Don’t worry, I figured it out.
  - Isabel Anna Loader
    
    August 19, 2021 at 9:20 am
    
    Another question, I’ve run through your example successfully without much difficulty, but when I applied the methods to my data, the TOST indicated equivalence, while the two-sample two-sided t-test yielded a confidence interval way outside of the equivalence interval I decided on. I’m pretty sure the TOST is incorrect, as my two samples are pretty obviously not equivalent.
    
    Do you have any ideas as to what could’ve gone wrong? I could give you more specific info if it helps
    - Charles
      
      August 19, 2021 at 11:13 am
      
      Hello Isabel,
      If you email me an Excel file with your data and results, I will try to figure out what the problem is.
      Charles
Alan

January 18, 2021 at 8:28 am

Dear Charles,
Is there any guide we can use to establish the equivalence interval/margin θ?
Thanks.
- Charles
  
  January 18, 2021 at 6:46 pm
  
  Alan,
  Perhaps the following paper will help.
  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502906/
  Charles
  - Alan
    
    January 19, 2021 at 7:56 am
    
    Thank you, Charles.
    But do you have a simpler guide? For example, I have seen an approach that creates a tolerance interval from a data set and compares it against the specification limits in order to generate the largest acceptable difference, or equivalence bound. Is there any other approach we can use as a guide to form this margin?
    Thank you.
    - Charles
      
      January 19, 2021 at 8:45 am
      
      Alan,
      There are probably such guides, but I am not familiar with them. Sorry.
      Charles
      - Alan
        
        January 19, 2021 at 9:01 am
        
        it’s ok Charles. You’ve been very helpful!
        Thank you so much!
      - LIWEI XIAO
        
        February 13, 2021 at 7:29 pm
        
        Fritz et al., 2012: Effect Size Estimates: Current Use, Calculations, and Interpretation
        In their paper, they give clear values, including PS and U1.
      - Charles
        
        February 14, 2021 at 11:27 am
        
        Thank you very much for sharing this link. It is a useful article.
        What is the connection between PS and U1 and TOST?
        Charles
Alan

January 7, 2021 at 10:17 am

hi
Sorry, I am confused: in Example 1 alpha is set to 0.05, so why is alpha displayed as 0.1 in Figure 1?

thank you
- Charles
  
  January 7, 2021 at 7:56 pm
  
  Alan,
  As explained on the webpage, we are looking for the 1–2α confidence interval, which is why the .05 is doubled.
  Charles
  - Alan
    
    January 8, 2021 at 2:44 am
    
    So that means alpha is initially entered as 0.05 when we key in the variable ranges, and only after the test results are tabulated do we change alpha to 0.1 to get the lower and upper limits of the 90% confidence interval?
    Reply
    - Charles
      
      January 8, 2021 at 8:12 am
      
      Yes, that is correct.
      Charles
      - Alan
        
        January 11, 2021 at 10:34 am
        
        hi Charles,
        Which alpha, 0.05 or 0.1, should the one-tailed and two-tailed p-values be compared against?
      - Charles
        
        January 11, 2021 at 2:54 pm
        
        If you perform two one-tailed tests, you use alpha = .05, but if you use the confidence interval approach then use alpha = .1.
        This is explained on this webpage.
        Charles
Armando Garcia-Caceres

September 30, 2020 at 7:50 pm

Hi Charles,

Appreciate your article. I am getting acquainted with equivalence testing and your publication helped a lot by noting the different approaches using Excel. I wonder if you could shed some light on how to calculate Theta to run an equivalence test. I am using two software packages (QI Macros and Minitab) and am struggling to figure out Theta. I am aware that it should be based on historical standard deviations or process stringency.
- Charles
  
  October 1, 2020 at 9:28 am
  
  Armando,
  This depends on why you want to use an equivalence test in the first place. The theta value should be chosen based on something that you are trying to test. If not, why use equivalence testing at all?
  Charles
Fernando

September 4, 2020 at 10:51 pm

Hi, Charles. Thank you very much for all your work. I am trying to perform a TOST test to verify that two instruments measure correctly. I downloaded the Real Statistics Add In to Excel but I can’t find the TOST test. In the Real Statistics window I see these tabs: Desc, Reg, Anova, Time S, Multivar, Corr and Misc. In what tab should I find it? What item of the menu should I look for? Also, could you recommend a link with theory and practical examples regarding the TOST test, so that I can gain more insight and experience? Again, thank you very much for your generous help and support. Have a great day.
- Charles
  
  September 5, 2020 at 8:53 am
  
  Fernando,
  You only need to perform a t-test to get the TOST results. This is explained on this webpage.
  The t-test is found at the Misc tab.
  Charles
Sonja Zaric

August 31, 2020 at 12:15 pm

Hi Charles,
Thanks for your help. I’m trying to understand the equivalence test approach better. Why can’t I leave hyp mean diff at 0?
If so, can you tell me how to choose hyp mean diff?
- Charles
  
  August 31, 2020 at 9:10 pm
  
  Sonja,
  You can set hyp mean equal to zero if you use the approach where you compare the confidence interval with the equivalence interval.
  If you prefer to perform two one-tailed tests, then you will need to change the value of hyp mean. This value is set first to the left end of the equivalence interval and then to the right end of the equivalence interval.
  Charles
Sonja Zaric

August 27, 2020 at 11:48 am

Hi

You say above: Note that if we test the two one-sided null hypotheses directly, we would obtain p-values of .036 and .0013. Since both are less than alpha = .05, we again conclude that the two sources are equivalent (with p-value = .036).

How do you get .036 and .0013? Which test are you using in excel?

Thanks
- Charles
  
  August 27, 2020 at 3:38 pm
  
  Hello Sonja,
  If you insert the value 25 in cell H5 of Figure 1 you will see that cell H13 changes to .036.
  If you insert the value -25 in cell H5, cell H13 changes to .0013.
  Charles
  - Sonja Zaric
    
    August 27, 2020 at 5:41 pm
    
    That’s perfect.
    I’m trying to do the equivalence test using the same test (T Test: Two Independent Samples) and I get following p values:
    One Tail: 7.02509E-05
    Two Tail: 0.000140502
    where hyp mean diff=0 and sig says yes.
    What is the conclusion there?
    Thanks,
    Sonja
    - Charles
      
      August 28, 2020 at 7:29 am
      
      You need to specify the equivalence interval and then find the 90% confidence interval (i.e. using alpha = .1) from the t-test results.
      Charles
      - Sonja Zaric
        
        August 28, 2020 at 10:28 am
        
        What if I leave hyp mean diff at 0? Do p-values above indicate equivalence or there is not enough evidence to make that conclusion?
        Thanks
      - Charles
        
        August 28, 2020 at 11:04 am
        
        Sonja,
        No, this won’t work. You need to follow the approach described on the webpage.
        Charles