Two Sample t Test: unequal variances

Objective

When the assumption of equal population variances is not met for the Two-Sample t-Test with Equal Variances (or when you don’t have enough evidence to know whether it holds) you should consider using a modified version of the t-test. This version is based on the following property.

Key Property

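Property 1: Let x̄ and ȳ be the sample means and s_x and s_y be the sample standard deviations of two samples of size n_x and n_y respectively. If x and y are normally distributed, or n_x and n_y are sufficiently large for the Central Limit Theorem to hold, then the random variable

\[ t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} \]

has approximately a t distribution T(df), where μ_x and μ_y are the population means and the degrees of freedom are given by

\[ df = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{(s_x^2/n_x)^2}{n_x - 1} + \dfrac{(s_y^2/n_y)^2}{n_y - 1}} \]

The nearest integer to df is sometimes used.

An alternative version (Satterthwaite’s correction) of df (which has the same value) is calculated as follows

\[ df = \frac{(n_x - 1)(n_y - 1)}{(n_y - 1)c^2 + (n_x - 1)(1 - c)^2} \]

where

\[ c = \frac{s_x^2/n_x}{s_x^2/n_x + s_y^2/n_y} \]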

Welch’s t-Test

Property 1 can be used to test the difference between population means even when the population variances are unknown and unequal. The resulting test is called Welch’s t-test. The degrees of freedom for this test are never larger than (n_x – 1) + (n_y – 1), the degrees of freedom for the t-test where the variances are equal.

When n_x = n_y, the value of t in Property 1 is the same as in Property 1 of Two-Sample t-Test with Equal Variances. If, in addition, the sample variances are equal, then the df values are also the same, which means the p-values of the two tests are the same.
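The relationships described above can be checked numerically. The following is a minimal Python sketch, not part of the Real Statistics software: the function names and the data are hypothetical, and the formulas are the standard Welch statistic (under the null hypothesis of equal population means) and the two equivalent df expressions.

```python
import math
from statistics import mean, variance

def welch_t_and_df(x, y):
    """Welch t statistic (for H0: equal population means) and its df."""
    nx, ny = len(x), len(y)
    a = variance(x) / nx  # s_x^2 / n_x
    b = variance(y) / ny  # s_y^2 / n_y
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
    return t, df

def satterthwaite_df(x, y):
    """Alternative form of the same df, written in terms of c."""
    nx, ny = len(x), len(y)
    a = variance(x) / nx
    b = variance(y) / ny
    c = a / (a + b)
    return (nx - 1) * (ny - 1) / ((ny - 1) * c ** 2 + (nx - 1) * (1 - c) ** 2)

def pooled_t(x, y):
    """t statistic of the equal-variances (pooled) two-sample t-test."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical samples of equal size (not the data from the examples on this page)
x = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
y = [2, 25, 7, 30, 1, 22, 14, 28, 5, 36]

t, df = welch_t_and_df(x, y)
print("Welch t:", t, "df:", df)
print("Satterthwaite df:", satterthwaite_df(x, y))
print("Pooled t:", pooled_t(x, y), "pooled df:", len(x) + len(y) - 2)
```

Because the two samples have the same size, the Welch and pooled t statistics agree, while the Welch df is below the pooled df of n_x + n_y – 2 since the sample variances differ.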

Worksheet Functions

Real Statistics Function: The Real Statistics Resource Pack provides the following function.

DF_POOLED(R1, R2) = degrees of freedom for the two-sample t-test with unequal variances for samples in ranges R1 and R2 (i.e. df in Property 1).

Excel Function: Excel provides the function T.TEST to handle the various two-sample t-tests.

T.TEST(R1, R2, tails, type) = the p-value of the t-test for the difference between the population means based on samples R1 and R2, where tails = 1 (one-tailed) or 2 (two-tailed) and type takes one of the following values:

1: the samples have paired values from the same population
2: the samples are from populations with the same variance
3: the samples are from populations with different variances

These three types correspond to the Excel data analysis tools

t-Test: Paired Two Sample for Means
t-Test: Two-Sample Assuming Equal Variances
t-Test: Two-Sample Assuming Unequal Variances

Note that when type = 3 the T.TEST function uses the unrounded value of the degrees of freedom specified in Property 1, while the associated Excel data analysis tool rounds this value to the nearest integer. On this webpage, we explain how T.TEST is used when type = 2 or 3, while we describe the version where type = 1 in Paired Sample t Test.

The T.TEST function is not available in versions of Excel prior to Excel 2010. For these versions of Excel, the equivalent TTEST function is used instead.

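The T.TEST and TTEST functions ignore all empty and non-numeric cells. Both functions return a p-value, which is then compared with the chosen significance level; the examples on this webpage use α = .05.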
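For readers who want to reproduce the type = 2 and type = 3 calculations outside Excel, here is a rough Python sketch using only the standard library. It is an illustration under stated assumptions, not Excel's implementation: the function names, the `round_df` option, and the data are hypothetical, the one-tailed p-value is assumed to be the tail area beyond |t|, and the t distribution tail is approximated by Simpson's rule.

```python
import math
from statistics import mean, variance

def t_pdf(u, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def t_upper_tail(t, df, panels=2000):
    """P(T > |t|), using Simpson's rule on [0, |t|]; panels must be even."""
    t = abs(t)
    if t == 0:
        return 0.5
    h = t / panels
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_test_p(x, y, tails=2, kind=3, round_df=False):
    """p-value for kind=2 (equal variances) or kind=3 (unequal variances)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)
    if kind == 2:
        df = nx + ny - 2
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / df
        se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    elif kind == 3:
        a, b = vx / nx, vy / ny
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
        if round_df:  # mimic a tool that rounds df to the nearest integer
            df = math.floor(df + 0.5)
    else:
        raise ValueError("kind must be 2 or 3 in this sketch")
    return tails * t_upper_tail((mean(x) - mean(y)) / se, df)

# Hypothetical data (not the data from the examples on this page)
x = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
y = [2, 25, 7, 30, 1, 22, 14, 28, 5, 36]

print("type 2, two-tailed:", t_test_p(x, y, 2, 2))
print("type 3, two-tailed:", t_test_p(x, y, 2, 3))
print("type 3, rounded df:", t_test_p(x, y, 2, 3, round_df=True))
```

Comparing the last two printed values shows how rounding the degrees of freedom changes the p-value slightly, which is the kind of discrepancy to expect between T.TEST and the data analysis tool.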

Example

Example 1: In Example 1 of Two-Sample t-Test with Equal Variances, we assumed that the population variances were equal since the sample variances were quite similar. We now repeat the analysis assuming that the variances are not necessarily equal.

We use the Excel formula T.TEST(A4:A14,B4:B14,2,3). The first two parameters represent the data for each sample (without labels). The third parameter indicates that we desire a two-tailed test. Finally, the fourth parameter indicates that we are employing a t-test with two independent samples from populations whose variances are not assumed to be equal. Since

T.TEST(A4:A14,B4:B14,2,3) = 0.042642 < .05 = α

we reject the null hypothesis. Note that if we use type = 2, i.e. T.TEST(A4:A14,B4:B14, 2, 2) = 0.040219, the result won’t be very different, which is consistent with the fact that the sample variances are similar (and presumably so are the population variances).

Example 2: Repeat the analysis for Example 1 but with different data for the new flavoring as shown in Figure 1.

Figure 1 – Sample data and box plots for Example 2

Clearly, the sample variances are quite unequal. Using the T.TEST function with type = 3 we get

T.TEST(A4:A13, B4:B13, 2, 3) = 0.05773 > .05 = α

and so this time we cannot reject the null hypothesis (for the two-tailed test). Note that if we had used the test with equal variances, namely T.TEST(A4:A13, B4:B13, 2, 2) = 0.048747 < .05 = α, then we would have rejected the null hypothesis, a conclusion that is not justified since the equal variances assumption does not hold.

Data Analysis Tools

We can also use Excel’s t-Test: Two-Sample Assuming Unequal Variances data analysis tool for Example 2. From Figure 2, we see that the conclusion is the same.

Figure 2 – Data analysis for the data from Figure 1

Note that the p-value returned by T.TEST is slightly different from that reported by the data analysis tool. This is because the data analysis tool rounds the df to the nearest integer while T.TEST does not.

We can also use a Real Statistics data analysis tool to conduct this test or other versions of the t-test. Click here for details and examples.

Equal Variances Assumption

Generally, even if one variance is up to 3 or 4 times the other, the equal variance assumption will give good results, especially if the sample sizes are equal or almost equal. This rule of thumb is clearly violated in Example 2, and so we need to use the t-test with unequal population variances.

If the variances are equal then the equal and unequal variances versions of the t-test will yield similar results (even when the sample sizes are unequal), although the equal variances version will have slightly better statistical power.

Effect Size

The calculation of the effect size and the effect size confidence interval is the same as for the case where the two samples have equal variances. If the variances are very different, then it might be better to use the variance of one of the samples (e.g. the one representing the Control group) instead of the pooled variance. This version of Cohen’s d effect size is called Glass’ delta.
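The difference between the two standardizers can be illustrated with a short Python sketch. The function names and the data are hypothetical; the sketch assumes the usual definitions, namely Cohen’s d divides the difference in means by the pooled standard deviation, while Glass’ delta divides it by the standard deviation of the control sample.

```python
import math
from statistics import mean, stdev, variance

def cohens_d_pooled(x, y):
    """Cohen's d using the pooled standard deviation of both samples."""
    nx, ny = len(x), len(y)
    sp = math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y))
                   / (nx + ny - 2))
    return (mean(x) - mean(y)) / sp

def glass_delta(treatment, control):
    """Glass' delta: standardize by the control group's standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)

# Hypothetical data: the treatment sample is much more variable than the control
treatment = [2, 25, 7, 30, 1, 22, 14, 28, 5, 36]
control = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]

print("Cohen's d (pooled):", cohens_d_pooled(treatment, control))
print("Glass' delta:", glass_delta(treatment, control))
```

When the variances are very different, the two values can differ noticeably because the pooled standard deviation is dominated by the more variable sample.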

Cohen’s d* and Hedges’ g*

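Another approach is to use Cohen’s d* which is defined by

\[ d^* = \frac{\bar{x} - \bar{y}}{s^*} \]

where

\[ s^* = \sqrt{\frac{s_x^2 + s_y^2}{2}} \]

We can now define the less biased Hedges’ version of this effect size, namely

\[ g^* = d^* \cdot \frac{\Gamma(m)}{\sqrt{m}\,\Gamma\!\left(m - \tfrac{1}{2}\right)} \]

where m = df*/2 and

\[ df^* = \frac{(n_x - 1)(n_y - 1)(s_x^2 + s_y^2)^2}{(n_y - 1)s_x^4 + (n_x - 1)s_y^4} \]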

Example

We can calculate d* and g* for Example 2 using the data in Figure 2 as shown in Figure 3.

Figure 3 – Cohen’s d* and Hedges’ g*
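The same kind of calculation can be sketched in Python. This is an illustration only: it assumes the definitions given by Delacre et al. (2021), where d* standardizes by the square root of the average of the two sample variances and g* multiplies d* by a gamma-function correction with m = df*/2. The function names and the data are hypothetical and are not the values shown in Figure 3.

```python
import math
from statistics import mean, variance

def cohens_d_star(x, y):
    """d*: mean difference over the root of the average sample variance."""
    return (mean(x) - mean(y)) / math.sqrt((variance(x) + variance(y)) / 2)

def df_star(x, y):
    """Degrees of freedom used in the correction factor (assumed formula)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)
    return ((nx - 1) * (ny - 1) * (vx + vy) ** 2
            / ((ny - 1) * vx ** 2 + (nx - 1) * vy ** 2))

def hedges_g_star(x, y):
    """g* = d* * Gamma(m) / (sqrt(m) * Gamma(m - 1/2)), with m = df*/2."""
    m = df_star(x, y) / 2
    # lgamma avoids overflow for large m; requires m > 1/2, i.e. df* > 1
    correction = math.exp(math.lgamma(m) - math.lgamma(m - 0.5)) / math.sqrt(m)
    return cohens_d_star(x, y) * correction

# Hypothetical data
x = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
y = [2, 25, 7, 30, 1, 22, 14, 28, 5, 36]

print("d*:", cohens_d_star(x, y))
print("df*:", df_star(x, y))
print("g*:", hedges_g_star(x, y))
```

The correction factor is always below 1, so g* is slightly smaller in magnitude than d*, with the gap shrinking as df* grows.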

Interpretation

The default interpretation of Cohen’s d* effect size is

.20: small effect
.50: medium effect
.80: large effect

Confidence Intervals

Click here for a description of how to estimate confidence intervals for Cohen’s d* and Hedges’ g*.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

Microsoft Support (2022) T.TEST function
https://support.microsoft.com/en-us/office/t-test-function-d4e08ec3-c545-485f-962e-276f7cbed055

Delacre, M., Lakens, D., Ley, C., Liu, L., Leys, C. (2021) Why Hedges’ g*s based on the non-pooled standard deviation should be reported with Welch’s t-test
https://psyarxiv.com/tu6mp/download

295 thoughts on “Two Sample t Test: unequal variances”

Jennifer Berry

January 13, 2022 at 7:23 pm

In Example 2, how does the df value get calculated as 11?
- Charles
  
  January 14, 2022 at 9:44 am
  
  Hi Jennifer,
  The df is calculated as described on this webpage, but the Excel data analysis tool rounds off the df to the nearest integer. The Real Statistics version of this tool does not round off the df. See T-test analysis tool.
  Charles
Tim

June 3, 2021 at 6:38 am

Charles, can you please order the below situations based on the expected power of the statistical test intended to identify the difference between two groups?

1) Datasets come from normal distributions with equal unknown variances
2) Datasets come from normal distributions with unequal known variances
3) Datasets come from two unknown distributions
4) Datasets come from normal distributions with unequal known variances and known unequal means
5) Datasets come from gamma distributions with unknown parameters

After everything I’ve read, I think it should be 1, 4, 2, 5, 3. Thoughts?
- Charles
  
  June 3, 2021 at 7:41 am
  
  Tim,
  I don’t quite understand how you compare these 5 dataset pairs. It is not clear to me what effect you are studying in each case: differences in the means? This wouldn’t even make sense for item #4 or for item #3 when the data come from a Cauchy distribution.
  Charles
Mae

May 14, 2021 at 4:33 am

Hi sir
We are conducting a study and we are a little lost since we somehow already forgot about ttest
Our study uses ttest one tailed for unequal variance and we are comparing the binding affinity of 8 different phytochemicals to a positive control.
The values of the phytochems are: -8, -7.4, -5.4, -6.3, -6, -3.4, -5.9, -5.8
The value of the positive control is:
-6
I would like to ask whether how will we reject or accept the hypothesis and if its only normal that we got “NUM!” for: P(T<=t) one-tail, t Critical one-tail, P(T<=t) two-tail, and t Critical two-tail?
- Charles
  
  May 14, 2021 at 8:34 am
  
  Hi Mae,
  I see the values for sample 1, but don’t see the second sample. Are you comparing this sample with a hypothetical mean value of -6? If so, you should be using a one-sample t-test and not a two-sample t-test.
  Charles
  - Mae
    
    May 18, 2021 at 9:32 am
    
    Hi uhmm actually the value of the second sample is the -6 we need to compare those values: -8, -7.4 and so on with that value and we are lost since we pretty much already forgot how to use ttest is there any way you could possibly help us :< btw we are only students so we arent really that good at research itself but if you can help us in any way pls do so thank youu :<
    - Charles
      
      May 19, 2021 at 8:27 am
      
      Hi Mae,
      Are you saying that the second sample only contains one value, namely -6?
      If so, it is not possible to perform a two-sample test since the variance for the second sample is undefined.
      This is why I am suggesting that you perform a one-sample t-test where the hypothetical mean is -6.
      Charles
Kumar AMJ

April 29, 2021 at 3:46 pm

Sir! I have the same sample for three tests RESULTS. I have used ANOVA to find out the significant variance between the three test results. then I used a t-test of unequal variance between A-B, A-C, B-C. but I don’t have a null hypothesis. is it okay if I use this kind of analysis and interpret my data, should I use Bonferroni correction or can I retain p=0.05 and interpret my data?
- Charles
  
  April 29, 2021 at 9:30 pm
  
  Kumar,
  When you say that “I don’t have a null hypothesis”, do you mean that none of the three tests generated a significant result?
  If you perform three post-hoc t-tests, you need to use a Bonferroni correction. It is better to use one of the post-hoc tests specifically designed to be used after a significant ANOVA. Tukey’s HSD is usually a good choice. It does not require the use of a Bonferroni correction.
  Charles
Arundel

March 29, 2021 at 3:51 am

Hi, I am doing a research study on ply boards. I have three set-ups: Formulation 1, Formulation 2, and Standard. Formulations 1 and 2 are my experimental samples, while the Standard is my control. The test experiments I have to undergo are the water absorption test and strength test. My research questions are as follows:

1. Is there a significant difference in the water absorption test of plyboard made from Jackfruit peels (experimental) and commercial plyboard (control)?

2. Is there a significant difference in the strength test of plyboard made from Jackfruit peels (experimental) and commercial plyboard (control)?

I have used ONE WAY ANOVA to determine if there is a significant difference, and the stats show that it does. Now, I need to do a post hoc test as per my adviser but I do not know how to do it. I hope you can reach out and help me with this matter. Thank you in advance!
- Charles
  
  March 29, 2021 at 10:28 pm
  
  Hi Arundel,
  There are a number of post-hoc tests after one-way ANOVA. The most common approach is to use Tukey’s HSD. This topic is covered at
  Unplanned ANOVA Post-hoc Tests
  Charles
Tahlia

March 18, 2021 at 11:44 am

Hi Charles,
Can I use the two-sample t-test assuming unequal variance if my data has a couple of outliers? I have 32 pieces of data and unequal variances. I’m testing time and scores across two conditions so half of my data is discrete and half is continuous so if Welch’s test isn’t appropriate, is there any other statistical tests you could recommend?
Thanks,
-A confused student
- Charles
  
  March 19, 2021 at 12:04 pm
  
  Tahlia,
  If the normality assumption is met, then you can use the t-test. Of course, since you mentioned that you have some outliers, normality will be a problem (provided the outliers are really outliers). In these situations, you can usually use the Mann-Whitney non-parametric test.
  One further caveat needs to be mentioned. You say that you have a combination of discrete and continuous data. It is not clear why you want to perform a t-test (or similar test) on such data in the first place, but even so, this may also cause problems.
  Charles
Dave

February 26, 2021 at 10:02 pm

Hi,

I have two independent samples with different means and different variances. I want to run the t-test on the two means, but specifically I want to compute the probability of a Type II error at different alpha levels. Is it possible to input the means, sample sizes, variances or SDs along with various significance levels somewhere in your software and have it compute for me the Type II probabilities?
Thank you in advance.
Lee

February 24, 2021 at 3:31 am

Which value that we should use between P1 tail and P2tail ?
- Charles
  
  February 24, 2021 at 8:30 am
  
  Sorry, but I don’t understand your question.
  Charles
SUZETTE N TURNER

February 22, 2021 at 8:05 pm

Hi Charles,

So I performed a T-Test assuming equal variance, as I wasn’t totally sure if the variances were equal, and got significant results. But I noticed the variance in one group was twice as high as the other, so I decided to try the unequal variance t-test, and got the same significant results. But I noticed something strange: in the equal variance test, my observations for each group were 320 and 313, respectively, with a df = 631. Then, in the unequal variance test, my observations changed to 196 and 314, respectively, and my df = 471. What would cause this difference? Does Excel remove observations according to some sort of rule in the unequal variance test?

Thank you in advance,
Suzette
- Charles
  
  February 23, 2021 at 9:32 am
  
  Suzette,
  I don’t know why the number of observations would change. This should not happen. It is not surprising that the df changes since this is the main difference between the equal variances and unequal variances versions of the t-test.
  With such large samples, the equal variances t-test is pretty robust even when the variance of one sample is two or three times the variance of the other sample.
  Charles
Ramya

February 17, 2021 at 2:45 am

Hi,
When I tested with unequal variances i got significant results but my t-stat value is 3.5 only (with considerable difference in two samples). Why I am getting results like that.

Thanks in advance,
- Charles
  
  February 17, 2021 at 9:40 am
  
  Ramya,
  Are you saying that your test results in a high t-statistic and a low p-value (for a significant result)? This just means that you have evidence that the means of the corresponding populations are likely to be different.
  Charles
