Correlation in Relationship to t-test

Objective

On this webpage, we describe the following:

The relationship between the t-test and the correlation test
Point biserial correlation
The r-effect size (aka the eta or phi effect size)

Example

Example 2 of Two Sample t Test with Equal Variances tests whether memory recall is the same for the control group and the sleep-deprived group. The t-test is repeated in the middle part of Figure 1 based on the data for the two independent samples shown in ranges A4:A15 and A16:A27 of Figure 1.

This two-sample t-test can be turned into a correlation problem by combining the two samples into one sample (random variable x) and setting the dichotomous random variable y to 0 for elements in one sample and to 1 for elements in the other sample. It turns out that the two-sample analysis using the t-test is equivalent to testing whether the correlation coefficient between x and y is zero using the t-test.

Example 1: Calculate the correlation coefficient r for x and y as described above, and then test the null hypothesis H₀: ρ = 0.

The correlation coefficient is calculated to be r = 0.0214 (cell L5) and the correlation test is shown on the right side of Figure 1 using the approach described in Correlation Testing using a t-Test.

Figure 1 – Using correlation testing to solve Example 1

The p-values (cells H14 and L10) and t-statistics (cells F14 and L7) of the t-test and correlation test are exactly the same. Using either test, we conclude that there is no significant difference in memory recall between the control group and the sleep-deprived group.
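The equivalence can be checked numerically. The following is a minimal sketch, not the spreadsheet calculation from Figure 1: the scores are hypothetical, the helper names `pooled_t` and `pearson_r` are made up for illustration, and the conversion t = r·√df / √(1 − r²) is the standard t statistic for testing a correlation coefficient.

```python
import math

# Hypothetical memory recall scores (NOT the data from Figure 1).
control = [70, 65, 80, 75, 68, 72]
deprived = [66, 71, 64, 73, 69, 60]


def mean(values):
    return sum(values) / len(values)


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb)), df


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Combine the samples into x; code the control group as 1 and the other as 0
# so that the sign of r matches the sign of (mean(control) - mean(deprived)).
x = control + deprived
y = [1] * len(control) + [0] * len(deprived)

t_means, df = pooled_t(control, deprived)
r = pearson_r(x, y)
t_corr = r * math.sqrt(df) / math.sqrt(1 - r * r)

print(t_means, t_corr)  # the two t statistics coincide up to rounding error
```

Since both tests use the same t statistic with the same degrees of freedom, their p-values are necessarily identical as well.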

Point-biserial Correlation

Definition 1: A variable is dichotomous if it only takes two values (usually set to 0 and 1).

The point-biserial correlation coefficient is simply Pearson’s product-moment correlation coefficient where one or both of the variables are dichotomous.

Property 1: If {y₁, …, y_n} is a sample for the dichotomous random variable y and {x₁, …, x_n} is a sample for the random variable x, the point-biserial correlation coefficient between these samples is given by the formula

$$r = \frac{m_1 - m_0}{s_x} \sqrt{\frac{n_0 n_1}{n(n-1)}}$$

where m₀ is the mean of the n₀ data elements x_i whose corresponding y value is y_i = 0, m₁ is the mean of the n₁ data elements x_i whose corresponding y value is y_i = 1, n = n₀ + n₁, and s_x is the (sample) standard deviation of {x₁, …, x_n}.

If {x₁, …, x_n} and {y₁, …, y_n} are populations, then the point-biserial correlation coefficient is

$$\rho = \frac{m_1 - m_0}{\sigma_x} \sqrt{\frac{n_0 n_1}{n^2}}$$

where σ_x is the (population) standard deviation of {x₁, …, x_n}.

Example 2: Calculate the correlation coefficient for Example 1 using Property 1.

Figure 2 – Calculation of point biserial correlation coefficient
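As a rough illustration of Property 1, the sketch below computes the sample point-biserial coefficient from the group means, r = (m₁ − m₀)/s_x · √(n₀n₁/(n(n − 1))), and compares it with Pearson's r computed directly. The data are hypothetical (not the values in Figure 2), and the function names are chosen only for this example.

```python
import math


def point_biserial(x, y):
    """Sample point-biserial r from group means, group sizes and sample SD of x."""
    x0 = [a for a, b in zip(x, y) if b == 0]
    x1 = [a for a, b in zip(x, y) if b == 1]
    n0, n1, n = len(x0), len(x1), len(x)
    m0, m1 = sum(x0) / n0, sum(x1) / n1
    mx = sum(x) / n
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))  # sample SD
    return (m1 - m0) / s_x * math.sqrt(n0 * n1 / (n * (n - 1)))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: x is the combined sample, y marks group membership.
x = [70, 65, 80, 75, 68, 72, 66, 71, 64, 73, 69, 60]
y = [0] * 6 + [1] * 6

print(point_biserial(x, y), pearson_r(x, y))  # the two values agree
```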

r-effect size

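Property 2:

$$r = \sqrt{\frac{t^2}{t^2 + df}}$$

where t is the test statistic for two-means hypothesis testing of variables x₁ and x₂ with t ~ T(df), x is a combination of x₁ and x₂, y is the dichotomous variable as in Example 1, and r is the correlation coefficient between x and y.

Observation: The value for t from Example 1 is .1004. Since each sample has 12 elements, df = 12 + 12 − 2 = 22. By Property 2,

$$r^2 = \frac{t^2}{t^2 + df} = \frac{(.1004)^2}{(.1004)^2 + 22} = .000458$$

and so r = .0214, which agrees with the value we get using CORREL (as we can see in cell L5 in Figure 1).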
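The arithmetic in this observation can be reproduced with a few lines of Python. This is only a sketch of the conversion r = √(t² / (t² + df)); the value df = 22 is an assumption inferred from the two 12-element ranges mentioned for Figure 1, and `r_from_t` is a name invented for this example.

```python
import math


def r_from_t(t, df):
    """Convert a two-sample t statistic into the r effect size."""
    return math.sqrt(t * t / (t * t + df))


# t from Example 1; df = 12 + 12 - 2 is inferred from the sample sizes.
r = r_from_t(0.1004, 22)
print(round(r, 4))  # 0.0214
```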

Observation: The effect size for the comparison of two means (see Two Sample t Test with Equal Variances) is given by

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

The sample version of this measure of effect size is

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

where s is the pooled standard deviation, so that $d = t\sqrt{1/n_1 + 1/n_2}$. Using the formula from Property 1 of Correlation Testing via the t Test, namely $t = r\sqrt{df}/\sqrt{1 - r^2}$, we can convert this into an expression based on r:

$$d = r \sqrt{\frac{df\,(1/n_1 + 1/n_2)}{1 - r^2}}$$

E.g., for the data in Example 1:

$$d = .0214 \sqrt{\frac{22\,(1/12 + 1/12)}{1 - (.0214)^2}} = .041$$

This means that the difference in average memory recall score between the control group and the sleep-deprived group is only about 4.1% of the standard deviation. Note that this is the same effect size that was calculated in Example 2 of Two Sample t Test with Equal Variances.
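The 4.1% figure can be checked with a short calculation. The sketch below assumes the conversion d = r·√(df·(1/n₁ + 1/n₂)) / √(1 − r²), which follows from combining d = t·√(1/n₁ + 1/n₂) with t = r·√df / √(1 − r²); the group sizes n₁ = n₂ = 12 are inferred from the ranges cited for Figure 1, and `d_from_r` is a hypothetical helper name.

```python
import math


def d_from_r(r, n1, n2):
    """Convert the r effect size into the standardized mean difference d."""
    df = n1 + n2 - 2
    return r * math.sqrt(df * (1 / n1 + 1 / n2)) / math.sqrt(1 - r * r)


# r from Example 1; group sizes assumed to be 12 each.
d = d_from_r(0.0214, 12, 12)
print(round(d, 3))  # 0.041
```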

Alternatively, we can use φ (phi) as a measure of effect size. Phi is nothing more than r. For this example, φ = r = 0.0214. Since r² = 0.00046, we know that only 0.046% of the variation in the memory recall scores is explained by the amount of sleep.

A rough estimate of effect size is that r = .5 represents a large effect size (explains 25% of the variance), r = .3 represents a medium effect size (explains 9% of the variance), and r = .1 represents a small effect size (explains 1% of the variance).
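The percentages quoted here are simply r² expressed as a percentage. As a small illustrative sketch (the function name is made up for this example), the same calculation can be applied to the benchmark values and to the r from Example 1:

```python
def variance_explained(r):
    """Share of variance explained by the correlation, i.e. r squared."""
    return r * r


# Benchmark values from the text, plus r from Example 1.
for r in (0.5, 0.3, 0.1, 0.0214):
    print(f"r = {r}: {variance_explained(r):.3%} of the variance")
```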

77 thoughts on “Correlation in Relationship to t-test”

Linda Thijssen

February 22, 2024 at 3:29 pm

Hi Charles
Could you give your opinion on this?
Let’s say I compared the distance run in 1 minute by men to distance run by women, and the results led me to reject the null-hypothesis. Would it be wrong to say that the gender correlates with the distance run? I used this phrasing in an essay, but my teacher disagrees, she says a t-test compares means and is not a correlation test.
Reply
- Charles
  
  February 24, 2024 at 9:54 am
  
  Hello Linda,
  Your teacher is correct that the usual two-sample t-test compares means, but as this webpage explains, this test is equivalent to one that tests whether a specific correlation is zero. For this latter test, it seems to me that “gender correlates with the distance run” is appropriate.
  Charles
  Reply
Mitchel

March 3, 2023 at 10:37 am

Hi, Charles!

The results of my paired sample t-test show that there is no significant difference between the two variables. Is it right not to proceed to the computation of Pearson’s r given that the variables are not statistically significant?
Reply
- Charles
  
  March 3, 2023 at 11:38 am
  
  Mitchel,
  You can compute Pearson’s r, but the test for significance will yield the same result as the t-test.
  Charles
  Reply
Robert

February 25, 2023 at 9:52 pm

Hello, Charles,
I am conducting an exploratory study for a dissertation investigating the extent to which two different types of Master’s programs include particular leadership skills in their curricula. Inclusion is measured by a 4 – point survey of faculty members. I used a t-test and found some significant differences between program type and skill inclusion. I then used a point biserial correlation analysis to verify, confirm or perhaps show different significance. I did find some differences. Is it a sound method to use both t tests and correlations?
Reply
- Charles
  
  February 26, 2023 at 1:24 pm
  
  Hello Robert,
  As the webpage shows, the t-test and correlation test are equivalent, and so you should get the same result. If you are getting different results, then you are performing different tests from those described on this webpage.
  Charles
  Reply
Braden

November 29, 2022 at 5:16 pm

Hello Charles,

I am trying to determine the correlation between two data sets of 20 samples each. The data sets use two different simulation methods (bulk method and detailed method) to generate radioactivity in fuel pins (the same pins are compared in each method). The results are close (% difference ranging from 0.5 – 1.5%) however, the bulk value is always slightly lower than the detailed. I tried running a paired t-test, but I was unable to select the null hypothesis because of the way the data appears. I have been looking through the tests on this site and I am unable to find a suitable one for my case. If you could point me in the right direction I would greatly appreciate it.

Braden
Reply
- Charles
  
  November 29, 2022 at 6:40 pm
  
  Hi Braden,
  What hypothesis are you trying to test?
  Charles
  Reply
Rahman

June 26, 2022 at 1:48 pm

Thanks, Charles for this,

In my research, I tested whether there is a significant difference between two dependent variables by employing a paired sample t-test. The analysis shows a strong significant difference between these two variables. From this result, I intend to enhance my attribution of these findings by employing Pearson correlation. My question is, is that right, using correlation to enhance the findings of and interpretation of t-test?

Best
Reply
- Charles
  
  June 27, 2022 at 9:41 am
  
  Hello Rahman,
  What do you expect to gain by using correlation after the t-test?
  Charles
  Reply
elisa

April 19, 2022 at 3:53 pm

Hi Charles,
If my hypothesis is about women being less likely to take risks than men, should i use the point biserial correlation test or independent samples t test? given that gender is a binary variable and the likelihood of taking risk is a continuous variable.
Cheers.
Reply
- Charles
  
  April 20, 2022 at 7:58 am
  
  Elisa,
  As this webpage demonstrates, the correlation test is equivalent to the t-test. Since the point biserial correlation is equivalent to the correlation coefficient when one of the variables takes only two values (such as gender), the two tests that you have referenced are equivalent.
  Charles
  Reply
Juliet

December 7, 2020 at 2:57 pm

Hi Charles

I am hoping you might be able to help me please.

I am trying to get my head round using SPSS to analyse the data I have.
I have worked out the correlation coefficient for the set of data, but now I want to see if there is a difference in the correlation score when I separate the data by gender.
I thought I could use the Split Data option within the SPSS V26 and then get 2 correlation scores (male & female) and see if they are different.

I have been told that this is incorrect, please would you be able to help me understand which test I should run in order to compare the correlation (by gender)?

I look forward to hearing from you and thank you very much for your help.

Best wishes

Juliet
Reply
- Charles
  
  December 7, 2020 at 11:04 pm
  
  Juliet,
  I am sorry, but I don’t use SPSS and so I am unable to answer your question.
  Charles
  Reply
Me

November 14, 2020 at 9:21 am

Hi Charles, if I am studying the grades for two different semesters what type of t-test should I use, and can I proceed to correlation after that?
Reply
- Charles
  
  November 14, 2020 at 10:22 am
  
  Are they grades for the same students or different students?
  Charles
  Reply
Christine

November 12, 2020 at 4:51 pm

Hi, if my question is “the association between the studying time and a test result”, and I’m having two set of data (time and result) from a same sample group. how the t test helps me to interpret the result? Thanks!
Reply
- Charles
  
  November 13, 2020 at 3:08 pm
  
  Hi Christine,
  I presume that you are referring to a paired t test with null hypothesis that the mean of the difference between the studying time and test result is zero (i.e. the more you study the better the test results and the less you study the worse the test results). This is equivalent to a test of the correlation between studying time and test result.
  Charles
  Reply
Santie

November 10, 2020 at 7:12 am

Hi Charles.

Kindly help here. My research is on the stock exchange. My null hypothesis is that using a dividend discount model does not overvalue equity stock; the alternative hypothesis is that it does overvalue stock.

What statistical test would be best in testing the hypotheses?

I have the stock prices for all top 100 companies for 7 years each. I have also calculated their intrinsic value using the Discounted dividend model. I have means for both stock prices and the calculated prices.
Reply
- Charles
  
  November 11, 2020 at 11:23 am
  
  Hello Santie,
  If for each of the 100 companies, you had one value of stock price based on the dividend discount model and one stock value, then you could use the paired t-test.
  If you have daily prices for 7 years, then the test to use would depend on how you specify the null hypothesis.
  Charles
  Reply
BharatRaj Khanal

July 26, 2020 at 4:08 pm

How to solve? Kindly please help me. Thank you

n=35
r=0.8838
Perform seven steps significant test at α 0.05 level .
Reply
- Charles
  
  July 27, 2020 at 5:55 pm
  
  See https://real-statistics.com/correlation/one-sample-hypothesis-testing-correlation/
  Charles
  Reply
BharatRaj Khanal

July 26, 2020 at 4:02 pm

Given :
n=35
r=0.8838
(Q)Perform seven step significant test at α 0.005 level .
Reply
- Charles
  
  July 27, 2020 at 5:55 pm
  
  See https://real-statistics.com/correlation/one-sample-hypothesis-testing-correlation/
  Charles
  Reply
Jeedu

June 16, 2020 at 8:36 am

Hello sir,

Could you please help me in identifying the appropriate inferential tests for the following research questions:
1. Is there a statistically significant difference between the means of student enrollment rates in colleges before and after the Great Recession?
2. Is there a significant difference between the proportions of the race/ethnic groups for students enrolling before and after the Great Recession?

I assume that I should use paired sample t test for the first question (college enrollment rates before and after the crisis, it’s like a pre- and post test).
For the second question, I am totally confused between Chi square test and Pearson correlation coefficient.

Thank you.
Reply
- Charles
  
  June 17, 2020 at 11:11 am
  
  Hello Jeedu,
  For the first question, the answer depends on the type of data that you have. If you have the enrollment rate figures for a number of colleges before and after the Great Recession then a paired t test may be appropriate.
  The answer is the same for the second question.
  Charles
  Reply
Michael Dizon

June 14, 2020 at 6:57 am

Good day sir, what kind of test should i use in my study? should it be a t-test or correlation?
my study is about:
Fiber Strength Evaluation of Two Plants According to their Location in the plant.

i have 8 parameters:
plant height
girth of the plant’s body (base, center, top)
number of leaves
fiber fresh weight
dry weight of fiber
fiber strength from the plant’s body (base, center, top)
Reply
- Charles
  
  June 14, 2020 at 12:50 pm
  
  Since you are probably comparing multiple parameters, neither test is likely to be appropriate.
  In order to answer your question, first I need to know what hypothesis or hypotheses you want to test. The term “fiber strength according to their Location in the plant” is too vague.
  Charles
  Reply
Shipu

May 9, 2020 at 3:48 pm

Hello Good Evening Sir,

This is the scenario of the question –

A psychologist developed a new
Anxiety Inventory of 25 items. The
Cronbach alpha for the inventory was
found to be 0.80. The inventory was
administered to a group of 50 anxiety
disorder patients before beginning their
psychotherapy. The patients had a
significantly higher mean than the
normative group of normals. The same
group of patients was retested after 12
weeks of psychotherapy. The post-test
mean was significantly lesser than the
pre-test mean. The new Anxiety Inventory
was correlated with an established anxiety
inventory and the correlation turned out to
be 0.20. The norms of the inventory were
developed in terms of normalized T scores.

My question is –
What would be the median T score
for the new Anxiety Inventory?
Reply
- Charles
  
  May 9, 2020 at 6:10 pm
  
  Hello Shipu,
  How do you define the normalized t score? Did you try to calculate this using your data?
  Charles
  Reply
  - Shipu
    
    May 10, 2020 at 11:49 am
    
    No sir, I didn’t understand that how I can calculate median T score through this given data. Can you explain me how I can solve this problem?
    Reply
    - Charles
      
      May 11, 2020 at 8:51 pm
      
      I don’t know what you mean by a “t-score”. Do you mean the t statistic?
      Charles
      Reply
Gabor

April 18, 2020 at 10:14 am

Hello,

I have a question regarding the topic. I have the variable A (scale) with which I create 3 groups (using the mean and Std., Value mean+Std. the highest one, and values inbetween the middle one).

I would like to see if the lowest group has a different score on variable B (which only is a 10 degree Likert-scale) than the highest group or not. Can I do this, or should I run a Spearman correlation analysis between the two variables? Or should I run both tests to emphasize the results?

Thanks in advance.
Reply
- Charles
  
  April 18, 2020 at 11:14 am
  
  Sorry Gabor, but I don’t understand the scenario that you are describing. You should be able to use a t test provided the data is normally distributed. Otherwise, you could use a Mann-Whitney test.
  Charles
  Reply
Mujittafa Sani

April 5, 2020 at 10:24 pm

Good evening Sir, I have a question that I want to ask. I am looking for a relationship between one independent variable with three levels and one dependent variable, which statistical tool am I going to use for this.
The hypothesis that I want to test is;
There is no significant relationship between the availability of laboratory equipment and academic performance.

The availability has three levels which are adequate, inadequate and not available.
I would be glad to hear from you sir 🙏
Reply
- Charles
  
  April 7, 2020 at 8:51 am
  
  It seems like a fit for one-way ANOVA, but it also depends on the nature of your data.
  Charles
  Reply
  - Mujittafa Sani
    
    April 7, 2020 at 3:21 pm
    
    Ok, thank you.
    In my case sir I have a sample of 27 schools, where I want to find out the relationship between availability of laboratory equipment and academic performance. Two instruments were used for data collection, one of the instrument was a questionnaire for the level of availability, and the other was a performance test that has a total of 25marks.
    The study found out that 4 schools have adequate equipments, 21 schools have inadequate equipments and there were no equipments at all in 2 schools. I calculated the mean test score of each school, for the availability I assigned 3 to school with adequate equipment, 2 to school with inadequate equipments and 1 to school without equipments.
    My x variable was the level of availability in each school, and my y variable was the mean score of the school. Example below
    X. Y
    3. 13
    2. 16
    3. 17
    1. 10
    2. 9
    
    After which ppmc was used to correlate the data. Sir is my analysis correct or is there any approach I can use apart from this one. Thank you
    Reply
    - Charles
      
      April 8, 2020 at 6:47 pm
      
      You can certainly calculate a correlation coefficient for X vs Y. How do you plan to use the result?
      Charles
      Reply
      - Mujittafa Sani
        
        April 9, 2020 at 2:40 pm
        
        The result is going to be used to conclude whether a significant relationship exists between the level of availability and academic performance of students. It is one of my dissertation research hypothesis that I want to test.
      - Charles
        
        April 9, 2020 at 6:54 pm
        
        In that case, you will want to test the correlation coefficient. See
        One Sample Testing of Correlation
faith

March 30, 2019 at 10:19 pm

pls help me out here.
Construct a test of four items. Assess the reliability of the test by administering it to four persons at different points or intervals. Compute the coefficient of correlation of the test. Please help; I don’t know how to go about it.
Reply
- Charles
  
  March 31, 2019 at 12:34 pm
  
  Hello Faith,
  I don’t understand the approach you are defining, but the following webpage describes many approaches to measuring reliability
  Reliability
  Charles
  Reply
Mel Mac

March 17, 2019 at 12:24 pm

Basing from the Example 2 where there were two groups (one taking the drug and the other one is not) converted to 0 and 1 in this Example 1, the groups were initially the IV for the t test statistics. But in this example, it is the y (or the DV). Will it matter if I retain the 0 and 1 as the x for correlation test, or it is required to use the dichotomous variable as the y? Thank you!
Reply
- Charles
  
  March 18, 2019 at 7:15 am
  
  Hi Mel,
  Since CORREL(A,B) is the same as CORREL(B,A) if I understand your question correctly, it won’t matter.
  Charles
  Reply
Winston

July 2, 2018 at 12:17 pm

An observation to check the math:

φ = r = 0.0214. Since r2 = 0.00046, we know that 0.46%. Should it be 0.046%?
I may not be understanding the derivation. Thanks for such a transparent teaching website.
Reply
- Charles
  
  July 2, 2018 at 12:25 pm
  
  Winston,
  Yes, the value should be 0.046%. Thanks for catching this typo. I have corrected the webpage to reflect the correct value.
  I appreciate your help in improving the accuracy of the Real Statistics website.
  Charles
  Reply
Jorge

March 5, 2018 at 9:48 pm

Dear Charles:

Is it OK if I use this test on Spearman correlation coefficients?

Thanks in advance,

Jorge
Reply
- Charles
  
  March 14, 2018 at 4:03 pm
  
  Jorge,
  I don’t understand your question. Which test do you want to use on Spearman’s correlation?
  Charles
  Reply
wajid

February 25, 2018 at 2:01 pm

Can we make correlation in one sample t test
Reply
- Charles
  
  February 26, 2018 at 1:56 pm
  
  Wajid,
  I don’t completely understand your question. Perhaps the following webpage addresses your issue:
  One Sample Hypothesis Test for Correlation
  Charles
  Reply
William A. Ahroon

January 30, 2018 at 11:55 pm

Can anyone tell me the formula for a t-test for the difference between two mean using the Pearson Product Moment Correlation Coefficient instead of calculating difference scores?
Reply
- Charles
  
  January 31, 2018 at 5:42 pm
  
  William,
  This is exactly what is shown in Example 1.
  Charles
  Reply
Laura

August 17, 2017 at 3:14 pm

Dear Dr. Zaiontz,
I have two observers who (independently) measured the area of cartilage of a group of patients. I want to compare these measures (which are normally distributed), saying that the two can overlap. Can I run an unpaired t-test? Or a correlation coefficient using a t-test?

Thank you for creating this website, it’s very easy to understand and also to use,
Sincerely
Laura
Reply
- Charles
  
  August 17, 2017 at 4:42 pm
  
  Laura,
  Sorry, but I don’t understand what you mean by “the two can overlap”.
  What hypothesis do you want to test?
  Charles
  Reply
  - Laura
    
    August 18, 2017 at 7:24 am
    
    What I would like to say is that there is no statistical difference between measures made by observer 1 and those made by observer 2, so that the two groups of measures are very close to each other.
    
    Thank you so much,
    Laura
    Reply
shanta

April 5, 2017 at 7:27 pm

Hi,
Please advice how can I interpret the following:
when matched pair t-test shows insignificant difference but Pearson’s correlation gives significant difference. Please advice how to compare this result
Reply
- Charles
  
  April 6, 2017 at 6:36 pm
  
  Shanta,
  The matched paired t test is different from the two independent sample t test, but the latter is equivalent to a test of Pearson’s correlation (as explained on the referenced webpage).
  Charles
  Reply
Imcafee

September 26, 2016 at 8:54 pm

I am doing a study that looks at the relationship between father participation in school activities (0, 1) and change in their children’s test scores (spring to spring). If I use a t-test to calculate the difference of the means of the change in scores by group and I find that the difference is significant, should I also talk about the slope of the regression line or the correlation coefficient to get a sense of the nature of the relationship or R squared? How would I get that number?
Reply
- Charles
  
  September 27, 2016 at 10:59 am
  
  Imcafee,
  I am not sure what extra information you would get by doing this, but as the referenced webpage explains, you can turn a t test into a correlation by using a dummy dichotomous variable. Once you calculate the correlation coefficient in this way, R-square is just the square of the correlation coefficient.
  Charles
  Reply
Marinela Salas

March 6, 2016 at 6:24 am

r(x, y) t p
-0.78 -3.29 0.01
-0.28 -0.86 0.41
-1.00
pls help me in interpreting this result…correlation of two variables.thanks
Reply
- Charles
  
  March 6, 2016 at 10:07 am
  
  The first row calculates a sample correlation coefficient of -.78 and shows that the population correlation coefficient is significantly different from zero with 99% confidence
  The second row calculates a sample correlation coefficient of -.28 and cannot reject the null hypothesis that the population correlation coefficient is zero
  The third row calculates a sample correlation coefficient of -1, which means that the two samples are 100% negatively correlated
  Charles
  Reply
JOMS

February 27, 2016 at 2:55 am

r (Correlation) t comp Tabled t Comparison Decision
0.12 0.99 1.996 less than ?

How to interpret this table? Kindly please help me. Thank you
Reply
- Charles
  
  February 27, 2016 at 6:57 pm
  
  Sorry, but I don’t know what you are referring to.
  Charles
  Reply
Dhruv Pandya

February 16, 2016 at 10:25 am

I have a question.

I have two variables out of which one is continuous and the other is (artificially) dichotomous with an underlying property being continuous and normally distributed.
I want to find the correlation coefficient between these two variables. which will be better Point Biserial or Biserial coeff?
Reply
cayyum

January 25, 2016 at 11:42 am

May i ask a qn? Who is Charles Zaiontz?
Reply
- Charles
  
  January 25, 2016 at 2:25 pm
  
  See Author.
  Charles
  Reply
Charisa Candelario

November 14, 2015 at 2:53 am

Can I ask for help? Here is the data given aside from there means.
ΣX2 = Sum of square First Scores
ΣY2 = Sum of square Second Scores.
Reply
Charisa Candelario

November 14, 2015 at 2:51 am

Can I ask for help? Suppose I have the data for means of x and y then their summation of squared value only, how can i compute if they are significantly different at 5% level of significance. Thank you.
Reply
- Charles
  
  November 16, 2015 at 6:50 pm
  
  Charisa,
  When you say that you want to “compute if they are significantly different”, are you referring to the means of x and y or something related to the correlation between x and y (in which case, the usual test is whether the correlation is significantly different from zero)?
  Charles
  Reply
Christian Burtchen

October 15, 2015 at 2:31 pm

Thank you so much for this helpful explanation and the worksheet. However, in the downloadable worksheet, instead of ‘sig’ the cell value will be called ‘reject’, which to me sounds like the exact opposite. Am I missing something?

Kind regards and many thanks, Christian.
Reply
- Charles
  
  October 16, 2015 at 5:20 pm
  
  Christian,
  “Reject” in this context means “reject the null hypothesis,” which is equivalent to a significant result. Also, it seems that you are referring to an old version of the examples worksheet. The latest version uses “sig” instead of “reject”.
  Charles
  Reply
Usman

July 21, 2015 at 7:59 am

So, I can use any one I wish since they are the same
Reply
- Charles
  
  July 21, 2015 at 8:24 am
  
  Yes, you can use either one since they are equivalent tests.
  Charles
  Reply
