Correlation as t-test proofs

Point-biserial correlation

Property 1: If {y1, …, yn} is a sample for the dichotomous random variable y and {x1, …, xn} is a sample for the random variable x, the point-biserial correlation coefficient between these samples can be expressed by the formula

$$r = \frac{m_1 - m_0}{s_x} \sqrt{\frac{n_0 n_1}{n(n-1)}}$$

where m0 is the mean of the n0 data elements xi whose corresponding y value is yi = 0, m1 is the mean of the n1 data elements xi whose corresponding y value is yi = 1, and sx is the (sample) standard deviation of {x1, …, xn}.
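As a quick numerical sanity check, the sketch below compares the Property 1 expression with the ordinary Pearson correlation computed from its definition. It assumes the formula takes the standard form r = (m1 – m0)/sx · sqrt(n0·n1/(n(n–1))) with sx the sample standard deviation; the data values and the helper names `pearson` and `point_biserial` are hypothetical, chosen only for illustration.

```python
import math
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation computed directly from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(xs, ys):
    """Property 1 expression; ys must contain only 0s and 1s, with both present."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    n0, n1, n = len(x0), len(x1), len(xs)
    m0, m1 = statistics.mean(x0), statistics.mean(x1)
    sx = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return (m1 - m0) / sx * math.sqrt(n0 * n1 / (n * (n - 1)))


# Hypothetical data, made up for illustration only
xs = [2.0, 4.0, 5.0, 7.0, 8.0, 9.0, 11.0]
ys = [0, 0, 1, 0, 1, 1, 1]

r_formula = point_biserial(xs, ys)
r_pearson = pearson(xs, ys)
print(r_formula, r_pearson)  # the two values should agree up to rounding
```

If sx were computed with an n denominator instead, the factor under the square root would have to change accordingly, so the choice of `statistics.stdev` here matters.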

Proof: First note that

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{n_1}{n}$$

When yi = 0, then yi – ȳ = –n1/n, and when yi = 1, then yi – ȳ = 1 – n1/n = n0/n. Thus

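$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1}\left[n_0\left(\frac{n_1}{n}\right)^2 + n_1\left(\frac{n_0}{n}\right)^2\right] = \frac{n_0 n_1}{n(n-1)}$$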

Rearranging the terms so that the first n1 values of xi correspond to yi = 1, we have

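$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{n_0}{n}\sum_{i=1}^{n_1} (x_i - \bar{x}) - \frac{n_1}{n}\sum_{i=n_1+1}^{n} (x_i - \bar{x})$$

$$= \frac{n_0}{n}\left(n_1 m_1 - n_1\bar{x}\right) - \frac{n_1}{n}\left(n_0 m_0 - n_0\bar{x}\right) = \frac{n_0 n_1}{n}(m_1 - m_0)$$

$$cov(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{n_0 n_1}{n(n-1)}(m_1 - m_0)$$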

Using these results, it follows that

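$$r = \frac{cov(x, y)}{s_x s_y} = \frac{\dfrac{n_0 n_1}{n(n-1)}(m_1 - m_0)}{s_x \sqrt{\dfrac{n_0 n_1}{n(n-1)}}} = \frac{m_1 - m_0}{s_x}\sqrt{\frac{n_0 n_1}{n(n-1)}}$$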

The proof of the population version of this property is similar.
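The intermediate identities used in the proof can also be checked numerically. The sketch below assumes the steps are ȳ = n1/n, Σ(yi – ȳ)² = n0·n1/n, and Σ(xi – x̄)(yi – ȳ) = n0·n1·(m1 – m0)/n; the data are hypothetical and serve only as an illustration.

```python
import math
import statistics

# Hypothetical data, made up for illustration only
xs = [2.0, 4.0, 5.0, 7.0, 8.0, 9.0, 11.0]
ys = [0, 0, 1, 0, 1, 1, 1]

n = len(xs)
n1 = sum(ys)   # number of observations with y = 1
n0 = n - n1    # number of observations with y = 0

x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
m0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
m1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])

# Sum of squared deviations of y, and sum of cross products of x and y
ss_y = sum((y - y_bar) ** 2 for y in ys)
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

print(y_bar, n1 / n)
print(ss_y, n0 * n1 / n)
print(sp_xy, n0 * n1 * (m1 - m0) / n)
```

Each printed pair should match up to floating-point rounding; dividing the last two quantities by n – 1 gives the sample variance of y and the sample covariance of x and y.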

r-effect size

Property 2:

$$r = \sqrt{\frac{t^2}{t^2 + df}}$$

where t is the test statistic for the two-sample t test comparing the means of x1 and x2, with t ~ T(df); x is the combination of x1 and x2; y is the dichotomous variable as in Example 1; and r is the correlation coefficient between x and y.

Proof: By Property 1 of Correlation Testing via t Test and the fact that df = n–2, we see that

$$t = \frac{r\sqrt{df}}{\sqrt{1 - r^2}}$$

Squaring both sides of the equation and solving for r algebraically yields the desired result.

$$t^2 = \frac{r^2\, df}{1 - r^2}$$

$$t^2 - t^2 r^2 = r^2\, df$$

$$t^2 = r^2 (t^2 + df)$$
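The relationship r² = t²/(t² + df) can be illustrated with a small numerical sketch. It assumes t is the pooled-variance (equal variances) two-sample t statistic with df = n – 2, and that y is coded 1 for the elements of x1 and 0 for the elements of x2; the samples and the helper names `pooled_t` and `pearson` are hypothetical.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))
    return t, df


def pearson(xs, ys):
    """Sample Pearson correlation computed directly from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical samples, made up for illustration only
x1 = [5.0, 8.0, 9.0, 11.0]
x2 = [2.0, 4.0, 7.0]

# Combined sample and its 0/1 group indicator (assumed coding: 1 for x1, 0 for x2)
x = x1 + x2
y = [1] * len(x1) + [0] * len(x2)

t, df = pooled_t(x1, x2)
r = pearson(x, y)
r_squared_from_t = t ** 2 / (t ** 2 + df)
print(r ** 2, r_squared_from_t)  # should agree up to rounding
```

Swapping the 0/1 coding flips the sign of r but leaves r² unchanged, which is why the comparison is made on the squared values.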

Property 3: Assuming the two populations have the same variance

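$$d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \frac{r\sqrt{df}}{\sqrt{1 - r^2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where n1 and n2 are the sizes of the samples x1 and x2.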

Proof: The first equality holds since by Property 1 of Two-sample t Test with Equal Variances

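$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{d}{\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where s is the pooled standard deviation and d = (x̄1 – x̄2)/s.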

The last equality follows by Property 1 of Correlation Testing via t Test since

$$t = \frac{r\sqrt{df}}{\sqrt{1 - r^2}}$$
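The chain of equalities in Property 3 can be checked with a short sketch. It assumes Cohen's d is defined as the difference in sample means divided by the pooled standard deviation, that t is the equal-variance two-sample t statistic, and that y is coded 1 for x1 and 0 for x2; the samples are hypothetical.

```python
import math
import statistics

# Hypothetical samples, made up for illustration only
x1 = [5.0, 8.0, 9.0, 11.0]
x2 = [2.0, 4.0, 7.0]
n1, n2 = len(x1), len(x2)
df = n1 + n2 - 2

# Pooled standard deviation under the equal-variance assumption
s = math.sqrt(((n1 - 1) * statistics.variance(x1)
               + (n2 - 1) * statistics.variance(x2)) / df)

mean_diff = statistics.mean(x1) - statistics.mean(x2)
size_factor = math.sqrt(1 / n1 + 1 / n2)

d_direct = mean_diff / s            # Cohen's d from its assumed definition
t = mean_diff / (s * size_factor)   # pooled two-sample t statistic
d_from_t = t * size_factor          # first equality

# Correlation between the combined sample and the 0/1 group indicator
x = x1 + x2
y = [1] * n1 + [0] * n2
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Last equality: substitute t = r * sqrt(df) / sqrt(1 - r^2)
d_from_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2) * size_factor
print(d_direct, d_from_t, d_from_r)  # all three should agree up to rounding
```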
