Correlation as t-test proofs | Real Statistics Using Excel

Point-biserial correlation

Property 1: If {y₁, …, y_n} is a sample for the dichotomous random variable y and {x₁, …, x_n} is a sample for the random variable x, the point-biserial correlation coefficient between these samples can be expressed by the formula

r = ((m₁ – m₀) / s_x) · √(n₀n₁ / (n(n – 1)))

where m₀ is the mean of the n₀ data elements x_i whose corresponding y value is y_i = 0, m₁ is the mean of the n₁ data elements x_i whose corresponding y value is y_i = 1, n = n₀ + n₁, and s_x is the (sample) standard deviation of {x₁, …, x_n}.
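As a quick numerical check, the sketch below assumes Property 1 reads r = ((m₁ – m₀) / s_x) · √(n₀n₁ / (n(n – 1))) with s_x the sample standard deviation, and compares it against the Pearson correlation computed from its definition. The data values and the function names `pearson_r` and `point_biserial_r` are made up for illustration and do not come from the source.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial_r(xs, ys):
    """Assumed form of Property 1: (m1 - m0) / s_x * sqrt(n0*n1 / (n*(n-1)))."""
    group0 = [x for x, y in zip(xs, ys) if y == 0]
    group1 = [x for x, y in zip(xs, ys) if y == 1]
    n0, n1, n = len(group0), len(group1), len(xs)
    m0, m1 = statistics.mean(group0), statistics.mean(group1)
    s_x = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    return (m1 - m0) / s_x * math.sqrt(n0 * n1 / (n * (n - 1)))


# Made-up illustrative data, not taken from the source.
xs = [4.1, 5.3, 6.0, 3.8, 7.2, 6.5, 5.9, 4.4]
ys = [0, 0, 1, 0, 1, 1, 1, 0]

r_formula = point_biserial_r(xs, ys)
r_pearson = pearson_r(xs, ys)
print(r_formula, r_pearson)  # the two values should agree up to rounding
```

If the two printed values differ by more than floating-point rounding, the most likely cause is using the population standard deviation (divisor n) in place of the sample one.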

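Proof: First note that

y-bar = (1/n) Σ y_i = n₁/n

When y_i = 0, then y_i – y-bar = –n₁/n, and when y_i = 1, then y_i – y-bar = 1 – n₁/n = n₀/n. Thus

Σ (y_i – y-bar)² = n₀(n₁/n)² + n₁(n₀/n)² = n₀n₁(n₁ + n₀)/n² = n₀n₁/n

and so the sample standard deviation of y is s_y = √(n₀n₁ / (n(n – 1))).

Rearranging the terms so that the first n₁ values of x_i correspond to y_i = 1, we have

Σ (x_i – x-bar)(y_i – y-bar) = (n₀/n) Σ_{i ≤ n₁} (x_i – x-bar) – (n₁/n) Σ_{i > n₁} (x_i – x-bar)

= (n₀/n)(n₁m₁ – n₁·x-bar) – (n₁/n)(n₀m₀ – n₀·x-bar) = n₀n₁(m₁ – m₀)/n

Using these results, it follows that

r = Σ (x_i – x-bar)(y_i – y-bar) / ((n – 1) s_x s_y) = [n₀n₁(m₁ – m₀)/n] / [(n – 1) s_x √(n₀n₁ / (n(n – 1)))] = ((m₁ – m₀) / s_x) · √(n₀n₁ / (n(n – 1)))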

The proof of the population version of this property is similar.
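The proof rests on three intermediate facts: y-bar = n₁/n, Σ (y_i – y-bar)² = n₀n₁/n, and Σ (x_i – x-bar)(y_i – y-bar) = n₀n₁(m₁ – m₀)/n. The following sketch checks each of them numerically on a small made-up data set; the data and the variable names (`step1`, `ss_y`, `sp_xy`, and so on) are illustrative assumptions, not part of the source.

```python
import math

# Made-up illustrative data, not taken from the source.
xs = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0, 6.0]
ys = [0, 1, 0, 1, 1, 0, 1]

n = len(xs)
n1 = sum(ys)      # number of observations with y = 1
n0 = n - n1       # number of observations with y = 0
x_bar = sum(xs) / n
y_bar = sum(ys) / n
m0 = sum(x for x, y in zip(xs, ys) if y == 0) / n0
m1 = sum(x for x, y in zip(xs, ys) if y == 1) / n1

# Step 1: the mean of a 0/1 variable is the proportion of ones.
step1 = math.isclose(y_bar, n1 / n)

# Step 2: the sum of squared deviations of y equals n0 * n1 / n.
ss_y = sum((y - y_bar) ** 2 for y in ys)
step2 = math.isclose(ss_y, n0 * n1 / n)

# Step 3: the sum of cross products equals n0 * n1 * (m1 - m0) / n.
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
step3 = math.isclose(sp_xy, n0 * n1 * (m1 - m0) / n)

print(step1, step2, step3)
```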

r-effect size

Property 2:

r = √(t² / (t² + df))

where t is the test statistic for the two-sample hypothesis test of the means of variables x₁ and x₂ with t ~ T(df), x is the combination of x₁ and x₂, and y is the dichotomous variable as in Example 1.

Proof: By Property 1 of Correlation Testing via t Test and the fact that df = n – 2, we see that

t = r√df / √(1 – r²)

Squaring both sides of the equation and solving for r algebraically yields the desired result:

t² = r²df / (1 – r²)

t² – t²r² = r²df

t² = r²(t² + df)

r² = t² / (t² + df)
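The relationship t² = r²(t² + df) can be checked numerically. The sketch below assumes that t is the pooled-variance (equal variances) two-sample t statistic with df = n – 2, and that y is coded 1 for the first sample and 0 for the second; the data and the helper names `pooled_t`, `r_effect_size` and `pearson_r` are hypothetical.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, df


def r_effect_size(t, df):
    """r = sqrt(t^2 / (t^2 + df)); always non-negative."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative samples, not taken from the source.
x1 = [6.0, 7.2, 6.5, 5.9]
x2 = [4.1, 5.3, 3.8, 4.4]

t, df = pooled_t(x1, x2)
r_from_t = r_effect_size(t, df)

# Combine the samples into x and code group membership in y.
xs = x1 + x2
ys = [1] * len(x1) + [0] * len(x2)
r_direct = pearson_r(xs, ys)

print(r_from_t, abs(r_direct))  # should agree up to rounding
```

Because the formula takes a square root of t², it recovers only the magnitude of r; the sign of the correlation depends on how the two groups are coded in y.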

Property 3: Assuming the two populations have the same variance

Proof: The first equality holds since by Property 1 of Two-sample t Test with Equal Variances

The last equality follows by Property 1 of Correlation Testing via t Test since

