Wilcoxon Rank Sum Test – Advanced

Property 1

Suppose sample 1 has size n1 and rank sum R1, and sample 2 has size n2 and rank sum R2. Then R1 + R2 = n(n+1)/2, where n = n1 + n2.

Proof: The ranks of the combined sample are 1, 2, …, n, so R1 + R2 is the sum of the first n positive integers, which is \frac{n(n+1)}{2}. This formula can be proven by induction. For n = 1, we see that \frac{n(n+1)}{2} = \frac{1(1+1)}{2} = 1, which is the sum of the single integer 1. Assume the result is true for n; then for n + 1 we have 1 + 2 + … + n + (n+1) = \frac{n(n+1)}{2} + (n + 1) = \frac{n(n+1)+2(n+1)}{2} = \frac{(n+1)(n+2)}{2}, which is the formula with n replaced by n + 1.
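Property 1 can be checked numerically. The sketch below is only an illustration: the helper names `average_ranks` and `rank_sums` and the two samples are made up for this example, and tied values are assumed to receive the average of the positions they occupy, which leaves the total of the ranks unchanged.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend 'end' over the run of values tied with the one at 'pos'.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_sums(sample1, sample2):
    """Rank the combined data and return (R1, R2)."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1 = len(sample1)
    return sum(ranks[:n1]), sum(ranks[n1:])


# Made-up data, with one tie (15) shared between the samples.
sample1 = [12, 15, 9, 20]
sample2 = [14, 15, 22, 7, 18]
r1, r2 = rank_sums(sample1, sample2)  # 18.5 and 26.5 for this data
n = len(sample1) + len(sample2)
print(r1 + r2, n * (n + 1) / 2)
```

Both printed values agree, as Property 1 requires.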

Property 2

When the two samples are sufficiently large (say of size > 10, although some say 20), then the W statistic is approximately normal N(μ, σ^2) where

\mu = \frac{n_1(n_1+n_2+1)}{2} \qquad \sigma^2 = \frac{n_1 n_2(n_1+n_2+1)}{12}
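As a rough sketch of how Property 2 is used, the code below assumes the standard values μ = n1(n+1)/2 and σ^2 = n1·n2·(n+1)/12 with n = n1 + n2, and converts an observed rank sum into a z-score and a two-sided p-value. The function name `rank_sum_normal_approx` and the input numbers are hypothetical, and no continuity or tie correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_normal_approx(w, n1, n2):
    """Normal approximation for the rank sum W of the sample of size n1.

    Assumes mu = n1(n+1)/2 and sigma^2 = n1*n2*(n+1)/12 with n = n1 + n2,
    without continuity or tie corrections.
    """
    n = n1 + n2
    mu = n1 * (n + 1) / 2
    sigma = sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return mu, sigma, z, p_two_sided


# Hypothetical example: W = 117 observed with n1 = 12 and n2 = 14.
mu, sigma, z, p = rank_sum_normal_approx(w=117, n1=12, n2=14)
print(f"mu={mu}, sigma={sigma:.3f}, z={z:.3f}, p={p:.4f}")
```

Here μ = 162 and σ^2 = 378, so an observed W well below 162 gives a negative z.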

Proof: We prove that the mean and variance of W = R1 are as described above. The normal approximation was proven in Mann & Whitney (1947) (see reference at the end of this webpage) and we won’t repeat the proof here.

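Let x_i be the rank of the ith data element in the smaller sample, so that W = x_1 + … + x_{n_1}. Under the null hypothesis, each x_i is equally likely to be any of the ranks 1, 2, …, n. Thus, by Property 1

E[x_i] = \frac{1}{n}\sum_{k=1}^{n} k = \frac{n+1}{2}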

By Property 4a of Expectation

E[W] = E\left[\sum_{i=1}^{n_1} x_i\right] = \sum_{i=1}^{n_1} E[x_i] = \frac{n_1(n+1)}{2}

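As we did in the proof of Property 1, we can show by induction on n that

\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}

Squaring the formula from the proof of Property 1 also gives

\left(\sum_{k=1}^{n} k\right)^2 = \frac{n^2(n+1)^2}{4}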

From these, it follows that

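\sum_{k \ne l} kl = \left(\sum_{k=1}^{n} k\right)^2 - \sum_{k=1}^{n} k^2 = \frac{n^2(n+1)^2}{4} - \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(n-1)(3n+2)}{12}

where the sum on the left runs over all ordered pairs of distinct ranks k, l between 1 and n.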

We can now calculate the following expectations:

E[x_i^2] = \frac{1}{n}\sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6}

Also where i ≠ j

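E[x_i x_j] = \frac{1}{n(n-1)}\sum_{k \ne l} kl = \frac{1}{n(n-1)} \cdot \frac{n(n+1)(n-1)(3n+2)}{12}

= \frac{(n+1)(3n+2)}{12}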

By Property 2 of Expectation (case where i = j)

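\mathrm{var}(x_i) = E[x_i^2] - (E[x_i])^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12}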

By Property 3 of Basic Concepts of Correlation when i ≠ j

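\mathrm{cov}(x_i, x_j) = E[x_i x_j] - E[x_i]E[x_j] = \frac{(n+1)(3n+2)}{12} - \frac{(n+1)^2}{4} = -\frac{n+1}{12}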
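The moments used in this proof can be verified exactly for small samples by brute-force enumeration. The sketch below is an illustration only: the function names `single_rank_moments` and `rank_sum_moments` are made up, ties are assumed absent, and under the null hypothesis every choice of n1 ranks out of 1, …, n is treated as equally likely. It checks E[x_i] = (n+1)/2, var(x_i) = (n^2-1)/12 and cov(x_i, x_j) = -(n+1)/12, and then the mean and variance of W that these values lead to.

```python
from fractions import Fraction
from itertools import combinations, permutations


def single_rank_moments(n):
    """Exact E[x_i], var(x_i) and cov(x_i, x_j), i != j, for ranks 1..n (n >= 2)."""
    ranks = range(1, n + 1)
    mean = Fraction(sum(ranks), n)
    var = Fraction(sum(k * k for k in ranks), n) - mean ** 2
    # All ordered pairs of distinct ranks: there are n(n-1) of them.
    pairs = list(permutations(ranks, 2))
    e_prod = Fraction(sum(a * b for a, b in pairs), len(pairs))
    cov = e_prod - mean ** 2
    return mean, var, cov


def rank_sum_moments(n1, n2):
    """Exact mean and variance of W over all equally likely rank assignments."""
    n = n1 + n2
    sums = [sum(c) for c in combinations(range(1, n + 1), n1)]
    mean = Fraction(sum(sums), len(sums))
    var = Fraction(sum(s * s for s in sums), len(sums)) - mean ** 2
    return mean, var


n1, n2 = 3, 4  # small made-up sizes so the enumeration stays tiny
n = n1 + n2
mean, var, cov = single_rank_moments(n)
w_mean, w_var = rank_sum_moments(n1, n2)
print(mean, var, cov)  # compare with (n+1)/2, (n^2-1)/12, -(n+1)/12
print(w_mean, w_var)   # compare with n1(n+1)/2, n1*n2*(n+1)/12
```

Using `Fraction` keeps the arithmetic exact, so the comparison with the closed-form values is an equality rather than a floating-point approximation.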

By an extended version of Property 6 of Basic Concepts of Correlation

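\mathrm{var}(W) = \sum_{i=1}^{n_1} \mathrm{var}(x_i) + 2\sum_{i<j} \mathrm{cov}(x_i, x_j)

= \frac{n_1(n^2-1)}{12} - 2 \cdot \frac{n_1(n_1-1)}{2} \cdot \frac{n+1}{12}

= \frac{n_1(n+1)[(n-1)-(n_1-1)]}{12} = \frac{n_1(n+1)(n-n_1)}{12} = \frac{n_1 n_2(n+1)}{12}

Reference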

Mann, H. B. & Whitney, D. R. (1947) On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50-60.
http://projecteuclid.org/download/pdf_1/euclid.aoms/1177730491

Comments

    • Jack,
      You need to sum all the terms cov(x_i,x_j) where i is not equal to j. Note that each such covariance appears twice, once as cov(x_i,x_j) and once as cov(x_j,x_i). Thus, if you take the sum over i < j, then you need to double the result. Another way to look at this is to count how many pairs of indices from 1 to n1 have unequal entries. The answer is n1(n1-1), which is the value used in the proof. This is the same as 2 times n1(n1-1)/2, the latter being the number of pairs where the first index is less than the second index. To make this clearer and more accurate, I have now replaced the lower limit of the summation symbol by i < j (instead of i not equal to j). Thanks for bringing this issue to my attention. Charles

      • Mr. Charles, you may not believe it, but it is true that only today did I learn that you had replied to my message. My written English is not good, and it is difficult for me to access the web. In 2017, the day after I asked you the question, I found that you had changed i!=j to i<j, but the website did not show me your reply. I spent a lot of time on your proof, especially on 2Σ i!=j to n1 [-(n+1)/12]. Now I see your message, but there are some words I do not understand, and I have forgotten many details of the proof. So I just want to know: is Σ i!=j to n1 [-(n+1)/12] right? And is Σ i!=j to n1 [-(n+1)/12] equal to 2Σ i<j to n1 [-(n+1)/12]?

  1. I have one question about this proof. You calculate the expectation E(r_i r_j) for all j not equal to i. I don’t understand why we can treat this expectation as equivalent to E(r_i r_j) for all i, j. In the covariance we have to use E(r_i r_j) over all ranks, but you use the expectation for all j not equal to i. Why is this correct? Can you explain this to me?

    Thanks for your answer!
