Mann-Whitney Test for Independent Samples

Basic Concepts

The Mann-Whitney U test is essentially an alternative form of the Wilcoxon Rank-Sum test for independent samples and is completely equivalent.

Define the following test statistics for samples 1 and 2 where n1 is the size of sample 1 and n2 is the size of sample 2, and R1 is the adjusted rank-sum for sample 1 and R2 is the adjusted rank-sum of sample 2. It doesn’t matter which sample is bigger.

image950

image951

As for the Wilcoxon version of the test, if the observed value of U is < Ucrit then the test is significant (at the α level), i.e. we reject the null hypothesis. The values of Ucrit for α = .05 (two-tailed) are given in the Mann-Whitney Tables.

Example using the table of critical values

Example 1: Repeat Example 1 of the Wilcoxon Rank Sum Test using the Mann-Whitney U test.

Mann-Whitney test Excel

Figure 1 – Mann-Whitney U Test

Since R1 = 117.5 and R2 = 158.5, we can calculate U1 and U2 to get U = 39.5. Next, we look up in the Mann-Whitney Tables for n1 = 12 and n2 = 11 to get Ucrit = 33. Since 33 < 39.5, we cannot reject the null hypothesis at α = .05 level of significance.

Properties

Property 1:
image958

Property 2: For n1 and n2 large enough the U statistic is approximately normal N(μ, σ2) where

image959

Observation: Click here for proofs of Properties 1 and 2.

Property 3: Where there are a number of ties, the following revised version of the variance gives better results:

where n = n1 + n2, t varies over the set of tied ranks and ft is the number of times (i.e. frequency) the rank t appears. An equivalent formula is

image004xContinuity Correction

A further complication is that it is often desirable to account for the fact that we are approximating a discrete distribution via a continuous one by applying a continuity correction. This is done by using a z-score of

instead of the same formula without the .5 continuity correction factor.

Example using a normal approximation

Example 2: Repeat Example 2 of the Wilcoxon Rank Sum Test using the Mann-Whitney U test.

Figure 2 shows the results of the one-tailed test (without using a ties correction). Column W displays the formulas used in column T.

Mann-Whitney U Test

Figure 2 – Mann-Whitney U test using normal approximation

As can be seen in cell T19, the p-value for the one-tail test is the same as that found in Wilcoxon Example 2 using the Wilcoxon rank-sum test. Once again we reject the null hypothesis and conclude that non-smokers live significantly longer.

Effect Size

The effect size for the data using the Mann-Whitney test can be calculated in the same manner as for the Wilcoxon rank-sum test, namely

Mann-Whitney effect size

and the result will be the same, which for Example 2 is r = .31, as shown in cell T21 of Figure 2. This is a medium-sized effect.

Here, .1 is a small effect, .3 is a medium effect, and .5 is considered to be a large effect.

Rank-biserial Correlation

There is another measure of effect size, namely

simple effect size p

p represents the probability that a score randomly generated from population A will be bigger than a score randomly generated from population B, where A and B are the populations corresponding to the two samples and A corresponds to the sample with the higher value. The higher this value is the larger the effect.

The difference between p and its complement 1–p provides a standard effect size r called the rank-serial correlation. Thus

Rank-serial correlation effect size

where μ is the mean of the normal approximation. For Example 2, the rank-serial correlation is r = 1 – 486/760 = .36, which is a medium-sized effect.

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack provides the following functions:

MANN(R1, R2) = U for the samples contained in R1 and R2

MANN(R1, n) = U for the sample contained in the first n columns of R1 and the sample consisting of the remaining columns in R1. If the second argument is omitted it defaults to 1.

MWTEST(R1, R2, tails, ties, cont) = p-value of the Mann-Whitney U test for the samples contained in R1 and R2 using the normal approximation. tails = 1 or 2 (default). If ties = TRUE (default) the ties correction factor is applied. If cont = TRUE (default) a continuity correction is applied.

Any empty or non-numeric cells in R1 or R2 are ignored.

Observations

For Example 2, we can use the Real Statistics MANN function to arrive at the value of 486 for U shown in cell T9 of Figure 2, namely =MANN(J6:M15,N6:Q15) = 486. Similarly, the p-value of 0.003081 in cell T19 can be calculated by =MWTEST(J6:M15,N6:Q15,1,FALSE,TRUE).

Note that the z-score and the effect size r can be calculated using the Real Statistics function MWTEST as follows:

z-score = NORM.S.INV(MWTEST(R1, R2))

r = NORM.S.INV(MWTEST(R1, R2))/SQRT(COUNT(R1)+COUNT(R2))

The results of analysis for Example 2 can be summarized as follows: The life expectancy of non-smokers (Mdn = 76.5) is significantly higher than that of smokers (Mdn = 70.5), U = 486, z = -2.74, p = .0038 < .05, r = .31, based on a one-tailed test Mann-Whitney test with continuity correction, but no correction for ties.

Of course, you can also use a two-tailed test with a ties correction, as we demonstrate shortly.

Worksheet Array Function

Real Statistics Function: The Real Statistics Resource Pack provides the following function that returns output consisting of the U-stat, z-stat, r effect size, and the three types of p-values (the normal approximation, exact test, and simulation).

MW_TEST(R1, R2, lab, tails, ties, cont, exact, iter): returns a column array with the output described above for the samples contained in R1 and R2. tails = 1 or 2 (default). For the normal approximation, if ties = TRUE (default) the ties correction factor is applied; if cont = TRUE (default) a continuity correction is applied; when exact = TRUE (default FALSE) then the p-value of the exact test is output and if iter ≠ 0 then the p-value of the simulation version of the test is output where the simulation consists of iter samples (default 10,000). If lab = TRUE (default FALSE) then an extra column of labels is appended to the output.

Any empty or non-numeric elements in R1 or R2 are ignored. See Mann-Whitney Exact Test and Mann-Whitney Simulation for more information about the exact test and simulation p-values.

Figure 3 displays the output from =MW_TEST(A6:A17,B6:B17,TRUE) for Example 2.

Output from MW_TEST

Figure 3 – Output from MW_TEST

Even if the argument is set to FALSE, the p-value of the exact test will be produced provided both samples have fewer than 800 elements and the smaller sample has at most 300 elements.

Data Analysis Tool

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack also provides a data analysis tool that performs the Mann-Whitney test for independent samples, automatically calculating the medians, rank sums, U test statistic, z-score, p-values, and effect size r.

For example, to perform the analysis in Example 1, press Ctrl-m and choose the T Test and Non-parametric Equivalents data analysis tool from the menu that appears (or from the Misc tab if using the Multipage user interface). The dialog box shown in Figure 4 now appears.

Dialog box Mann-Whitney

Figure 4 – Dialog box for Real Statistics Mann-Whitney Test

Enter A5:B17 as the Input Range 1 (alternatively, insert A5:A17 in Input Range 1 and B5:B17 in Input Range 2), click on Column headings included with data, choose the Two independent samples and Non-parametric options, and click on the OK button. Keep the default of 0 for Hypothetical Mean/Median and .05 for Alpha (although these values are not used) For this version of the test, we check the Use continuity correction, Include exact test, and Include table lookup options, but leave the Use ties correction option unchecked.

The output is shown in Figure 5.

Mann-Whitney analysis output

Figure 5 – Mann-Whitney test data analysis tool output

Note that both the one-tail and two-tail tests are displayed. Also, three versions of the test are shown: the test using the normal approximation (range E17:F17), the test using the exact test (range E18:F18), and the simulation test (range E19:F19). The fact that the “Yates” continuity correction factor is used is noted in cell F15.

Ties correction version

If we check the Use Ties correction option in Figure 4 we would obtain the output shown in Figure 6.

Mann-Whitney with ties

Figure 6 – Mann-Whitney test with ties correction

In this case, the ties correction of Property 3 is applied to the normal approximation. As you can see there is very little difference between the outputs shown in Figures 5 and 6.

Note too that the ties correction (as well as the continuity correction) only applies to the normal approximation. The ties and continuity corrections are not applied to the exact and simulation versions of the test. The difference in the simulation p-values (row 19) in Figures 5 and 6 is due to the randomness of the simulations and not the ties correction.

Rank-serial correlation version

If the Use rank-serial correlation option were checked in Figure 4, then cell K16 in Figure 6 would contain the formula =1-K12/K13 with value .4015. Cell L16 would contain the value “rank-serial”.

Ties correction worksheet function

Real Statistics Function: The Real Statistics Pack provides the following function to calculate the ties correction used in the data analysis tool.

TiesCorrection(R1, R2, type) = ties correction value for the data in R1 and optionally R2, where type = 0: one sample, type = 1: paired sample and type = 2: independent samples

For the Mann-Whitney test type = 2.  The ties correction is used in the calculation of the standard deviation (cell U15 of Figure 6) as follows

=SQRT(K13*(K6+L6+1)/6*(1-TiesCorrection(A6:A17,B6:B17,2)/ ((K6+L6)^3-K6-L6)))

Exact Test

Click here for a description of the exact version of the Mann-Whitney Test using the permutation function.

Simulation

Click here for a description of how to use simulation to determine the p-value for the Mann-Whitney test. This approach takes ties into account.

Confidence Interval of the Median

Click here for a description of how to calculate a confidence interval of the median based on the Mann-Whitney Test.

Statistical Power and Sample Size

Click here for a description of how to calculate the statistical power or minimum sample size required for the Mann-Whitney Test.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

Zar. J. H. (2010) Biostatistical analysis 5th Ed. Pearson
https://bayesmath.com/wp-content/uploads/2021/05/Jerrold-H.-Zar-Biostatistical-Analysis-5th-Edition-Prentice-Hall-2009.pdf

Kerby, D. S. (2014). The simple difference formula: an approach to teaching nonparametric correlation. Comprehensive Psychology.
https://scirp.org/(S(351jmbntv-nsjt1aadkposzje))/reference/referencespapers.aspx?referenceid=3598009

138 thoughts on “Mann-Whitney Test for Independent Samples”

  1. Hello Charles,
    First a big thank you for this fabolous web-site.
    I am using Excel for non-parametric statistics, like on this site, BUT I think I have found a ‘bug’ in Excel. When I try to rank the numbers, fx -0.2 and -0.2 is not always equal. I found out that Excel is not always correct in calculating; the numbers differ on decimal nr. 15:
    Excample:
    42,1-43,3=-0,199999999999996
    51,3-51,5=-0,200000000000003

    Do know any way to solve this problem
    Best regards Sander

    Reply
    • Hello Sander,
      On my computer
      42,1-43,3=-0,200000000000000
      51,3-51,5=-0,200000000000000
      (although the comma is replaced by a period)
      In any case, I understand the problem regarding Excel’s ranking function. For this reason, Real Statistics contains the following worksheet function to deal with this problem:
      RANK_AVG(x, R1, order, num_digits): returns the same value as the Excel formula =RANK.AVG(x, R1, order) where x and all the values in R1 are first rounded off to num_digits number of decimal places. If num_digits is omitted then 8 decimal places are used.
      Charles

      Reply
  2. Hello Charles,

    My students in Marseilles appreciate your macro Real statistics especially the ability to perform non parametric tests.
    My students are puzzled with the fact that the macro does not deliver critical values for Mann Whitney test. Would you be so kind as to add critical value from the table.
    Cheers

    Reply
  3. Hello Charles,
    thank you for your very helpful article. I’ d like to ask you a question.
    Can I use the Mann-Whitney U test if the sizes of the samples are very imbalanced (e.g. n1=24, n2=4)?
    Thanks again,
    Thanasis

    Reply
  4. Hello, Charles.
    Today I tested Real Statistics Data Analysis Tool, release 8.8, T Test and non parametric equivalents, selecting they option non parametric (for obtaining Mann Whitney test results), choosing continuity correction, including exact test and including simulation (10000 iterations). I got an error related to a matrix calculation for p-value simulation (simulation was cut and p-value for simulation was blank). I run Excel 365, Spanish version.
    That error is not present when I use an earlier version of Real Statistics (downloaded two years ago, on July 2021): with that version I can obtain values very similar to those obtained by normal p-value and exact p-value.
    In suposse there’s a little change in the way matrix calculations are used for obtaining p-value by simulation.

    I will be waiting for your comments.

    Thanks you.

    William Agurto.

    Reply
    • Hello William,
      I don’t think that I have made any changes to the Mann-Whitney test in the past two years. But as I am sure you know, sometimes some other change has impacted this capability. In any case, can you email me the data that you used so that I can use it to test the Mann Whitney tool?
      Charles

      Reply
      • Hello, Charles.
        Thank you for your answer.
        Here’s my data (two independent samples):
        Nivel Autoestima, Mujeres Nivel de Autoestima, Varones
        34 52
        32 52
        35 45
        41 38
        38 45
        29 24
        43 38
        26 46
        52 48
        41 45
        44 31
        47 40
        33 52
        53 39
        32 38
        41 54
        38 35
        32 48
        38 44
        57 53
        35 34
        29 48
        31 49
        56 43
        36 54
        42 53
        47 34
        28 50
        37 53
        44 48
        42 47
        41 57
        50 52
        44 44
        39 54
        30 43
        37 50
        55 56
        56 54
        29 45
        51 54
        38 32
        51 49
        40 57
        40 44
        33 52
        43
        38
        43
        56
        47
        43
        53
        As I told you, the problem is present only in the last release (8.8) of Real Statistics. It’s not present in an earlier version (July 8th, 2021). I don’t know if the bug is present in other versions: I detected that in release 8.8.
        The text of the error in Excel 365 (Spanish version) is:
        “A runtime error has occurred. The analysis tool will be aborted.
        No se puede asignar la propiedad FormulaArray de la clase Range”.

        Thank you.

        William Agurto.

        Reply
        • Hello, Charles:
          For clarity, I resent the data (in the prior message it’s not clear which column has more data):

          Nivel Autoestima, Mujeres (n1=46):
          34
          32
          35
          41
          38
          29
          43
          26
          52
          41
          44
          47
          33
          53
          32
          41
          38
          32
          38
          57
          35
          29
          31
          56
          36
          42
          47
          28
          37
          44
          42
          41
          50
          44
          39
          30
          37
          55
          56
          29
          51
          38
          51
          40
          40
          33

          Nivel de Autoestima, Varones (n2=53):
          52
          52
          45
          38
          45
          24
          38
          46
          48
          45
          31
          40
          52
          39
          38
          54
          35
          48
          44
          53
          34
          48
          49
          43
          54
          53
          34
          50
          53
          48
          47
          57
          52
          44
          54
          43
          50
          56
          54
          45
          54
          32
          49
          57
          44
          52
          43
          38
          43
          56
          47
          43
          53

          William Agurto.

          Reply
          • Thanks William,
            Thanks for sending me your data. Yes, there is a bug in the software that was inadvertantly introduced a couple of releases ago.
            I have now fixed the bug. I plan to issue a new bug-fix release shortly which will resolve this issue.
            Thank you very much for bringing this issue to my attention.
            Charles

          • Hello, Charles.

            Today I tested Real Statistics Data Analysis Tools, Release 8.8.1, with the same data, the same parameters (column heading included with data; alpha=0.05; Hyp Mean/Median=0; two independent samples; T-Test and Non Parametric Test; using: ties correction, using continuity correction, including simulation, inscluding exact test) and I got the same error (Excel 365, Spanish version):

            “A runtime error has occurred. The analysis tool will be aborted.
            No se puede asignar la propiedad FormulaArray de la clase Range”

            It seems like the bug is present in a rutine related to “include simulation” process, because it’s not present when I don’t select that option (and it’s present when I select only “include simulation” option).

            I will appreciate your comments.

            Thanks a lot.

            William Agurto.

            Note: Here the data:

            Nivel Autoestima, Mujeres (n1=46):
            34
            32
            35
            41
            38
            29
            43
            26
            52
            41
            44
            47
            33
            53
            32
            41
            38
            32
            38
            57
            35
            29
            31
            56
            36
            42
            47
            28
            37
            44
            42
            41
            50
            44
            39
            30
            37
            55
            56
            29
            51
            38
            51
            40
            40
            33

            Nivel de Autoestima, Varones (n2=53):
            52
            52
            45
            38
            45
            24
            38
            46
            48
            45
            31
            40
            52
            39
            38
            54
            35
            48
            44
            53
            34
            48
            49
            43
            54
            53
            34
            50
            53
            48
            47
            57
            52
            44
            54
            43
            50
            56
            54
            45
            54
            32
            49
            57
            44
            52
            43
            38
            43
            56
            47
            43
            53

          • William,
            Sorry about the persistence of this bug. Apparently, I fixed it in the Mac version of the release but not the Windows version (or the new Xrealstatsx.clam version). I will issue a new release on Sunday to correct this.
            Charles

  5. Dear Charles:

    I tested the last version of Real Statistics (release 8.8). When I used the dialog box for T Tests and Non Parametric Statistics for two independent samples, selecting non parametric test (for obtaining Mann Whitney results), including simulation (10000 iterations) and exact test for p-value, using continuity correction, the program got an error for p-value simulation (a message related to a mistake in a matrix calculation), and I obtained N/A (not a formula: a text N/A) instead of a valid result. I used the complement in Excel 2010, Spanish version.
    After that, with the purpose of detecting if the last release was the problem, I followed the same steps in an earlier release of Real Statistics (downloaded in July 8th, 2021), and I could get results for p-value simulation (although Excel turned slow because of matrix calculations after that).
    Is it possible that matrix operation upgrading since July 2021 be the source of the bug?

    I’ll be waiting for your comments.

    Thank you.

    William Agurto.

    Reply
  6. Hello Charles,

    thank you for the article, it helped me massively with getting into MW formula. I have one question. Do you have proof on the adjusted variance of the U? I was thinking of using the linearity of variance with a combination of the first formula mentioned at the beginning.
    But I just cannot finish it to the correct answer.

    Thank you and cheers from Czechia.
    Anna

    Reply
    • Hello Anna,
      Good to hear from someone from Czechia.
      Which adjustment are you referring to? The ties adjustment or the continuity adjustment or something else?
      Charles

      Reply

Leave a Comment