Two-Sample Runs Test

Basic Concepts

The Wald-Wolfowitz two-sample runs test is used to determine whether two samples come from the same distribution. The test sorts the values in the combined sample, replaces each value with the symbol 1 (if the value comes from sample 1) or 2 (if the value comes from sample 2), and then applies the one-tailed version of the one-sample runs test to the resulting sequence of symbols.
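As a rough illustration of the one-tailed runs test applied to such a sequence of 1's and 2's, here is a minimal Python sketch. It assumes the large-sample normal approximation and a left-tailed test (few runs suggest the samples differ); the function names and the label sequence are hypothetical, and the Real Statistics RUNSTEST function may compute the p-value differently, for example with an exact test for small samples.

```python
import math

def count_runs(labels):
    """Count maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_test_left_tail(labels):
    """Left-tailed runs test using the normal approximation (an assumption).

    Returns (runs, z, p_value) for a sequence of 1's and 2's.
    """
    n1 = sum(1 for x in labels if x == 1)
    n2 = len(labels) - n1
    n = n1 + n2
    runs = count_runs(labels)
    # Mean and variance of the number of runs under the null hypothesis
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    # Standard normal CDF at z, i.e. P(Z <= z)
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return runs, z, p_value

# Hypothetical label sequence taken from a sorted combined sample
labels = [1, 1, 1, 2, 2, 1, 2, 2, 2, 1]
runs, z, p = runs_test_left_tail(labels)
print(runs, round(z, 3), round(p, 3))
```

A small p-value here would indicate fewer runs than expected by chance, which is the evidence the two-sample test looks for.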

If there are ties, then the number of runs will differ depending on how the 1’s and 2’s for the tied values are ordered. In this case, we perform multiple versions of the test, randomly changing the order of the 1’s and 2’s for the tied values each time.

Note that when there is a significant difference between the distributions of the two samples, we can’t tell whether this is a difference in means, medians, variances, skewness, kurtosis, etc.

Example

Example 1: Determine whether the samples in ranges B4:B11 and C4:C10 of Figure 1 come from the same distribution.

First, we rearrange the input data as shown in range E4:F18. Essentially, we create a stacked version of the original data in column E, labeling the data from sample 1 with a 1 and the data from sample 2 with a 2 in column F.

Next, we sort the data by value, placing the results in range H4:I18. We can do this by using the array formula =QSORTRows(E4:F18,1).
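The stacking and sorting steps above can be sketched outside Excel as follows. This is only an illustration: the helper name stack_and_sort and the data values are hypothetical (they are not the values from Figure 1), and the sort is assumed to be in ascending order of the data column.

```python
def stack_and_sort(sample1, sample2):
    """Stack both samples as (value, label) rows and sort by value."""
    stacked = [(v, 1) for v in sample1] + [(v, 2) for v in sample2]
    # Python's sort is stable, so tied values keep their stacked order
    stacked.sort(key=lambda row: row[0])
    return stacked

# Hypothetical data, not the values from Figure 1
sample1 = [10, 20, 36]
sample2 = [36, 40, 50]

rows = stack_and_sort(sample1, sample2)
labels = [label for _, label in rows]  # the sequence of 1's and 2's
print(rows)
print(labels)
```

The label sequence is what the one-sample runs test is then applied to. Note that the tied value 36 keeps the sample 1 label first only because of how the rows were stacked.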


Figure 1 – Data for Two-Sample Runs Test

We next use the array formula for the one-sample runs test, =RUNSTEST(I4:I18,TRUE,1), to obtain results similar to those shown in range K4:L11 of Figure 2.


Figure 2 – Two-Sample Runs Test

Note that the value 36 appears twice in the original data, as shown in Figure 1: once in sample 1 and again in sample 2. When the data is sorted, we see that 36 appears in cells H9 and H10. The order shown in column I is the one that produces the fewest runs (namely 9). However, if the values in cells I9 and I10 are interchanged, the number of runs increases by 2 to 11. Thus, there are two possible outcomes, as shown in range N4:P12 of Figure 2.

Worksheet Function

We can simplify the calculations by using the following Real Statistics function.

Real Statistics Function: The following array function is provided in the Real Statistics Pack:

RUNS2TEST(R1, R2, lab, iter): outputs a 9 × 1 column range as shown in range L4:L11 of Figure 2 with the results of the two-sample runs test on the data in ranges R1 and R2 if lab = FALSE (default) and a 9 × 2 column range, including labels, as shown in range K4:L11 if lab = TRUE.

Here, iter = 0 by default. The default value is appropriate if there are no ties or if we are willing to accept a random (actually semi-random) ordering of the 1’s and 2’s for the tied values.

Handling Ties

If we are not willing to accept the default output, we ask the function to randomly change the order of the 1’s and 2’s for the tied values iter number of times. For Example 1, if we set iter = 100, we see from the right side of Figure 2 that the runs = 9 case occurs 45 times and the runs = 11 case occurs 55 times. Note that the p-values for these two cases are different.

For Example 1, the array formula =RUNS2TEST(B4:B11,C4:C10,TRUE) can be used to obtain the output shown in range K4:L11 of Figure 2.

If, instead, we use the array formula =RUNS2TEST(B4:B11,C4:C10,TRUE,100), we obtain the output shown in range N4:P12. In this case, we know that there are 2 possible outcomes, and so we need to highlight a 9 × 3 range for the output. In general, some guesswork might be required to determine how large a range to use for the output. After seeing the output, you might have to increase the size of the output range. Also note that since the orders are randomly generated, the results are likely to be different on successive runs of the formula.
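To make the idea of randomly reordering the tied values concrete, here is a minimal Python sketch that repeats the tie-breaking a given number of times and tallies how often each number of runs occurs. It is not the RUNS2TEST implementation: the function names, the seed parameter, and the data are hypothetical, and ties are broken here by attaching a random secondary sort key to each value.

```python
import random
from collections import Counter

def count_runs(labels):
    """Count maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_with_random_ties(sample1, sample2, iterations, seed=None):
    """Tally the number of runs over repeated random orderings of tied values."""
    rng = random.Random(seed)
    stacked = [(v, 1) for v in sample1] + [(v, 2) for v in sample2]
    tally = Counter()
    for _ in range(iterations):
        # A random secondary key shuffles the labels within each group of ties
        keyed = sorted((v, rng.random(), label) for v, label in stacked)
        labels = [label for _, _, label in keyed]
        tally[count_runs(labels)] += 1
    return tally

# Hypothetical data with one tied value (36), not the values from Figure 1
sample1 = [10, 20, 36]
sample2 = [36, 40, 50]
tally = runs_with_random_ties(sample1, sample2, iterations=100, seed=1)
for runs, count in sorted(tally.items()):
    print(f"runs = {runs}: {count} times")
```

For this hypothetical data, the two orderings of the tied value give either 2 or 4 runs, which mirrors the 9 versus 11 runs seen in Example 1. Each distinct run count would then get its own p-value from the one-sample runs test.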

Data Analysis Tool

You can also use Real Statistics’ Non-parametric Tests data analysis tool to perform the runs test.


