Runs Distribution

Basic Concepts

Suppose there are n1 elements of one type (say 1) and n2 of the other type (say 2) and let x be the random variable equal to the number of runs. We now describe the (discrete) distribution function of x.

First, note that P(x = r) = the # of permutations that result in r runs divided by the # of permutations of Β 1’s and Β 2’s. The denominator is the number of ways of placing n1 1’s in the n1+n2 positions, i.e. C(n1+n2, n1). The formula for the numerator is more complicated.

We ignore the case where r = 1 since this only occurs if n1 = 0 or n2 = 0.

Even Case

Suppose that r is even. Thus r = 2m for some positive integer m. Ignoring the repetitions of the 2’s for a moment and assuming that the first number in the sequence is a 1 we count the number of sequences with r runs. Since 1 is in the first position 2 must be in the last position (for r to be even), and so the problem is equivalent to counting the number of ways of placing exactly one 2 after mβˆ’1 of the n1βˆ’1Β 1’s. This is C(n1βˆ’1,Β mβˆ’1).

E.g. if n1 = 7 and n2 = 4 then runs of length 8 can be achieved as follows

12111211212, 11121212112, etc.

If instead, the first number in the sequence is a 2 then the last number is a 1, and so the problem is equivalent to counting the number of ways of placing exactly one 2 before m-1 of the n1-1 1’s, which is again C(n1βˆ’1,Β mβˆ’1). Thus if r is even we have 2β‹…C(n1βˆ’1,Β mβˆ’1)Β possibilities.

We next need to account for the fact that there can be multiple 2’s in sequence. The same logic shows that there are C(n2βˆ’1,Β mβˆ’1) ways of distributing the 2’s (treating the sequences of 1’s as the spacers between sequences of 2’s).

Thus if r = 2mΒ there are 2β‹…C(n1βˆ’1,Β mβˆ’1) β‹…C(n2βˆ’1,Β mβˆ’1) ways of obtaining r runs. Thus

image9098

Odd number of runs

If r is odd then r = 2m+1 for some positive integer m. The approach is the same, except that this time sequences must begin and end with the same number. Suppose that the sequence begins (and ends) with a 1.Β  Since we must insert one extra 2 (compared to the case where r is even), we have C(n1βˆ’1,Β m)Β possible sequences where a single 2 acts as a spacer between sequences of 1’s.

Once again there are C(n2βˆ’1,Β mβˆ’1)Β ways of distributing the 2’s and so there are C(n1βˆ’1,Β m) β‹…Β C(n2βˆ’1,Β mβˆ’1) ways of obtaining r runs if the first in the sequence is a 1. Similarly, if the first in the sequence is a 2 then there are C(n2βˆ’1,Β m) β‹…Β C(n1βˆ’1,Β mβˆ’1)Β ways of obtaining r runs. Thus

image9099

Note that if n1 β‰₯ n2 then the maximum value of r is 2n2+1. Note that the above formula works fine as long as Β C(n2βˆ’1,Β m) = C(mβˆ’1 m) = 0. Unfortunately, the Excel formula =COMBIN(a,b) returns an error if a < b.

Example

Example 1: Is 22211121121221111 a random sequence?

Here n1Β = 10, n2Β = 7 and r = 8. Using the above formulas we can create the worksheet shown in Figure 1.

Runs test exact

Figure 1 – Runs Test

The maximum number of runs is 2n2+1 = 2(7)+1 = 15. Some sample formulas are shown in Figure 2.

Cells Item Formula
B18 r = 13 (odd) =COMBIN(B$3-1,QUOTIENT(A18,2))*COMBIN(B$4-1,QUOTIENT(A18,2)-1)+COMBIN(B$4-1,QUOTIENT(A18,2))*COMBIN(B$3-1,QUOTIENT(A18,2)-1)
B19 r = 14 (even) =2*COMBIN(B$3-1,QUOTIENT(A19,2)-1)*COMBIN(B$4-1,QUOTIENT(A19,2)-1)
B20 r = 15 (last) =COMBIN(B$3-1,QUOTIENT(A20,2))
B21 max =SUM(B7:B20) or =COMBIN(B3+B4,B3)
C18 P(r = 13) =B18/B21
D18 cdf at r = 13 =C18+D17

Figure 2 – Sample formulas from Figure 1

If we perform a two tailed test with Ξ± β‰ˆ .05, from Figure 1 we see that P(r ≀ 5) = .0245 and P(r β‰₯ 13) = 1 – .990 = .010, but P(r ≀ 6) = .080 and P(r β‰₯ 12) = 1 – .957 = .043.Β  Thus, we can’t reject the null hypothesis that the sequence of values is random for any values of rΒ in the interval [6, 12], while we reject the null hypothesis for values of rΒ outside this interval. For Example 1, r = 8, which is inside this interval, and so we cannot reject the null hypothesis that the sequence is random.

Worksheet Functions

Real Statistics Functions: The following functions are provided in the Real Statistics Pack:

RUNSDIST(r, n1, n2, cum) = the probability density function value at r of the runs distribution (i.e. the probability that there are r runs in a sequence of n1 of one type and n2 of the other type) when cum = FALSE and the corresponding cumulative probability distribution value at r (i.e. the probability there are at most r runs) when cum = TRUE.

RUNSINV(p, n1, n2) = the critical value; i.e. the smallest value of r such that RUNSDIST(r, n1, n2,Β TRUE)Β β‰₯ p.

Observations

For the table lookup functions RLowerCRIT and RUpperCRIT, defined in Single Sample Runs Test, the following equalities hold.

RLowerCRIT(n1,Β n2,Β Ξ±, 1) = RUNSINV(Ξ±, n1,Β n2)

RUpperCRIT(n1,Β n2,Β Ξ±, 1) = RUNSINV(1βˆ’Ξ±, n1,Β n2)

RLowerCRIT(n1,Β n2,Β Ξ±, 2) = RUNSINV(Ξ±/2, n1,Β n2)

RUpperCRIT(n1,Β n2,Β Ξ±, 2) = RUNSINV(1βˆ’Ξ±/2, n1,Β n2)

Examples Workbook

Click hereΒ to download the Excel workbook with the examples described on this webpage.

Reference

Mehta, C. R. and Patel, N. R. (2013) IBM SPSS exact tests
https://www.ibm.com/docs/en/SSLVMB_27.0.0/pdf/en/IBM_SPSS_Exact_Tests.pdf

10 thoughts on “Runs Distribution”

  1. Hello Charles.
    I have one more question. If n1 and n2 are equal (for example n1 = 10, n2 = 10) then maximum value of r is 2*n1 or 2*n2, am I right?

    Joe

    Reply
    • Hello Joe,
      Yes, that is correct.
      I just noticed that there is an error in the calculation of the Real Statistics function RUNSDIST(r,n1,n2,cum) when n1 is not equal to n2 and r = 2*n1 + 1 (assuming n1 < n2). This will be corrected in the next release of the Real Statistics software, due out in a few days. Charles

      Reply
  2. Hello,
    this is great site but I have two remarks.

    1) If r is odd the correct equation is r = 2m – 1.

    2) Figure 1
    How did You calculate the value in C20? I am asking cause one part of the equation have to be C(n2-1;m-1) = C(7-1,(15+1)/2-1) = C(6,7) = error so the C20 is error and no number one. Or am I missing something?

    Thank you for explanation πŸ™‚

    Reply
      • Hello Charles.
        Thank You for your fast answer.
        I calculated the cells accordingly with publication of “Swed and Eisenhart – Tables for testing randomness of grouping in a sequence of alternatives” and it leads to the same results as on figure 1 accept cell C20 and, of course, D20. The way I used leads to the error in the cells, but It seems like your way leads to the result. Very interesting! πŸ™‚

        There is r = 2m – 1 in case of r = odd in the article I mentioned above, so probably, in your way, there should really be r = 2m + 1. πŸ™‚

        Thank you for this web! It helps me a lot to understand a statistics. πŸ™‚

        Reply

Leave a Comment