Distribution of Order Statistics from a Continuous Population

If we take a sample of size n from a continuous population, the kth order statistic is the kth smallest element in the sample. If we sort the elements of the sample so that x1 ≤ x2 ≤ … ≤ xn, then the kth order statistic, denoted x(k), is xk. Just as the sample mean can be treated as a random variable, we will also use the notation x(k) to represent a random variable.

CDF of the kth order statistic

Suppose that the population has a continuous distribution with cdf F(u). We describe the (cumulative) distribution Fk(x) of the kth order statistic in a sample of size n taken from the population.

Property 1: The cdf of the kth order statistic in the sample is

$$F_k(x) = \sum_{i=k}^{n} \binom{n}{i} F(x)^i \,[1-F(x)]^{n-i}$$

Proof:

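$$F_k(x) = P(x_{(k)} \le x)$$

$$= P(\text{at least } k \text{ elements of the sample are} \le x)$$

$$= \sum_{i=k}^{n} P(\text{exactly } i \text{ elements of the sample are} \le x)$$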

The probability that exactly i elements in the sample are less than or equal to x is the same as the probability of getting exactly i heads when tossing a coin n times, where the probability of heads on any toss is F(x). By the binomial distribution, this probability is C(n,i) F(x)^i [1−F(x)]^(n−i), which completes the proof.
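Property 1 translates directly into code. The following is a minimal sketch, not the Real Statistics implementation: `order_cdf` and `laplace_cdf` are hypothetical helper names, and the Laplace cdf assumes the usual location/scale form (location mu, scale beta), which is how the Laplace parameters are used in the examples on this page.

```python
from math import comb, exp


def order_cdf(Fx: float, k: int, n: int) -> float:
    """Cdf of the kth order statistic in a sample of size n, given Fx = F(x).

    Adds the binomial probabilities that exactly i of the n sample elements
    are <= x, for i = k, ..., n (Property 1).
    """
    return sum(comb(n, i) * Fx**i * (1 - Fx) ** (n - i) for i in range(k, n + 1))


def laplace_cdf(x: float, mu: float, beta: float) -> float:
    """Laplace cdf with location mu and scale beta (assumed parameterization)."""
    if x < mu:
        return 0.5 * exp((x - mu) / beta)
    return 1 - 0.5 * exp(-(x - mu) / beta)


# cdf of the 5th order statistic, sample size 11, Laplace(10, 5), evaluated at x = 7
print(order_cdf(laplace_cdf(7, 10, 5), 5, 11))
```

Under these assumptions the printed value should agree with the value .157588 quoted for ORDER_DIST in the Examples section later on this page.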

Special cases

If n is odd, say n = 2k − 1, then the kth order statistic is the sample median, and its cdf is

$$F_k(x) = \sum_{i=k}^{2k-1} \binom{2k-1}{i} F(x)^i \,[1-F(x)]^{2k-1-i}$$

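The cdf of the nth-order statistic (the sample maximum) is

$$F_n(x) = \binom{n}{n} F(x)^n \,[1-F(x)]^0 = F(x)^n$$

Alternatively, since the elements of the sample are independent,

$$F_n(x) = P(x_{(n)} \le x) = P(\text{every element of the sample is} \le x) = F(x)^n$$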

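Similarly, the cdf of the first-order statistic (the sample minimum) is

$$F_1(x) = \sum_{i=1}^{n} \binom{n}{i} F(x)^i \,[1-F(x)]^{n-i} = 1 - \binom{n}{0} F(x)^0 \,[1-F(x)]^{n}$$

i.e.

$$F_1(x) = 1 - [1-F(x)]^n$$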

Solving p = F1(x) and p = Fn(x) for x, we see that the inverse functions for the first- and last-order statistics are:

$$F_1^{-1}(p) = F^{-1}\left(1 - (1-p)^{1/n}\right)$$

$$F_n^{-1}(p) = F^{-1}\left(p^{1/n}\right)$$
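The two inverse formulas can be checked by plugging the results back into F1 and Fn. A minimal sketch: the exponential population with rate 2 is only an illustrative assumption, and the function names are hypothetical. For an exponential population the minimum of n values is again exponential with rate n times the original rate, which gives a second check.

```python
from math import exp, log


def first_order_inv(p: float, n: int, inv_F) -> float:
    """Inverse cdf of x(1): solve 1 - [1 - F(x)]^n = p for x."""
    return inv_F(1 - (1 - p) ** (1 / n))


def last_order_inv(p: float, n: int, inv_F) -> float:
    """Inverse cdf of x(n): solve F(x)^n = p for x."""
    return inv_F(p ** (1 / n))


# Illustrative population (an assumption): exponential with rate LAM
LAM = 2.0


def F(x: float) -> float:
    return 1 - exp(-LAM * x)


def inv_F(u: float) -> float:
    return -log(1 - u) / LAM


n, p = 5, 0.3
x_min = first_order_inv(p, n, inv_F)
x_max = last_order_inv(p, n, inv_F)

# Plugging back into F1(x) = 1 - [1 - F(x)]^n and Fn(x) = F(x)^n recovers p
print(1 - (1 - F(x_min)) ** n, F(x_max) ** n)
```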

Beta distribution characterization

We can express the cdf of the kth order statistic more simply based on the following property. This property is explained in Order Statistics for a Uniform Distribution.

Property 2: For any distribution with cdf F(x)

F(x(k)) ∼ Bet(k, n-k+1)

and so

Fk(x) = BETA.DIST(F(x),k,n-k+1,TRUE)

For example, the cdf of the kth order statistic for the gamma distribution Gamma(α, β) can be calculated in Excel by the formula

Fk(x) = BETA.DIST(GAMMA.DIST(x,α,β,TRUE),k,n-k+1,TRUE)
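For integer k and n, Property 2 can be checked numerically: integrating the Bet(k, n−k+1) pdf from 0 to F(x) should reproduce the binomial sum of Property 1. This is a minimal sketch; the midpoint rule in `beta_cdf` is an assumed stand-in for Excel's BETA.DIST(u, k, n-k+1, TRUE), and both helper names are hypothetical.

```python
from math import comb, factorial


def beta_cdf(u: float, a: int, b: int, steps: int = 20000) -> float:
    """Cdf of Bet(a, b) at u for integer a, b >= 1, via the midpoint rule on [0, u]."""
    const = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    h = u / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return const * total * h


def order_cdf(Fx: float, k: int, n: int) -> float:
    """Binomial-sum form of the order statistic cdf (Property 1)."""
    return sum(comb(n, i) * Fx**i * (1 - Fx) ** (n - i) for i in range(k, n + 1))


k, n = 5, 11
for Fx in (0.1, 0.2744, 0.5, 0.9):
    # The two columns should agree to many decimal places
    print(Fx, order_cdf(Fx, k, n), beta_cdf(Fx, k, n - k + 1))
```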

Inverse of the CDF

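Since F(x(k)) ∼ Bet(k, n-k+1), we have

$$F_k(x) = P(x_{(k)} \le x) = P(F(x_{(k)}) \le F(x)) = G(F(x))$$

where G is the cdf for the beta distribution Bet(k, n-k+1). Now suppose Fk(x) = p. Then G(F(x)) = p, and so

$$G^{-1}(p) = F(x)$$

It now follows that

$$x = F^{-1}(G^{-1}(p))$$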

As we noted above, the cdf of the kth order statistic for the gamma distribution Gamma(α, β) can be calculated in Excel by the formula

Fk(x) = BETA.DIST(GAMMA.DIST(x,α,β,TRUE),k,n-k+1,TRUE)

Thus, the inverse can be calculated in Excel by the formula

Fk⁻¹(p) = GAMMA.INV(BETA.INV(p,k,n-k+1),α,β)

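The same approach works for any continuous distribution, not just the gamma distribution: simply replace GAMMA.DIST and GAMMA.INV by the cdf and inverse cdf of that distribution.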
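The relation x = F⁻¹(G⁻¹(p)) can be sketched without Excel. In this sketch G⁻¹ is obtained by bisection on the binomial-sum form of the Bet(k, n−k+1) cdf (an assumed substitute for BETA.INV, valid for integer k and n), and the Laplace inverse cdf assumes the location/scale form with μ and β. All function names are hypothetical.

```python
from math import comb, log


def beta_cdf_int(u: float, k: int, n: int) -> float:
    """G(u): cdf of Bet(k, n-k+1) at u, via the binomial-sum identity."""
    return sum(comb(n, i) * u**i * (1 - u) ** (n - i) for i in range(k, n + 1))


def beta_inv(p: float, k: int, n: int, tol: float = 1e-14) -> float:
    """G^-1(p) by bisection; G is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, k, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def laplace_inv(u: float, mu: float, beta: float) -> float:
    """Inverse Laplace cdf with location mu and scale beta (assumed form)."""
    if u < 0.5:
        return mu + beta * log(2 * u)
    return mu - beta * log(2 * (1 - u))


def order_inv(p: float, k: int, n: int, inv_F) -> float:
    """Fk^-1(p) = F^-1(G^-1(p))."""
    return inv_F(beta_inv(p, k, n))


print(order_inv(0.157588, 5, 11, lambda u: laplace_inv(u, 10, 5)))  # close to 7
print(order_inv(0.1576, 5, 11, lambda u: laplace_inv(u, 10, 5)))    # roughly 7.0001
```

These two calls correspond to the ORDER_INV examples given later on this page.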

PDF of the kth order statistic

Suppose that the population has a continuous distribution with pdf f(u) and cdf F(u). We describe the probability density function fk(x) of the kth order statistic in a sample of size n taken from the population.

Property 3: The pdf of the kth order statistic is

$$f_k(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F(x)^{k-1} \,[1-F(x)]^{n-k} f(x)$$

Proof: Click here for the proof.
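A quick numerical sanity check of Property 3: the pdf should match a central-difference derivative of the cdf from Property 1. This is a minimal sketch; the standard exponential population and the values k = 3, n = 7, x = 0.8 are arbitrary assumptions, and the helper names are hypothetical.

```python
from math import comb, exp, factorial


def order_pdf(x, k, n, F, f):
    """fk(x) = n! / ((k-1)! (n-k)!) * F(x)^(k-1) * [1 - F(x)]^(n-k) * f(x)."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    Fx = F(x)
    return c * Fx ** (k - 1) * (1 - Fx) ** (n - k) * f(x)


def order_cdf(x, k, n, F):
    """Fk(x) from Property 1."""
    Fx = F(x)
    return sum(comb(n, i) * Fx**i * (1 - Fx) ** (n - i) for i in range(k, n + 1))


# Illustrative population (assumed): standard exponential
def F(x):
    return 1 - exp(-x)


def f(x):
    return exp(-x)


k, n, x, h = 3, 7, 0.8, 1e-5
numeric = (order_cdf(x + h, k, n, F) - order_cdf(x - h, k, n, F)) / (2 * h)
print(order_pdf(x, k, n, F, f), numeric)  # the two values should agree closely
```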

Special cases

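$$f_1(x) = \frac{n!}{0!\,(n-1)!}\, F(x)^0 \,[1-F(x)]^{n-1} f(x) = n\,[1-F(x)]^{n-1} f(x)$$

$$f_n(x) = \frac{n!}{(n-1)!\,0!}\, F(x)^{n-1} \,[1-F(x)]^0 f(x) = n\,F(x)^{n-1} f(x)$$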

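Suppose n is odd, with n = 2k − 1. Then the kth order statistic is the sample median, and its pdf can be expressed as

$$f_k(x) = \frac{(2k-1)!}{(k-1)!\,(k-1)!}\, F(x)^{k-1} \,[1-F(x)]^{k-1} f(x)$$

$$= \frac{(2k-1)!}{[(k-1)!]^2}\, \{F(x)\,[1-F(x)]\}^{k-1} f(x)$$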
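The three special cases are just the general pdf with k = 1, k = n and k = (n + 1)/2. A minimal sketch that checks this numerically, assuming (for illustration only) an exponential population with rate 1 and n = 7; for that population x(1) should also have the exponential pdf n·e^(−nx).

```python
from math import exp, factorial


def order_pdf(x, k, n, F, f):
    """General pdf of the kth order statistic (Property 3)."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return c * F(x) ** (k - 1) * (1 - F(x)) ** (n - k) * f(x)


# Illustrative population (assumed): exponential with rate 1
def F(x):
    return 1 - exp(-x)


def f(x):
    return exp(-x)


n, x = 7, 0.4
k_med = (n + 1) // 2  # n = 2k - 1, so x(k_med) is the sample median

f1 = n * (1 - F(x)) ** (n - 1) * f(x)  # pdf of x(1)
fn = n * F(x) ** (n - 1) * f(x)        # pdf of x(n)
fmed = (factorial(2 * k_med - 1) / factorial(k_med - 1) ** 2
        * (F(x) * (1 - F(x))) ** (k_med - 1) * f(x))  # pdf of the median

print(f1, order_pdf(x, 1, n, F, f))
print(fn, order_pdf(x, n, n, F, f))
print(fmed, order_pdf(x, k_med, n, F, f))
```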

Mean and variance of the kth order statistic

Property 4: The mean μk of the kth order statistic for a sample of size n can be expressed as

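$$\mu_k = \text{Area}_{0 \le x \le 1}\, h(x) \qquad \text{where} \qquad h(x) = F^{-1}(x)\,\frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}$$

Here Area_{0≤x≤1} h(x) is the area under the curve y = h(x) bounded by x = 0, x = 1 and the x-axis. Note that the factor multiplying F⁻¹(x) in h(x) is the pdf of Bet(k, n-k+1), consistent with Property 2. For those of you familiar with calculus, this can be expressed as

$$\mu_k = \int_0^1 F^{-1}(x)\,\frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}\,dx$$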
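The area in Property 4 can be approximated by splitting [0, 1] into many intervals. This is a minimal sketch using the midpoint rule. The `iters` argument plays a role similar to the iter argument of ORDER_MEAN, but this is not necessarily how the Real Statistics function computes the integral. The Laplace inverse cdf assumes the location/scale form.

```python
from math import factorial, log


def laplace_inv(u, mu, beta):
    """Inverse Laplace cdf with location mu and scale beta (assumed form)."""
    if u < 0.5:
        return mu + beta * log(2 * u)
    return mu - beta * log(2 * (1 - u))


def order_mean(k, n, inv_F, iters=20000):
    """Area under h(x) = F^-1(x) * g(x) on [0, 1], g = Bet(k, n-k+1) pdf (midpoint rule)."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    h = 1.0 / iters
    total = 0.0
    for j in range(iters):
        x = (j + 0.5) * h
        total += inv_F(x) * c * x ** (k - 1) * (1 - x) ** (n - k)
    return total * h


print(order_mean(3, 10, lambda u: u))                        # uniform(0, 1): exact value is 3/11
print(order_mean(6, 11, lambda u: laplace_inv(u, 10, 5)))    # median of 11 Laplace(10, 5) values
```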

Property 5: The variance of the kth order statistic for a sample of size n can be expressed as

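$$\sigma_k^2 = E[x_{(k)}^2] - \mu_k^2$$

where

$$E[x_{(k)}^2] = \text{Area}_{0 \le x \le 1}\, [F^{-1}(x)]^2\,\frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}$$

or using calculus

$$E[x_{(k)}^2] = \int_0^1 [F^{-1}(x)]^2\,\frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}\,dx$$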
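Property 5 needs only one more area, the one for the second moment. A minimal sketch that computes both moments in a single midpoint-rule pass; the function names are hypothetical and the Laplace inverse cdf assumes the location/scale form. For uniform(0, 1) the exact variance k(n−k+1)/[(n+1)²(n+2)] provides a check.

```python
from math import factorial, log


def laplace_inv(u, mu, beta):
    """Inverse Laplace cdf with location mu and scale beta (assumed form)."""
    if u < 0.5:
        return mu + beta * log(2 * u)
    return mu - beta * log(2 * (1 - u))


def order_mean_var(k, n, inv_F, iters=20000):
    """Mean and variance of the kth order statistic (Properties 4 and 5).

    Midpoint rule on [0, 1] for the areas under F^-1(x) * g(x) and
    [F^-1(x)]^2 * g(x), where g is the Bet(k, n-k+1) pdf.
    """
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    h = 1.0 / iters
    m1 = m2 = 0.0
    for j in range(iters):
        x = (j + 0.5) * h
        w = c * x ** (k - 1) * (1 - x) ** (n - k)  # beta pdf at x
        q = inv_F(x)
        m1 += q * w
        m2 += q * q * w
    m1 *= h
    m2 *= h
    return m1, m2 - m1 ** 2


print(order_mean_var(3, 10, lambda u: u))                      # uniform check
print(order_mean_var(6, 11, lambda u: laplace_inv(u, 10, 5)))  # Laplace median example
```

For the Laplace median example the computed variance should be close to the simulated variance of 3.449 reported later on this page.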

Approximate Values

The mean μk and variance of the kth order statistic for a sample of size n from a distribution with pdf f(x) and cdf F(x) can be estimated by

$$\mu_k \approx F^{-1}(p) \qquad \sigma_k^2 \approx \frac{p(1-p)}{(n+2)\,\left[f\!\left(F^{-1}(p)\right)\right]^2}$$

where p = k/(n+1).

The accuracy of these estimates varies widely. The estimates are exact for a uniform distribution.
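The approximations need only F⁻¹ and f at the single point p = k/(n+1). A minimal sketch with hypothetical helper names; the Laplace pdf and inverse cdf assume the location/scale form.

```python
from math import exp, log


def approx_order_moments(k, n, inv_F, f):
    """Approximate mean and variance of the kth order statistic, with p = k/(n+1)."""
    p = k / (n + 1)
    q = inv_F(p)
    return q, p * (1 - p) / ((n + 2) * f(q) ** 2)


def laplace_inv(u, mu=10.0, beta=5.0):
    return mu + beta * log(2 * u) if u < 0.5 else mu - beta * log(2 * (1 - u))


def laplace_pdf(x, mu=10.0, beta=5.0):
    return exp(-abs(x - mu) / beta) / (2 * beta)


# Uniform(0, 1): both values are exact, 3/11 and 3*8/(11^2 * 12)
print(approx_order_moments(3, 10, lambda u: u, lambda x: 1.0))
# Median of 11 values from Laplace(10, 5)
print(approx_order_moments(6, 11, laplace_inv, laplace_pdf))
```

For the Laplace median the approximate variance is .25/(13 × .01) ≈ 1.92, well below the simulated variance of 3.449 reported later on this page, which illustrates how much the accuracy of the approximation can vary.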

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack supports the following worksheet functions. These functions refer to a distribution dist (“uniform”, “normal”, etc.) with the specified parameters as described for the MEAN_DIST and VAR_DIST functions (see Distribution Property Functions).

ORDER_DIST(x, k, n, cum, dist, param1, param2, param3) = the pdf fk(x) of the kth order statistic from a sample of size n for the specified distribution if cum = FALSE, and the corresponding cdf Fk(x) if cum = TRUE.

ORDER_INV(p, k, n, dist, param1, param2, param3) = the inverse at p for the kth order statistic from a sample of size n for the specified distribution; i.e. the value x such that Fk(x) = p.

ORDER_MEAN(k, n, iter, dist, param1, param2, param3) = the expected value of the kth order statistic from a sample of size n for the specified distribution.

ORDER_VAR(k, n, iter, dist, param1, param2, param3) = the variance of the kth order statistic from a sample of size n for the specified distribution.

iter = # of intervals used to calculate the integral (default 1000). Note that ORDER_VAR will be available in Rel 9.3.1 of the Real Statistics Resource Pack.

Examples

For example, =ORDER_DIST(7,5,11,TRUE,"laplace",10,5) takes the value .1576. Thus, for a random sample of size 11 taken from the Laplace distribution with μ = 10 and β = 5, the cdf of the 5th order statistic is F5(7) = .157588. This means that the probability that the 5th order statistic of a sample of size 11 from the specified Laplace distribution is less than or equal to 7 is 15.7588%.

Conversely, =ORDER_INV(.157588,5,11,"laplace",10,5) takes the value 7. Note, however, that =ORDER_INV(.1576,5,11,"laplace",10,5) takes the value 7.000117, since .1576 is only a rounded value of F5(7).

The value of =ORDER_MEAN(6,11,,"laplace",10,5) is 10. This means that the expected value of the median (i.e. the 6th order statistic) of a sample of size 11 from the Laplace distribution with μ = 10 and β = 5 is 10. This is as expected, since the Laplace distribution is symmetric about μ, which is also its median.

Simulation

We simulate 1,000 values of the 6th order statistic (the median) of a sample of size 11 from a Laplace distribution with μ = 10 and β = 5. We do this by inserting the formula =ORDER_INV(RAND(),6,11,"laplace",10,5) in cell A1, highlighting the range A1:J100, and pressing Ctrl-R and Ctrl-D. The first 8 rows are shown on the left side of Figure 1.


Figure 1 – Simulation Mean and Variance

The mean of this sample is 9.968, as shown in cell M2, and the variance is 3.449, as shown in cell M5. These simulated values are close to the values calculated by the ORDER_MEAN and ORDER_VAR formulas, shown in cells M3 and M6.
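The same simulation can be sketched outside Excel. Method 1 mirrors ORDER_INV(RAND(),…) by drawing u from Bet(6, 6) with Python's `random.betavariate` and applying the Laplace inverse cdf (Property 2). Method 2 simply draws 11 Laplace values, here generated as μ plus the difference of two exponentials with mean β, and keeps the 6th smallest. The seed and names are arbitrary assumptions, and the printed numbers will differ from those in Figure 1.

```python
import random
import statistics
from math import log

MU, BETA, K, N, REPS = 10, 5, 6, 11, 1000
rng = random.Random(2024)  # arbitrary seed, for reproducibility


def laplace_inv(u):
    """Inverse Laplace cdf with location MU and scale BETA (assumed form)."""
    return MU + BETA * log(2 * u) if u < 0.5 else MU - BETA * log(2 * (1 - u))


def laplace_draw():
    """One Laplace(MU, BETA) value: difference of two exponentials with mean BETA."""
    return MU + rng.expovariate(1 / BETA) - rng.expovariate(1 / BETA)


# Method 1: inverse transform, x = F^-1(u) with u ~ Bet(K, N-K+1)
sample1 = [laplace_inv(rng.betavariate(K, N - K + 1)) for _ in range(REPS)]

# Method 2: sort N Laplace values and keep the Kth smallest
sample2 = [sorted(laplace_draw() for _ in range(N))[K - 1] for _ in range(REPS)]

for s in (sample1, sample2):
    print(statistics.mean(s), statistics.variance(s))
```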

Normal Distribution

An estimate for the mean of the kth order statistic for a sample of size n from a normal distribution is given by

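$$\mu_k \approx \mu + \sigma\,\Phi^{-1}\!\left(\frac{k-\alpha}{n-2\alpha+1}\right)$$

where α = π/8 and Φ⁻¹ is the inverse of the standard normal cdf. In Excel, we have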

μk = μ+σ*NORM.S.INV((k-Pi()/8)/(n-Pi()/4+1))
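The same approximation in Python, as a minimal sketch: `statistics.NormalDist().inv_cdf` plays the role of NORM.S.INV, and the function name and the N(100, 15²) example are hypothetical.

```python
from math import pi
from statistics import NormalDist


def normal_order_mean_approx(k: int, n: int, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Approximate mean of the kth order statistic of a sample of size n from N(mu, sigma^2)."""
    alpha = pi / 8
    p = (k - alpha) / (n - 2 * alpha + 1)
    return mu + sigma * NormalDist().inv_cdf(p)


# e.g. the largest of 5 values from N(100, 15^2)
print(normal_order_mean_approx(5, 5, mu=100, sigma=15))
```

Because the probabilities for k and n + 1 − k add up to 1, the approximation is symmetric about μ, as the exact means are.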

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Chen, P-N (2008) Basic theories on order statistics
Reference no longer available

Omondi, O. C. (2016) Order statistics of uniform, logistic and exponential distributions
http://erepository.uonbi.ac.ke/bitstream/handle/11295/97307/MSc_Project2016.pdf?sequence=1&isAllowed=y

Ma, D. (2010) The distribution of the order statistics. A Blog on probability and statistics
https://probabilityandstats.wordpress.com/2010/02/20/the-distributions-of-the-order-statistics/

Border, K. C. (2016) Lecture 15: Order statistics; conditional expectation. Caltech
https://healy.econ.ohio-state.edu/kcb/Ma103/Notes/Lecture15.pdf

Stack Exchange (2015) Approximate order statistics for normal random variables
https://stats.stackexchange.com/questions/9001/approximate-order-statistics-for-normal-random-variables

Royston, J. P. (1982) Expected normal order statistics (exact and approximate)
https://gwern.net/doc/statistics/order/1982-royston.pdf

Baglivo, J. O. (2004) Mathematica laboratories for mathematical statistics. ASA-SIAM
