Simulation for Order Statistics

For a known continuous distribution, we can use Monte Carlo simulation to estimate properties of the order statistics (as well as the median or mean) of a sample drawn from that distribution.

Examples

Example 1: Estimate the median of a sample of size 10 drawn from a population with a gamma distribution with parameters alpha = 100 and beta = .13. Also find the standard error of this estimate and a 95% confidence interval for it.

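We create 200 simulated samples of size 10 from this distribution, one sample per row, as shown in range E2:N201 of Figure 1 (only the first 10 samples are shown). Each element in this range is generated by the worksheet formula =GAMMA.INV(RAND(),100,0.13). Since RAND() is uniformly distributed on (0, 1), applying the inverse gamma CDF to it yields a value from the required gamma distribution.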

Figure 1 – Monte Carlo simulation

From this point on, the approach is similar to that used for bootstrapping. The results are shown in range Q2:R5. For example, the estimated median of 12.9111 in cell R2 is calculated by the formula =AVERAGE(O2:O201), i.e. the average of the 200 sample medians in column O. This estimate is quite close to the population median of 12.9567, as calculated by the formula =GAMMA.INV(.5,100,.13). Increasing the number of simulations will tend to improve the accuracy of the results.
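The worksheet calculation in Example 1 can be sketched in Python using only the standard library. This is a minimal illustration, not the workbook's own method: the function names are hypothetical, column O is assumed to hold each sample's median, and the standard error and confidence interval are assumed to be computed bootstrap-style, as the standard deviation of the simulated medians and as their 2.5th and 97.5th percentiles. Python's random.gammavariate(alpha, beta) treats beta as a scale parameter, which matches GAMMA.INV.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def simulate_median(n=10, iters=200, shape=100.0, scale=0.13, alpha=0.05, seed=None):
    """Monte Carlo estimate of the median of a gamma sample of size n.

    Returns (estimate, standard error, lower CI bound, upper CI bound).
    Hypothetical helper; the SE and CI conventions are assumptions.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(iters):
        # One simulated sample of size n, like one row of E2:N201.
        # gammavariate's beta is a scale parameter, as in GAMMA.INV(p, alpha, beta).
        sample = [rng.gammavariate(shape, scale) for _ in range(n)]
        medians.append(statistics.median(sample))  # like one cell of column O
    medians.sort()
    estimate = statistics.mean(medians)        # like =AVERAGE(O2:O201)
    std_err = statistics.stdev(medians)        # assumed: SD of the simulated medians
    lower = percentile(medians, alpha / 2)     # assumed: percentile interval
    upper = percentile(medians, 1 - alpha / 2)
    return estimate, std_err, lower, upper


est, se, lower, upper = simulate_median(seed=1)
print(f"median ~ {est:.4f}, SE ~ {se:.4f}, 95% CI ~ ({lower:.4f}, {upper:.4f})")
```

Because the samples are random, the printed values change with the seed, just as the worksheet values change each time RAND() recalculates.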

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack provides the following two array functions. These functions refer to a distribution dist (“uniform”, “normal”, etc.) with the specified parameters as described for the MEAN_DIST and VAR_DIST functions (see Distribution Property Functions).

ORDER_SIM(k, n, lab, iter, alpha, dist, param1, param2, param3): returns a column array with the estimated value of the kth order statistic based on iter (default 1000) simulated samples of size n from the distribution specified by dist with the specified parameters; the output also contains the standard error of the estimate along with the 1–alpha confidence interval (default for alpha is .05).

If k = 0 (default), then the output estimates the median instead of the kth order statistic. If lab = TRUE (default FALSE), then a column of labels is appended to the output.

Note that the formula =ORDER_SIM(0,10,TRUE,,,"gamma",100,0.13) produces output similar to that shown in range Q2:R5 of Figure 1.
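A rough Python analogue of ORDER_SIM is sketched below to show what such a function computes, not how the Real Statistics add-in implements it. The names order_sim and order_stat are hypothetical; instead of a dist string and parameters, the sketch takes a draw callable that returns one observation, and it assumes the standard error is the standard deviation of the simulated statistics and the confidence interval comes from their alpha/2 and 1 – alpha/2 percentiles.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def order_stat(sample, k):
    """kth smallest value (1-based); k = 0 returns the median, as in ORDER_SIM."""
    if k == 0:
        return statistics.median(sample)
    if not 1 <= k <= len(sample):
        raise ValueError("k must be 0 or between 1 and n")
    return sorted(sample)[k - 1]


def order_sim(k, n, draw, iters=1000, alpha=0.05, rng=None):
    """Hypothetical Python analogue of ORDER_SIM (not the add-in's code).

    draw(rng) returns one observation; it stands in for dist/param1/param2/param3.
    Returns (estimate, standard error, lower bound, upper bound).
    """
    if rng is None:
        rng = random.Random()
    # Simulate iters samples of size n and record the chosen order statistic of each.
    stats = sorted(order_stat([draw(rng) for _ in range(n)], k) for _ in range(iters))
    return (
        statistics.mean(stats),
        statistics.stdev(stats),            # assumed SE convention
        percentile(stats, alpha / 2),       # assumed percentile interval
        percentile(stats, 1 - alpha / 2),
    )


# Roughly the analogue of =ORDER_SIM(0,10,TRUE,,,"gamma",100,0.13), without labels.
print(order_sim(0, 10, lambda r: r.gammavariate(100, 0.13), rng=random.Random(7)))
```

For a uniform(0, 1) distribution the exact expected value of the kth order statistic is k/(n + 1), so order_sim(1, 10, lambda r: r.random()) should return an estimate near 1/11 ≈ 0.0909, which makes a convenient sanity check.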

RANGE_SIM(j, k, n, lab, iter, alpha, dist, param1, param2, param3): returns a column array with the estimated value of the range statistic $x_{(k)} - x_{(j)}$ based on iter (default 1000) simulated samples of size n from the distribution specified by dist with the specified parameters; the output also contains the standard error of the estimate along with the 1–alpha confidence interval (default for alpha is .05).

For example, the expected value of the range statistic $x_{(10)} - x_{(1)}$ (along with the standard error of the estimate and a 95% confidence interval) for a sample of size 10 from the gamma distribution with parameters alpha = 100 and beta = .13 can be estimated using the array formula =RANGE_SIM(1,10,10,TRUE,,,"gamma",100,0.13). The results are shown in Figure 2.

Figure 2 – Simulation of the range statistic
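The range statistic can be simulated the same way: sort each simulated sample and subtract its jth smallest value from its kth smallest. The Python sketch below is an illustrative analogue of RANGE_SIM, not its implementation; the name range_sim, the draw callable, the requirement 1 ≤ j ≤ k ≤ n, and the standard-error and percentile-interval conventions are all assumptions of this sketch.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def range_sim(j, k, n, draw, iters=1000, alpha=0.05, rng=None):
    """Hypothetical analogue of RANGE_SIM: simulate x(k) - x(j) for samples of size n.

    Returns (estimate, standard error, lower bound, upper bound).
    """
    if not 1 <= j <= k <= n:
        raise ValueError("need 1 <= j <= k <= n")
    if rng is None:
        rng = random.Random()
    ranges = []
    for _ in range(iters):
        x = sorted(draw(rng) for _ in range(n))
        ranges.append(x[k - 1] - x[j - 1])  # kth minus jth order statistic
    ranges.sort()
    return (
        statistics.mean(ranges),
        statistics.stdev(ranges),           # assumed SE convention
        percentile(ranges, alpha / 2),      # assumed percentile interval
        percentile(ranges, 1 - alpha / 2),
    )


# Analogue of =RANGE_SIM(1,10,10,TRUE,,,"gamma",100,0.13): full range of a size-10 sample.
print(range_sim(1, 10, 10, lambda r: r.gammavariate(100, 0.13), rng=random.Random(3)))
```

For uniform(0, 1) samples the expected value of $x_{(k)} - x_{(j)}$ is (k − j)/(n + 1), so range_sim(1, 10, 10, lambda r: r.random()) should give an estimate near 9/11 ≈ 0.818.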
