Bootstrapping for Order Statistics

Another approach for estimating the 95% confidence interval for the median of a finite sample is to use bootstrapping. This approach is explained further in Resampling Procedures and is based on sampling with replacement.

Examples

Example 1: Range B2:K2 of Figure 1 displays a sample of size 10 taken from some population. Estimate the population median and a 95% confidence interval for the population median.

Figure 1: Bootstrapping

The median of the sample is a reasonable estimate for the population median. This is 12.34125 as shown in cell L2 using the formula =MEDIAN(B2:K2).

We now create 200 random bootstrap samples, as shown in range B4:K203 (although only the first 10 samples are displayed). E.g. the first bootstrap sample is shown in range B4:K4 using the array formula =RANDOMIZE(B$2:K$2). We then calculate the median of each bootstrap sample as shown in column L. E.g. the median of the first bootstrap sample (cell L4) is calculated by the formula =MEDIAN(B4:K4).
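
For readers who want to follow the same steps outside Excel, here is a minimal Python sketch of the resampling stage. It assumes that RANDOMIZE draws from the sample with replacement, as the bootstrap requires. The data values, the function name bootstrap_medians and the seed are made up for illustration, since the actual values in B2:K2 appear only in Figure 1.

```python
import random
import statistics

# Hypothetical data: the actual values in B2:K2 are only visible in Figure 1.
sample = [11.2, 12.9, 10.4, 13.7, 12.1, 12.6, 14.3, 11.8, 12.4, 13.0]


def bootstrap_medians(data, n_boot=200, seed=None):
    """Return the medians of n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        # One bootstrap sample: same size as the data, drawn with replacement
        resample = rng.choices(data, k=len(data))
        medians.append(statistics.median(resample))
    return medians


# 200 bootstrap medians, playing the role of column L (L4:L203)
medians = bootstrap_medians(sample, n_boot=200, seed=1)
print(len(medians), statistics.median(sample))
```

Fixing the seed makes a run repeatable. The worksheet version recalculates whenever the sheet changes, so its values differ from run to run.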

We now use the 200 bootstrap medians to make estimates of population statistics, as shown in range O4:O7. The bootstrap estimate of the population median is 12.45235 (cell O4) as calculated by =AVERAGE(L4:L203). As expected, this value is close to the sample estimate shown in cell L2.

The bootstrap enables us to obtain the standard error of the estimate of the population median, as shown in cell O5 using the formula =STDEV.S(L4:L203). In addition, we obtain a 95% confidence interval for the population median of (12.052, 13.1185). This is obtained from the lower and upper 2.5% of the bootstrap sample medians. Since 2.5% of 200 (the number of bootstraps) is 5, we use the formulas =SMALL(L4:L203,5) in cell O6 and =LARGE(L4:L203,5) in cell O7 to obtain this result.
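
The summary step can be sketched in Python as follows. The function name summarize_bootstrap is hypothetical, and the stand-in values 1 to 200 are used only because they make the tail positions easy to verify. The sketch mirrors the AVERAGE, STDEV.S, SMALL and LARGE formulas described above.

```python
import statistics


def summarize_bootstrap(estimates, alpha=0.05):
    """Mean, standard error and percentile interval of bootstrap estimates."""
    ordered = sorted(estimates)
    n = len(ordered)
    # Number of values to count in from each tail, e.g. 2.5% of 200 = 5
    r = max(1, round(n * alpha / 2))
    return {
        "estimate": statistics.mean(ordered),    # like =AVERAGE(L4:L203)
        "std_error": statistics.stdev(ordered),  # like =STDEV.S(L4:L203)
        "lower": ordered[r - 1],                 # like =SMALL(L4:L203,5)
        "upper": ordered[n - r],                 # like =LARGE(L4:L203,5)
    }


# Stand-in values 1..200 instead of real bootstrap medians
result = summarize_bootstrap(list(range(1, 201)))
print(result["lower"], result["upper"])  # 5 196
```

With 200 values and alpha = .05, the interval runs from the 5th smallest to the 5th largest value, which matches the cell formulas in O6 and O7.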

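Increasing the number of bootstrap samples reduces the simulation error in these estimates, so the results vary less from one run to the next. It does not, however, compensate for a small original sample: the width of the confidence interval is governed mainly by the size of that sample.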

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following array function.

ORDER_BOOTSTRAP(R1, k, lab, iter, alpha): returns a column array with the estimated value of the kth order statistic based on the sample in R1 using a bootstrap simulation with iter iterations (default 1,000); the output also contains the standard error of the estimate along with the 1-alpha confidence interval (default for alpha is .05).

If k = 0 (default) then the output estimates the median instead of the kth order statistic. If lab = TRUE (default FALSE), then a column of labels is appended to the output.

Note that the formula =ORDER_BOOTSTRAP(B2:K2,,TRUE,200) produces output similar to that shown in range N4:O7 in Figure 1.
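
To illustrate what a function of this kind computes, here is a rough Python analogue. It is a sketch under stated assumptions, not the Real Statistics implementation: the name order_bootstrap, the dictionary output, the seed parameter and the use of a percentile interval are all choices made for this example, and the sample values are made up.

```python
import random
import statistics


def order_bootstrap(data, k=0, iters=1000, alpha=0.05, seed=None):
    """Bootstrap the k-th order statistic (k = 0 means the median).

    Hypothetical analogue for illustration only.
    """
    rng = random.Random(seed)
    n = len(data)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the sample size")
    estimates = []
    for _ in range(iters):
        resample = sorted(rng.choices(data, k=n))
        # k-th smallest value of the resample, or its median when k == 0
        value = statistics.median(resample) if k == 0 else resample[k - 1]
        estimates.append(value)
    estimates.sort()
    # Count r values in from each tail for the 1-alpha percentile interval
    r = max(1, round(iters * alpha / 2))
    return {
        "estimate": statistics.mean(estimates),
        "std_error": statistics.stdev(estimates),
        "lower": estimates[r - 1],
        "upper": estimates[iters - r],
    }


# Made-up sample; the values in B2:K2 are only visible in Figure 1
sample = [11.2, 12.9, 10.4, 13.7, 12.1, 12.6, 14.3, 11.8, 12.4, 13.0]
print(order_bootstrap(sample, iters=200, seed=1))
```

Calling order_bootstrap(sample, k=1) would instead bootstrap the sample minimum, and k equal to the sample size would bootstrap the maximum.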

5 thoughts on “Bootstrapping for Order Statistics”

  1. Dear Charles,
    I do appreciate your previous help with estimating the standard error for a sample, where each of the collected data points has an error range of +/- 5%. You provided the formula: $B1 =$A1*(.95+.1*RAND()), where A1 is the original data point. As you noted, I can use other formulas to make the error normally distributed instead of uniformly distributed as done above.
    Could you please help me with a formula for normally distributed error?
    Thank you very much for your help.
    Best regards,
    Samir

    • Hello Samir,
      The formula =RAND() is used to generate a random number from a uniform distribution on the interval [0,1].
      To generate a random number from a normal distribution with mean m and standard deviation s, you can use the formula =NORM.INV(RAND(),m,s).
      Charles

    • Hello Gerardo,
      We are all well. I hope that you and your family continue to be well as well.
      Bootstrapping for order statistics was added in Rel 7.8 in August 2021.
      Bootstrapping for ANOVA, comparing two samples, etc. was added much earlier.
      Charles
