Sampling using Real Statistics Capabilities

We describe Real Statistics capabilities for creating a random sample with or without replacement. These capabilities are easier to use than the standard Excel capabilities described in Sampling.

Worksheet Functions

Real Statistics Excel Functions: The Real Statistics Resource Pack provides the following useful array functions for creating a random sample with or without replacement.

SHUFFLE(R1, filler, nrows, ncols): returns an nrowsncols array with elements from R1 drawn at random without replacement (i.e. it shuffles the elements in R1). The string filler is used as a filler in case the output array has more cells than R1. This second argument is optional and defaults to the error value #N/A; ncols defaults to 1.

RANDOMIZE(R1, nrows, ncols): returns an nrowsncols array with elements from R1 drawn at random with replacement; ncols defaults to 1.

If nrows = 0 or is omitted then the output array is the highlighted range on the active worksheet. If nrows < 0 then the output is an array of the same size and shape as R1; in both these cases, ncols is not used and can be omitted.

The output for the random sample for Group 1 in Example 1 of Sampling can be obtained by inserting the array formula =RANDOMIZE(A6:B15) in the range D7:D12 of Figure 1 on that webpage. If instead, you want sampling without replacement then you can use the array formula = SHUFFLE(A6:B15).

The output doesn’t have to have the same shape and size as the input, as shown in Figure 1.

RANDOMIZE and SHUFFLE example

Figure 1 – Use of SHUFFLE and RANDOMIZE functions

Dynamic Array Support

The Real Statistics Resource Pack also provides the following functions which support dynamic arrays:

SHUFFLES(R1): returns an array of the same size and shape as R1 with elements from R1 drawn at random without replacement (i.e. it shuffles the elements in R1). 

RANDOMIZES(R1): returns an array of the same size and shape as R1 with elements from R1 drawn at random with replacement.

The Real Statistics Resource Pack also supports the following array functions.

SHUFFLERows(R1): returns an array of the same size and shape as R1 with the rows from R1 drawn at random without replacement (i.e. it shuffles the elements in R1). 

RANDOMIZERows(R1): returns an array of the same size and shape as R1 with the rows from R1 drawn at random with replacement.

ExtractRandRows(R1, k): returns an array with the same number of columns as R1, but with a subset of the rows in R1. In fact, for every value in column k of R1, one row with this value is selected at random from R1 and included in the output.

Examples

Figure 2 contains some examples of the use of the RANDOMIZE and RANDOMIZES worksheet functions.

Randomizing examples

Figure 2 – Use of RANDOMIZE(S) functions

Some examples of the use of the SHUFFLE and SHUFFLES worksheet functions are displayed in Figure 3.

Random sampling without replacement

Figure 3 – Use of SHUFFLE(S) functions

ExtractRandRows example

Figure 4 – Use of ExtractRandRows

Data Analysis Tool

Real Statistics Tool: The Real Statistics Resource Pack provides the Reformatting a Data Range data analysis tool which includes functionality similar to that provided by the SHUFFLE and RANDOMIZE functions. See Reformatting Capabilities for additional information about this data analysis tool.

Example 1: Use the Reformatting a Data Range data analysis tool to completely shuffle all the data elements in the population from Example 1 of Sampling (duplicated in range B5:C14 of Figure 5).

Press Ctrl-m and select the Reformatting a Data Range option (from the Desc tab). Fill in the dialog box that appears, as shown in Figure 5 (although the # of Columns field can be left blank since it would default to 1).

Random sampling without replacement

Figure 5 – Dialog box for reformatting a data range

The output is shown on the right side of Figure 5.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Chuza, E. (2016) Use the Sampling Analysis tool in Excel to easily select random numbers. Sage Intelligence
https://www.sageintelligence.com/tips-and-tricks/excel-tips-tricks/2016/02/sampling-analysis-tool/

3 thoughts on “Sampling using Real Statistics Capabilities”

  1. Hi
    I have 2 columns. 1 with duplicate names and the other numbers.
    How do I generate a random sample without, with no duplicates from column 1?
    Kind regards
    Adrian

    Reply
    • Hi Adrian,
      I found a way to do this, but it is a bit complicated. I give the steps as follows, assuming that the data is in range A1:B10. First note that if column A contains the values A, B, C, and D, then essentially we are choosing at random one number from column B corresponding to each of the four values in column A.
      Step 1: Insert the values 1, 2, …, 10 in range C1:C10.
      Step 2: Insert the formula =SHUFFLES(C1:C10) in range D1:D10
      Step 3: Copy range D1:D10 and then Paste this range back into D1:D10 (this is done since SHUFFLES is a volatile function)
      Step 4: Insert =INDEX(A$1:A$10,$D1) into cell E1, highlight range E1:F10, and press Ctrl-D and Ctrl-R. Range E1:F10 now has a random permutation of the rows in A1:B10. We now need to eliminate duplicates in column E.
      Step 5: Insert =IF(COUNTIF($E$1:$E1,$F1)=1,E1,””) in cell G1, highlight G1:H10, and press Ctrl-D and Ctrl-R. Column G has no duplicates. We now need to eliminate any blank rows. This can be done using Excel’s sorting capability in the Data ribbon, although there may be preceding blank rows. We instead continue using Real Statistics capabilities.
      Step 6: If using Excel 365 (relying on the dynamic arrays capability), insert the formula =DELROWS(G1:H10,,TRUE) in cell I1. Otherwise insert the array formula =DELROWS(G1:H10,,TRUE) in range I1:J10. In either case, columns I and J contain the desired result, although the names in column I won’t necessarily be in sorted order.
      Step 7: If you want the names to be in sorted order, you can use Excel’s sorting capability in the Data ribbon. Alternatively, you can use Excel 365’s new sorting function by placing the formula =SORTBY(I1#,I1:I10) in cell K1. Another choice is to place the Real Statistics formula =SORTRows(I#,1) in cell K1 if using Excel 365 or =SORTRows(I1:J10,1) in K1:L10, otherwise.
      I am considering providing a simplified way of doing this in the next Real Statistics release.
      Charles

      Reply

Leave a Comment