Generalized Extreme Studentized Deviate Test

Basic Concepts

The Generalized Extreme Studentized Deviate (ESD) Test is a generalization of Grubbs’ Test and handles more than one outlier. All you need to do is provide an upper bound on the number of potential outliers.

We test the null hypothesis that the data has no outliers vs. the alternative hypothesis that there are at most k outliers (for some user-specified value of k).

To test the data set S with n elements, we generate k test statistics G1, G2, …, Gk where each Gj is a two-tailed Grubbs’ statistic, defined as follows:

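$S_1 = S$

$G_j = \dfrac{\max_{x \in S_j} |x - \bar{x}_j|}{s_j}$

where $\bar{x}_j$ is the mean of $S_j$ and $s_j$ is the standard deviation of $S_j$

$S_{j+1} = S_j - \{x_j\}$ where $x_j$ = the element $x$ in $S_j$ such that $|x - \bar{x}_j|$ is maximized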
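The iterative definition above can be sketched in a few lines of Python. This is only an illustration, not part of the Real Statistics software: the function name esd_statistics and the sample data are made up for this sketch, and the sample standard deviation (n − 1 denominator) is assumed, as is usual for Grubbs’ statistic.

```python
from statistics import mean, stdev

def esd_statistics(data, k):
    """Return [(x_j, G_j)] for j = 1..k (hypothetical helper).

    At each step the element farthest from the current mean is recorded
    together with its studentized deviation, then removed from the set.
    """
    s = list(data)
    if k > len(s) - 2:
        raise ValueError("k is too large for the sample size")
    results = []
    for _ in range(k):
        m, sd = mean(s), stdev(s)              # mean and sample std dev of S_j
        x = max(s, key=lambda v: abs(v - m))   # element maximizing |x - mean|
        results.append((x, abs(x - m) / sd))   # (x_j, G_j)
        s.remove(x)                            # S_{j+1} = S_j - {x_j}
    return results

# Made-up data for illustration only (not the data of Example 1)
sample = [10, 12, 11, 13, 12, 11, 50, 12]
for x, g in esd_statistics(sample, 2):
    print(x, round(g, 4))
```

Each Gj produced this way still has to be compared with its own critical value before any conclusion about outliers is drawn.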

Essentially you run k separate Grubbs’ tests, testing whether Gj > Gj-crit where Gj-crit is Gcrit as described at Grubbs’ Test, but adjusted for the correct value of the sample size; i.e. n is replaced by n − j + 1. Now let r be the largest value of j ≤ k such that Gj > Gj-crit. Then we conclude there are r outliers, namely x1, …, xr. If there is no such j, then r = 0 and there are no outliers.
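As a rough illustration of the adjusted critical values, the sketch below assumes the standard two-tailed Grubbs’ critical value, Gcrit = ((m − 1)/√m) · √(t² / (m − 2 + t²)), where m = n − j + 1 and t is the upper critical value of the t distribution with m − 2 degrees of freedom at level α/(2m). This formula is not stated on this page (it is deferred to Grubbs’ Test), so treat it as an assumption. To stay within the Python standard library, the t quantile is found numerically; the helper names t_pdf, t_cdf, t_ppf and grubbs_crit are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_ppf(p, df):
    """Quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def grubbs_crit(m, alpha=0.05):
    """Assumed two-tailed Grubbs' critical value for a sample of size m >= 3."""
    t = t_ppf(1 - alpha / (2 * m), m - 2)
    return (m - 1) / math.sqrt(m) * math.sqrt(t * t / (m - 2 + t * t))

# Critical values for trials j = 1..6 when n = 22, i.e. m = n - j + 1
n = 22
for j in range(1, 7):
    print(j, round(grubbs_crit(n - j + 1), 4))
```

In practice a statistics library would supply the t quantile (for example scipy.stats.t.ppf); the numerical version here only keeps the sketch free of dependencies.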

Note that if Gj > Gj-crit and h < j, then both xh and xj are outliers even if Gh ≤ Gh-crit.


Example 1: Identify all the outliers in the data set shown in range A5:B15 of Figure 1.


Figure 1 – First trial of the ESD Test

Looking at the data set, we see five potential outliers: 3, 40, 350, 410, and 440. As with Grubbs’ Test, we first need to test for normality. In fact, if we were to run the Shapiro-Wilk test, it would show that the data set without the five potential outliers is normally distributed. We, therefore, use the ESD Test with k = 5 (for five outliers); in fact, just to be sure we will set k = 6.

The Grubbs’ Test for the first outlier is shown on the right side of Figure 1. This is the two-tailed version of the test shown in Figure 2 of Grubbs’ Test. We see that the minimum data value is 3 (cell E5) and the maximum value is 440 (cell E6). We also see from cells E9 and E10 that the maximum value is farther away from the mean than the minimum value, and so our first test is to see whether 440 is an outlier.


The test is not significant, but as we shall see, this doesn’t necessarily mean that 440 is not an outlier. We now run the test five more times. The next two trials are shown in Figure 2.


Figure 2 – Trials 2 and 3 of the ESD Test

The data set for the second trial (range I5:J15) is the same as for the first trial, but with the data element 440 removed. The second trial shows that once again the maximum value (410) is further away from the mean than the minimum value (3). This means that our second trial is a test as to whether 410 is an outlier. Once again the test is not significant.

Removing 410, we get the data for the third trial as shown in range O5:P15. Once again the maximum value (350) is further away from the mean than the minimum value (3), but this time the test is significant, which means that 350 is an outlier. But this automatically classifies 440 and 410 as outliers too.

Results Summary

We summarize the results of all six trials in Figure 3.

Trial  Outlier  G         G-crit    Significant
1      440      2.497556  2.757735  no
2      410      2.729992  2.73378   no
3      350      2.714963  2.708246  yes
4      3        2.721414  2.680931  yes
5      40       2.83852   2.651599  yes
6      220      1.707766  2.619964  no

Figure 3 – ESD Test Summary

Figure 3 confirms that 3, 40, 350, 410, and 440 are outliers (220 is not an outlier).
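The decision rule can be checked directly against the values in Figure 3. The short Python sketch below (the function name count_outliers is made up for illustration) finds r, the largest trial number whose G exceeds its critical value, and then reports the first r removed elements as outliers, including those whose own test was not significant.

```python
# (removed element, G, G-crit) for trials 1..6, copied from Figure 3
trials = [
    (440, 2.497556, 2.757735),
    (410, 2.729992, 2.73378),
    (350, 2.714963, 2.708246),
    (3,   2.721414, 2.680931),
    (40,  2.83852,  2.651599),
    (220, 1.707766, 2.619964),
]

def count_outliers(trials):
    """Return r = the largest trial number j with G_j > G_j-crit (0 if none)."""
    r = 0
    for j, (_, g, g_crit) in enumerate(trials, start=1):
        if g > g_crit:
            r = j
    return r

r = count_outliers(trials)
outliers = [x for x, _, _ in trials[:r]]
print(r, outliers)  # prints: 5 [440, 410, 350, 3, 40]
```

Trial 5 is the last significant one, so r = 5, and 440 and 410 are counted as outliers even though trials 1 and 2 were not significant on their own.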

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following array function to perform the ESD test.

ESD(R1, lab, alpha, k): outputs a 4 × k column array with the following entries in each column: potential outlier, G, Gcrit, and test significance

If lab = TRUE (default FALSE) then the output is a 4 × (k+1) array with a column of labels appended. alpha = the significance level (default .05). The potential outlier is either the maximum or minimum value in R1, depending on which is farthest away from the mean of R1. The test significance is “yes” if G > Gcrit and “no” otherwise.

If k is omitted (or zero) and lab = FALSE then the value of k is set to the number of columns in the highlighted range, while if k is omitted (or zero) and lab = TRUE then the value of k is set to the number of columns in the highlighted range minus 1 (the extra column contains the labels).

For Example 1, if you highlight the range AN23:AT26, enter the formula =ESD(A5:B15,TRUE) and press Ctrl-Shift-Enter, then the output that appears is displayed in Figure 4.


Figure 4 – Output from ESD formula

Since the highlighted range contains 7 columns and lab = TRUE, k = 6.
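The default rule for k can be written out as a small Python sketch. The helper effective_k is hypothetical and only mirrors the rule as described in this section; it is not part of the Real Statistics Resource Pack.

```python
def effective_k(highlighted_columns, lab=False, k=0):
    """Hypothetical helper: the value of k that ESD would use.

    If k is given (non-zero) it is used as is; otherwise k is taken from the
    width of the highlighted range, less one column when labels are shown.
    """
    if k:
        return k
    return highlighted_columns - 1 if lab else highlighted_columns

print(effective_k(7, lab=True))  # 7 highlighted columns, one of them holds labels
```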

Another Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack also provides the following simpler array function to perform the ESD test.

OUTLIERS(R1, alpha, k): outputs a column array with up to k outliers from R1; if k is omitted (or zero) then the value of k is set to the number of rows in the highlighted range; alpha defaults to .05.

For Example 1, if you highlight the range AV23:AV31, enter the formula =OUTLIERS(A5:B15) and press Ctrl-Shift-Enter, then the output that appears is displayed in Figure 5.


Figure 5 – Output from OUTLIERS formula

Since the highlighted range contains 9 rows, k = 9. As you can see from Figure 5, even if we perform the ESD test with 9 trials, we still get the same five outliers.


