Generalized Extreme Studentized Deviate Test

Basic Concepts

The Generalized Extreme Studentized Deviate (ESD) Test is a generalization of Grubbs’ Test and handles more than one outlier. All you need to do is provide an upper bound on the number of potential outliers.

We test the null hypothesis that the data has no outliers vs. the alternative hypothesis that there are at most k outliers (for some user-specified value of k).

To test the data set S with n elements is we generate k test statistics G1, G2, …, Gk where each Gj is a two-tailed Grubbs’ statistic, defined as follows:

S1 = S

 j is the mean of Sj and sj is the standard deviation of Sj

image9118

Sj+1 = Sj − {xj} where xj = the element  in Sj such that |x− | is maximized

Essentially you run k separate Grubbs’ tests, testing whether Gj > Gj-crit where Gj-crit is Gcrit as described at Grubbs’ Test, but adjusted for the correct value of the sample size; i.e. n is replaced by − + 1. Now let r be the largest value of jk such that Gj > Gj-crit. Then we conclude there are r outliers, namely x1, …, xr. If r = 0 there are no outliers.

Note that if Gj > Gj-crit and h < j, then both xh and xj are outliers even if Gh ≤ Gh-crit.

Example

Example 1: Identify all the outliers in the data set shown in range A5:B15 of Figure 1.

ESD (Grubbs') Test

Figure 1 – First trial of the ESD Test

Looking at the data set, we see five potential outliers: 3, 40, 350, 410, and 440. As we did in Grubbs’ Test we need to test for normality. In fact, if we were to run the Shapiro-Wilks test it would show that the data set without the five potential outliers is normally distributed. We, therefore, use the ESD Test with k = 5 (for five outliers); in fact, just to be sure we will set k = 6.

The Grubbs’ Test for the first outlier is shown on the right side of Figure 1. This is the two-tailed version of the test shown in Figure 2 of Grubbs’ Test. We see that the minimum data value is 3 (cell E5) and the maximum value is 440 (cell E6). We also see from cells E9 and E10 that the maximum value is farther away from the mean than the minimum value, and so our first test is to see whether 440 is an outlier.

Analysis

The test is not significant, but as we shall see, this doesn’t necessarily mean that 440 is not an outlier. We now run the test five more times. The next two trials are shown in Figure 2.

ESD Test

Figure 2 – Trials 2 and 3 of the ESD Test

The data set for the second trial (range I5:J15) is the same as for the first trial, but with the data element 440 removed. The second trial shows that once again the maximum value (410) is further away from the mean than the minimum value (3). This means that our second trial is a test as to whether 410 is an outlier. Once again the test is not significant.

Removing 410, we get the data for the third trial as shown in range O5:P15. Once again the maximum value (350) is further away from the mean than the minimum value (3), but this time the test is significant, which means that 350 is an outlier. But this automatically classifies 440 and 410 as outliers too.

Results Summary

We summarize the results of all six trials in Figure 3.

Trial outlier G G-crit sig
1 440 2.497556 2.757735 no
2 410 2.729992 2.73378 no
3 350 2.714963 2.708246 yes
4 3 2.721414 2.680931 yes
5 40 2.83852 2.651599 yes
6 220 1.707766 2.619964 no

Figure 3 – ESD Test Summary

Figure 3 confirms that 3, 40, 350, 410, and 440 are outliers (220 is not an outlier).

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following array function to perform the ESD test.

ESD(R1, lab, alpha, k): outputs a 4 × k column array with the following entries in each column: potential outlier, G, Gcrit, and test significance

If lab = TRUE (default FALSE) then the output is a 4 × (k+1) array with a column of labels appended. alpha = the significance level (default .05). The potential outlier is either the maximum or minimum value in R1, depending on which is farthest away from the mean of R1. The test significance is “yes” if G > Gcrit and “no” otherwise.

If k is omitted (or zero) and lab = FALSE then the value of k is set to the number of columns in the highlighted range, while if k is omitted (or zero) and lab = TRUE then the value of k is set to the number of columns in the highlighted range minus 1 (the extra column contains the labels).

For Example 1, if you highlight the range AN23:AT26, enter the formula =ESD(A5:A15,TRUE) and press Ctrl-Shft-Enter, then the output that appears is displayed in Figure 4.

ESD Test Real Statistics

Figure 4 – Output from ESD formula

Since the highlighted range contains 7 columns and lab = TRUE, k = 6.

Another Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack also provides the following simpler array function to perform the ESD test.

OUTLIERS(R1, alpha, k): outputs a column array with up to k outliers from R1; if k is omitted (or zero) then the value of k is set to the number of rows in the highlighted range; alpha defaults to .05.

For Example 1, if you highlight the range AV23:AV31, enter the formula =OUTLIERS(A4:A14) and press Ctrl-Shft-Enter, then the output that appears is displayed in Figure 5.

Outliers Grubbs' test Excel

Figure 5 – Output from OUTLIERS formula

Since the highlighted range contains 9 rows, k = 9. As you can see from Figure 5, even if we perform the ESD test with 9 trials, we still get the same five outliers.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Wikipedia (2014) Grubbs’s test
https://en.wikipedia.org/wiki/Grubbs%27s_test

Swarup, S. (2021) Anomaly detection with GESD (Generalized extreme studentized deviate) in Python
https://towardsdatascience.com/anomaly-detection-with-generalized-extreme-studentized-deviate-in-python-f350075900e2

47 thoughts on “Generalized Extreme Studentized Deviate Test”

  1. Hi Charles,
    If you were presenting data from this test, is there a p-value you could calculate? How would you report a “significant” outlier.
    Thanks

    Reply
    • Michael,
      This test is generally conducted by comparing the test statistics with a critical value instead of using a p-value.
      How to identify significant outliers is described on this webpage.
      Charles

      Reply
  2. Hi, Dr. Zaiontz;

    Can you please run through my data? I can’t get it to converge.

    11.6 10.2
    11.2 7.8
    9.8 14.3
    9.6 13.4
    16.7 12.0
    10.3 7.7
    15.4 14.2
    15.3 10.9
    13.5 8.0
    9.9 7.2
    9.1 4.0
    8.0 6.7
    14.1 3.4
    10.0 8.6
    15.8 9.7
    23.1 22.2

    Reply
  3. Hello, I have a little doubt . In this paper by ASTM Standardization News, dated:2015
    link:https://www.astm.org/standardization-news/images/nd15/nd15_datapoints.pdf
    The process to calculate the outliers seem a bit different , if we go by their rule in the above example in figure3 , first 5 point would be outliers. I am little confused with the way we proceed. Can you please help me understand this a little better. The steps are as follows according to the mentioned paper:
    GESD Procedure:
    1. Decide a priori the maximum number of outliers
    to test for. Let’s call this number r. (A general
    recommendation is to set r = 20 percent of n.)
    2. Set current cycle index i = 1.
    3. Calculate the quantity T = | observation –
    average| ÷ s for every member of the dataset in
    the current cycle.
    4. Identify the observation with the largest T.
    Designate this as T1 max (i.e., maximum T for the
    first cycle.)
    5. Remove the observation identified in 4) from the
    dataset.
    6. Increase current cycle index i by 1 : i.e. i = i + 1
    7. Repeat steps 3 to 6 using remaining data up to
    and including i = r.
    8. On completing step 7, beginning with Tr max,
    maximum T value in cycle r, and working
    backwards (Tr-1 max, Tr-2 max …. and so on), compare
    this maximum value versus the critical value for
    the specific cycle (λi
    ) obtained from the ASQC
    publication.
    9. Identify the highest cycle for which Ti max exceeds
    its limit value. The observation associated with
    the Ti max for that cycle and all observations
    associated with the Ti max’s for all previous cycles
    up to and including cycle 1 are considered to be
    outliers.

    Reply
  4. hi Dr Charles ,
    thank you very much for your intuitive explanation,
    i might be a bit confused but do we always consider all previous points automatically anomaly points if the current inspected point ‘s test proved that it’s an anomaly point ?
    thank you.

    Reply
  5. Thank you for the site, it is an excellent and very helpful resource.

    May I ask, possibly a daft question?

    Taken from your first paragraph,
    “Essentially you run k separate Grubbs’ tests, testing whether Gj > Gj-crit where Gj-crit is Gcrit as described above, but adjusted for the correct value of the sample size; i.e. n is replaced by n − j + 1”

    Would you mind clarifying for me what j is here, why we minus it and add 1 on?

    Thanks so much

    Reply
    • Hello Kay,
      First of all Gj-crit is defined at Grubbs’ Test.
      Now to answer your question. The index j takes values from 1 to n. When j = 1, the value of n-j+1 = n, and so the sample size is n, the complete sample. When j = 2, the value of n-j+1 = n-1, which is the entire sample minus the one outlier you have potentially already found. When j = 2, then n-j+1 = n-2, which is the complete sample minus potentially two outliers already found. Etc.
      Charles

      Reply
      • That’s great, thanks.

        May I also check that when dealing with a large population size (worthy of Grubb’s ESD method) the way to calculate G-crit?

        To calculate G-crit would we use the Tcrit described on this page or the Tcrit used in the Grubbs esd method? Formatted in Excel, do we take Tcrit to be T.inv(1- significance value, degree of freedom) or Tinv(significance value, degrees of freedom)?

        Thanks again,

        Reply
        • Hi Kay,
          For ESD you need to perform a two-tailed test and so you use T.INV.2T (or equivalently TINV), For Grubbs test you have a choice of using the one-tailed (T.INV) or two tailed test (T.INV.2T).
          Charles

          Reply
  6. My first post here. Congrats for the very interesting site!
    I would use this test and the Shapiro-Wilks test as one-tailed tests (the values cannot be lower than zero, but can go to +inf). Should I do some modifications?

    Reply
    • Cristiano,
      There is a one-sided version of Grubbs test (ESD restricted to one outlier), but I don’t know of a one-sided version of the ESD test.
      That values can’t go lower than zero does’t mean that a one-sided test is appropriate. E.g. if I generate a sample with 50 elements by using the formula =NORM.S.INV(RAND())+10.0, then probably none of the elements will be below zero, yet the sample is clearly drawn from a normal distribution with mean 10.0. A two sided test is required since there is symmetry around 10. The situation would be different if the data were distributed based only on the right side of a normal distribution.
      Charles

      Reply
      • Hi Charles,
        You’re perfectly right, I said that badly.
        The values are the distance between two points in a 3D space, so I think that a one-sided test should be appropriate. Unfortunately, the distribution is unknown and far from normal (it vaguely resemble a very long-tailed log-normal distribution).
        Currently, the best I can do is to say that a value is an outlier if value – mean > 10 sigma (based on the Chebyshev’s inequality).
        Is there any tool in your Resource Pack that I could use?

        Reply
        • Cristiano,
          If the distribution were log-normal, then you could perform a transformation and then use the outlier tests for the normal distribution.
          Since it doesn’t follow a clear distribution, the resource pack doesn’t contain the tool that you are looking for.
          Charles

          Reply
  7. Hi Charles,
    In the formulas for ESD and Outliers, how do you set k? It would appear that it would automatically provide a list of the outliers based on the number of rows selected but I only ever get 1 outlier when I try it against my data sets and I can clearly see at least 2.

    Reply
  8. Hi Chuck, Thank you for the clear definition and great example.
    In the text above Figure 1 for Example 2:
    “Identify all the outliers in the data set shown in range A18:B28 of Figure 1”
    Do you mean “in range A5:B15 of Figure 1” or am I missing something?

    Reply
    • Hi Jesse,
      Yes, you are correct. Thanks for finding this mistake. I have corrected the error on the webpage.
      I really appreciate your help in making the website better and easier to use.
      Charles

      Reply
  9. What is the smallest number of values for which this statistical approach is valid? I tend to have only 5 or 6 data points, which ESD makes all outliers until I get down to 2 points which obviously stops working. Is there a method for such a small sample?

    Reply
    • Sue,
      Most statistical methods don’t really work that well with such small samples (5 or 6 data points), unless that values are really extreme. E.g. with data such as 2, 3, 6, 8, 400, it is pretty easy to see that 400 is an outlier, but not so clear with 2, 3, 6, 8, 15.
      Charles

      Reply
  10. Should the ESD and OUTLIER functions give the same answer? I am finding that when I have a smaller set of data (n=14) sometimes OUTLIER returns all the values as being outliers, whereas running ESD gives what appears to be the real couple of outliers.

    Reply
        • There isn’t even a clear definition of what is an outlier. Is it related to IQR units from the median or standard deviations from the mean? Is an outlier a naturally occurring thing: e.g. in a sample of 10 perhaps I should consider a data element that is 3 standard deviations from the mean to be an outlier, but should I do the same in a sample of 10,000? In general, I would use both tests to discover potential outliers and then evaluate each to determine whether or not I should do something special with that data element
          Charles

          Reply
  11. Along the lines of the first comment, how is the value for k determined, or even estimated? If running repeated Shapiro-Wilk tests for normality, removing one value at a time, doesn’t this itself give you the number of outliers, without the need for the ESD?

    Reply
    • Rick,
      One way to estimate k is to plot the data.
      You don’t necessarily need to use ESD to find the outliers, but it can be helpful in doing so.
      Charles

      Reply
    • Yvette,
      I am not completely sure that I understand what you are suggesting, but he problem is that you don’t really know the correct value of k to use.
      Charles

      Reply
  12. Excellent site, thanks.

    Just above Figure 4, you have “=ESD(A4:A14,TRUE)”. I think this should be: “=ESD(A5:B15,TRUE)” to be consistent with Example 2 on this page.

    Reply
  13. It appears you are doing a 2-sided test, since you are looking at both minimum and maximum deviation from the mean. As such, shouldn’t the cell E17 in Figure 1 be defined as TINV(E15/2,E16), rather than TINV(E15,E16)?

    Reply

Leave a Comment