Box Plots with Outliers

Excel Box and Whiskers Chart

Starting with Excel 2016 Microsoft added a Box and Whiskers chart capability to Excel. To access this capability for Example 1 of Creating Box Plots in Excel, highlight the data range A2:C11 (from Figure 1) and select Insert > Charts|Statistical > Box and Whiskers. The chart shown on the right side of Figure 1 will appear.

Box plot Excel 2016

Figure 1 – Excel’s Box and Whiskers chart

You can add a legend as well as chart and axis titles as usual. The box part of the chart is as described in Creating Box Plots in Excel. The whiskers extend up from the top of the box to the largest data element that is less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the smallest data element that is larger than 1.5 times the IQR. Values outside this range are considered to be outliers and are represented by a small filled-in circle.

The boundaries of the box and whiskers are calculated as shown in Figure 2. The only outlier is the value 1850 for Brand B, which is higher than the upper whisker, and so is shown as a dot.

Box and Whiskers formulas

Figure 2 – Formulas for the Box Plot

Note that we could also use the array formula

=MAX(IF(C2:C11<=H7,C2:C11,MIN(C2:C11)))

to calculate the value in cell H9, and the array formula

=MIN(IF(C2:C11>=H8,C2:C11,MAX(C2:C11)))

to calculate a value for cell H10. In fact, since the Excel Box Plot is only available in Excel 2016 or later versions of Excel, we can also use the (non-array) formulas =MAXIFS(C2:C11,”<=”&H7) and =MINIFS(C2:C11,”>=”&H8) that were introduced in Excel 2016.

Real Statistics Data Analysis Tool

The Real Statistics Resource Pack also provides a way of generating box plots with outliers (even for versions of Excel prior to Excel 2016). To produce such a box plot, proceed as in Example 1 of Creating Box Plots in Excel, except that you need to select the Box Plots with Outliers option of the Descriptive Statistics and Normality data analysis tool. The output for Example 1 of Creating Box Plots in Excel is shown in Figure 3.

Box Plots with Outliers

Figure 3 – Output from Box Plots with Outliers tool

As you can see, the output is similar to that shown in Figure 1. Note too that the Outlier Multiplier is not fixed at 1.5 but can be set to another value by the user (in the dialog box for the Descriptive Statistics and Normality data analysis tool).

Data Analysis Tools Details

The selected  Outlier Multiplier is shown in cell F2 of the output displayed in Figure 3. This value is used in calculating the Min and Max values (which are the values at the bottom of the lower whisker and the top of the upper whisker). E.g. cell F12 contains the array formula 

=MIN(IF(ISBLANK(A4:A13),””,IF(A4:A13>=F13-$F2*(F15-F13),A4:A13,””)))

and cell F16 contains the formula

=MAX(IF(ISBLANK(A4:A13),””,IF(A4:A13<=F15+$F2*(F15-F13),A4:A13,””))).

If the Percentage option is set on the Configuration dialog box, then you should enter a value 100 times the desired value in the Outlier Multiplier field; e.g. enter 150 if you want a 1.5 outlier multiplier factor. If you leave this field blank, the outlier multiplier factor defaults to 2.2.

Negative numbers are handled like that for Box Plots without Outliers (often using a second y-axis). Keep in mind, though, that a second y-axis is only employed when the lower whisker of at least one of the box plots is negative. If some outlier is negative but none of the lower whiskers are negative, then a second y-axis is not needed.

See Creating Box Plots with Outliers in Excel for how to create a box plot with outliers manually, using only Excel charting capabilities. It also addresses issues that arise when some of the data is negative.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Microsoft (2016) Create a box and whiskers chart
https://support.microsoft.com/en-us/office/create-a-box-and-whisker-chart-62f4219f-db4b-4754-aca8-4743f6190f0d

29 thoughts on “Box Plots with Outliers”

  1. Hi,
    thanks this is very useful, a short question, what is the minimum number of data for Excel to find outliers in a box plot? I have a set of 5 data and it is clear that one of them is an outlier, but the box plot will not indicate this, so I suppose this is because there is a minimum number of (valid) data needed. Could that be?

    Reply
    • Hi Hilde,
      There is no minimum number of data elements needed. It might be the case that an element needs to be more of an outlier in a small sample, but I am not sure even this is true.
      In any case, if you tell me what the 5 data elements are, I will be happy to give my opinion as to whether there is an outlier.
      Charles

      Reply
  2. Hi

    I have an old vs new method that I want to compare with box with outliers.
    However I have very many datasets to review. (50 plus lines left)

    Is there a way to make multiple separate box plots at once?
    Instead of making each individually for each line?

    Thanks
    Saga

    Reply
  3. I am trying to make a horizontal box plot but whenever I try to switch the columns, the outliers and means are in the wrong place. Is there a way to make those values be on the bars in the correct place?

    Reply
  4. This RealStatistics is a great Excell Add-on!.
    Thank you.
    I would like to define the whiskers as 1.96*standard deviation of the data. i.e. the data is normally distributed and I want to show the 95% confidence limits and the outliers.
    Please can you describe in more detail how to do this?
    Thank you.

    Reply
    • Hello Chris,
      I have not tried to do this, but I guess you can use the same approach as I suggested for the standard box plots, but now you need to decide what you will use in place of the parameters used for the standard box plot. Presumably, you will use the mean instead of the median and mean – 1.96*sd instead of “min” (actually a modified min) and mean + 1.96*sd in place of “max” (this specifies the whiskers). Presumably anything smaller than mean – 1.96*sd or larger than mean + 1.96*sd would be considered to be an outlier. You would still need to specify the end points of the box (perhaps mean +/- 1*sd).
      Charles

      Reply
  5. I’ve created a box plot with 24 values ranging from 1 to 5 (1 rating of 1, 3 ratings of 2, 14 ratings of 3, 5 ratings of 4 and 1 rating of 5). The lowest value of 1 only shows up as an outlier if I click “Show Inner Point”. Why is that?

    Note: this issue only happens if I am graphing multiple series on the same graph. If I create a stand alone box chart the value of 1 appears without having to click on “Show Inner Point”

    Reply
  6. Hi Charles,
    A quick question about outliers:
    When I ask for a box plot with outliers, the outliers list often includes one or more zero values (sometimes many more–76 in the output that inspired me to ask this question) even though the data set in question has a minimum value much greater than zero.

    Is this some sort of bug? Or, are these being generated correctly by some stats operation that I’m not familiar with? And then, of course, there is always user error! I can just delete these values, but don’t want to if it isn’t appropriate. Please advise when convenient. FYI, I am using Excel 2013 on a Windows 10 machine.

    Thanks in advance for your attention to this question. This add on and your website to support it are really excellent!

    Regards,
    Chris V.

    Reply
    • Chris,
      Glad that you like the website.
      There is no reason why zero couldn’t be an outlier. E.g. in a sample with one data element with zero value and 99 data elements with value 1,000,000, then zero could certainly be considered to be an outlier.
      In any case, if you send me an Excel file with your data, I can check to see whether the software is making an error.
      Charles

      Reply
        • Chris,
          Thanks for sending me the file.
          I see that the zero correspond to the blank elements in data range. E.g. in the range A1:D440 containing your data, the last 4 elements in the D column are blank. This generated 4 outliers with value zero. This is an error. I will fix this in the next release of the software. Fortunately, all the other calculations are correct, and so you should just ignore these false outliers.
          Charles

          Reply
  7. Charles,

    Thank you for the explanation on the range for the whiskers! Everywhere I searched I got a definition that they showed the upper and lower data point, yet I’d have data points plotted outside the whiskers. You are the first one that helped make sense of this to me.

    I’m a novice in statistics and would like to be able to explain the whisker range without using statistical terms. Is there a way to do this?

    Reply
    • Brian,
      Glad that you got value from the website.
      I think that explaining the whisker range is equivalent to explaining “outlier” without using statistical terminology. E.g. the top of the whisker is located at the largest data item that is not an outlier (or at the place where outliers start). This just shifts the issue to how you can describe an outlier in non-statistical terms. How about, “an outlier is an unexpectedly large or small data element”?
      Charles

      Reply
  8. Hi Charles,
    About the definition of the whisker lines:
    I agree with the formulas in Figure 2 (and also with the array formulas) but, even if (or just because?) I am not a native English speaker, the sentence

    “The whiskers extend up from the top of the box to the smallest data element that is less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the largest data element that is larger than 1.5 times the IQR.”

    does not sound pretty right to me. Shouldn’t be instead:

    “The whiskers extend up from the top of the box to the nth smallest data element where n is the number of elements that are less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the nth largest data element with n being the number of elements that are larger than 1.5 times the IQR.”,

    if one would trace the precise syntax of the formula?
    Or even more simply:

    “The whiskers extend up from the top of the box to the LARGEST data element that is less than or equal to 1.5 times the interquartile range (IQR) and down from the bottom of the box to the SMALLEST data element that is larger than 1.5 times the IQR.”

    or something alike… just because, strictly speaking, “the smallest data element that is less than or equal to 1.5 times the interquartile range (IQR)” is simply the smallest data element.

    In any case, thank you for your work!

    Best regards

    Federico

    Reply
  9. Thanks for your clear definition of the whisker lines! So many other places just say that the whiskers extend to the minimum and maximum values of the data set, but this is obviously not the case when outliers are present and are shown above and/or below the whiskers.

    Reply
    • Godfrey,
      It really depends on how you define “outlier”. If an outlier is a data value that is much higher or lower than the mean or median, then in a large sample you should expect to have some outliers, and typical statistical tests should be able to deal with these. If an outlier is a data value that is much higher or lower than the other data values, then this may violate one of the assumptions of the statistical test (e.g. the homogeneity of variance assumption for ANOVA).
      In any case, you shouldn’t automatically resort to triming or winsorizing the data. This may be appropriate sometimes, but then you are essentially testing the trimmed or winsorized data and not the original data. Sometimes it is best to report the results of your test both with the original data and then with the trimmed or winsorized data (or simply with the original data with the outlier(s) removed).
      Generally, the first thing you should do is to try to determine whether the outlier occurs because this is a reasonable data value (in which case you don’t want to remove it) or because some error has been made (miscopying the value, leaving out the decimal point, faulty measurement device, etc.), in which case you should either correct the error or leave out this data element(s).
      As usual in statistical analysis, judgement is required.
      Charles

      Reply

Leave a Comment