Creating Box Plots with Outliers in Excel

Basic Concepts

The procedure for manually creating a box plot with outliers (see Box Plots with Outliers) is similar to that described in Special Charting Capabilities. One key difference is that instead of ending the top whisker at the maximum data value, it ends at the largest data value less than or equal to Q3 + 1.5*IQR. Similarly, the bottom whisker ends at the smallest data value greater than or equal to Q1 – 1.5*IQR. The other difference is that we need to manually add a small circle or each data value greater than Q3 + 1.5*IQR or less than Q1 – 1.5*IQR. We also add an × at the mean data value.

Examples

Example 1: Repeat Example 2 of Special Charting Capabilities, showing outliers.

The result is shown in Figure 1. We now show how to construct this output manually using standard Excel capabilities.

Boxplots with Outliers Excel

Figure 1 – Box Plots with Outliers

To create this box plot manually, you need to first create the values in range F12:F17. The Q1, Median, Q3 and Mean values for Brand A in the range F12:F17 are calculated by the formulas =QUARTILE(A4:A13,1), =MEDIAN(A4,A13), =QUARTILE(A4:A13,3) and =AVERAGE(A4:A13). We use the following array formulas to calculate the values for Lower (i.e. lower whisker) and Upper (i.e. upper whisker) for Brand A.

=MIN(IF(A4:A13>=F12-1.5*(F14-F12),A4:A13,””))

=MAX(IF(A4:A13<=F14+1.5*(F14-F12),A4:A13,””))

Keep in mind that these are array formulas and so you need to press Ctrl-Shft-Enter and not just Enter.

Once you have entered all these formulas, highlight the range F12:H17 and press Ctrl-R. Next, place the formula =F12 in cell F6, highlight range F6:H6 and press Ctrl-R. Finally place the formula =MAX(F13-F12,0) in cell F7, highlight the range F7:H10 and press the key sequences Ctrl-R and Ctrl-D. This fills in all the values in the range F6:H10.

You now follow the same steps as used in Example 1 or 2 of Special Charting Capabilities, substituting Lower and Upper for Min and Max. The result will be as shown in Figure 1, except that the o and × symbols won’t yet be displayed for outliers and the mean values. We now show how to add each of these individually.

Adding Markers for the Means

We start by adding the means for each of the brands. Click on the chart in Figure 1 and select Chart Tools|Design > Data|Select Data. This will bring up the dialog box shown in Figure 2.

Select Data Source dialog

Figure 2 – Select Data Source

Click on the Add button (on the left side of the dialog box). Fill in the dialog box that appears as shown in Figure 3.

Add series Excel charts

Figure 3 – Add new series for the means

When you press the OK button, the chart changes to that shown in Figure 4.

Modified box plot

Figure 4 – Modified chart

We get this result because Excel interprets the added means as stacked bar charts. Thus, we need to tell Excel that this added series should be interpreted instead as a scatter plot. We do this by first clicking on the OK button of the Select Data Source dialog box (see Figure 2) that reappears to close this dialog box, and then right-clicking on any of the blue bars shown in Figure 4. Next, we select Change Series Chart Type … from the menu that appears. The dialog box shown in Figure 5 will now appear.

Change Chart Type

Figure 5 – Change Chart Type dialog box

As we can see from Figure 5, the means are treated as a Stacked Column. Now click on the dropdown menu for m (headed by Stacked Column) and select the first Scatter Plot option. Then press the OK button on the Change Chart Type dialog box.

The chart changes to the more acceptable format shown in Figure 6.

Box plots with means

Figure 6 – Box Plot after adding brand means

Identifying Outliers

We now proceed to add the outliers to the chart, but first, we need to identify the outliers. One approach for doing this is shown in Figure 7.

Identifying outliers

Figure 7 – Identifying outliers

We place the formula =IF(A4>F$15,A4,IF(A4<F$11,A4,””)) in cell Q4, highlight the range Q4:S13 and press Ctrl-R and Ctrl-D. There are only two outliers, namely -300 and 1850 for Brand B.

Adding Markers for Outliers

We now proceed to add these two outliers to the chart, one by one, using a procedure similar to that used to add the mean values. To add 1850, click on the chart in Figure 6 and select Chart Tools|Design > Data|Select Data. This will bring up the dialog box shown in Figure 2. Once again, click on the Add button. Fill in the dialog box that appears as shown in Figure 8.

Add series for outlier

Figure 8 – Add new series for one outlier

Note that this time the default chart is a scatter chart (the last chart type selected) and so we are prompted for both X and Y values (unlike the prompt in Figure 3). Since the outlier 1850 (shown in cell R10 of Figure 7) is in Brand B, we must insert 2 in the Series X values field.

We add the outlier -300 in a similar fashion. The result is shown in Figure 9.

Box and Whiskers outliers

Figure 9 – Box Plot with Outliers

Changing the Outlier Markers

The last step is to change the marker symbol used to indicate the outliers. We show how to do this in Excel 2013 and later versions. The approach is similar in earlier versions of Excel.

Click on one of the outlier marker symbols and then select Chart Tools|Format > Current Selection|Format Selection. Click on the Fill & Line icon and then select the Marker option.  See Figure 10. Next, click on Built-in from the Marker Options and choose the Type that you like. You can also choose the Color (lower right-hand corner of Figure 10).

Change Marker Type Excel

Figure 10 – Changing the Marker Type

Now repeat the above steps for the other outlier.

When Negative Data is Not a Problem

Note that despite the fact that there is a negative value, namely the outlier -300 for Brand B (cell B11 of Figure 1), no adjustment for the negative value is necessary when using the Box Plot with Outliers option of the Real Statistics Descriptive Statistics and Normality data analysis tool. The reason for this is that none of the lower whiskers go into negative territory.

This can be seen from cell F18 which contains the formula =MIN(0,F11:H11). As long as this value is zero and not negative, then no adjustment is necessary.

Handling Negative Data

Example 2: Use the Box Plot with Outliers option of the Real Statistics Descriptive Statistics and Normality data analysis tool to create the box plots for the data in range A6:A16 of Figure 11.

If we repeat all the steps described for Example 1, this time we see that the lower whisker for Brand B does go into negative territory. This is evident from the fact that cell F18 of Figure 11 contains the negative value -50.

Despite this, no change is required to the manual procedure described for Example 1. When we use the Real Statistics Descriptive Statistics and Normality data analysis tool, the data analysis tool makes an adjustment to handle the negative value in cell F18 of Figure 11.

Adjusted box plot

Figure 11 – Box Plot with Outliers, negative adjustment

The main change that the Real Statistics data analysis tool makes when implementing the procedure described for Example 1 is to use the following formula in cell F5, =F11-$F18. Upon highlighting range F5:H5 and pressing Ctrl-R, a similar adjustment will be made to the Lower values for all three brands.

The data analysis tool also adjusts the mean and outlier values by –F18, i.e. by adding 50. E.g. Figure 12 shows how to identify the values to use for outliers. This is done by inserting the formula =IF(A4>F$15,A4-$F$18,IF(A4<F$11,A4-$F$18,””)) in cell R7, highlighting the range Q4:S13 and pressing Ctrl-R and Ctrl-D.

Negative adjustments for outliers

Figure 12 – Negative adjustment for outliers

This produces the three outliers, 50 for Brand A and 1850 and -300 for Brand B, each augmented by 50.

References

Microsoft (2016) Create a box and whiskers chart
https://support.microsoft.com/en-us/office/create-a-box-and-whisker-chart-62f4219f-db4b-4754-aca8-4743f6190f0d

Peltier, J. (2011) Excel box and whisker diagram (box plots)
https://peltiertech.com/excel-box-and-whisker-diagrams-box-plots/

6 thoughts on “Creating Box Plots with Outliers in Excel”

  1. Hello Charles,
    Please, Real-Statistics-Examples-Basics.xlsx
    This demo e-file
    Box Plot 4 in this tab
    Is it possible to automate the output of outliers worth charting?
    Or do you need to do it manually?
    Because there are quite a lot of outliers, it takes a lot of time to manually make them one by one.
    thanks for your reply
    Happy new year and good health

    Reply
  2. There are two vertical axes on my box plots. One is the data, but what is the other one for? Can I remove it so that the box plots look like your examples?

    Reply
    • Mark,
      In the case where there are negative data items, the approach used to create the box plot required that the data be transformed by the addition of some constant value to remove the effects of the negative data. Two vertical axes are used: one based on the original data and the other based on the transformed data. You may remove one of these axes.
      Charles

      Reply
        • Hello Herb,
          This webpage describes how to create a Box Plot with both positive and negative items.
          You have two choices: use the approach described on this webpage or if you have Excel 2016 or later you can use the Excel’s built-in Box Plot tool.

          Charles

          Reply

Leave a Comment