Histograms

Basic Concepts

A histogram is a graphical representation of the output from the FREQUENCY function described in Frequency Tables. 

Example 1: Create a histogram for the data and bin selection for Example 1 from Frequency Tables.

We start by replicating the data and bin section for Example 1 in Figure 1.

Data example 1

Figure 1 – Data for Example 1

You can use Excel’s chart tool to graph the data in Figure 1, or, alternatively, you can use the Histogram data analysis tool to accomplish this directly, as described next.

Data Analysis Tool

Excel Data Analysis Tool: To use Excel’s Histogram data analysis tool, you must first establish a bin array (as for the FREQUENCY function described in Frequency Tables) and then enter Data > Analysis|Data Analysis to select the Histogram data analysis tool. In the dialog box that is displayed (see Figure 2), specify the input data (Input Range) and bin array (Bin Range). You can optionally include the labels for these ranges (in which case you check the Labels check box).

Since you want to see the histogram, you should also check the Chart Output option. 

Histogram dialog box

Figure 2 – Dialog box for the Histogram data analysis tool

For Example 1, the Input Range is A4:B14 and the Bin Range is D4:D7 (with the Labels check box unchecked). Upon clicking the OK button, the output is displayed as shown in Figure 3.

Histogram tool output

Figure 3 – Histogram data analysis tool output

Caution

You must exercise caution when creating histograms to present the data clearly and accurately. For most purposes, the intervals must be equal in size (except for an unbounded first and/or last interval). Otherwise, a distorted picture of the data may be presented.

To avoid this problem, you should usually use equally spaced intervals. This approach is illustrated in Example 4 of Frequency Tables using Real Statistics’ FREQTABLE function. Alternatively, you can use the Real Statistics Histogram with Normal Curve Overlay data analysis tool.

Real Statistics Data Analysis Tool

Real Statistics Data Analysis Tool: The Histogram with Normal Curve Overlay data analysis tool provided in the Real Statistics Resource Pack can be used to create a frequency table and histogram as illustrated in the following example.

Example 2: Create a frequency table and histogram for the 22 data elements in the range A4:B14 of Figure 1 based on bins of size 15.

Enter Ctrl-m and select the Histogram with Normal Curve Overlay option. Fill in the dialog box that appears as shown in Figure 4.

Histogram dialog box

Figure 4 – Dialog box for Histogram data analysis tool 

After clicking on the OK button, the output as shown in Figure 5 is displayed.

Histogram chart

Figure 5 – Frequency Table and Histogram

Note that if 100 were inserted in the Maximum bin value (or blank) field of Figure 4 then the output would be as shown in Figure 6.

Histogram frequency table

Figure 6 – Frequency Table and Histogram (revised)

Curve Option

If the Curve chart option were chosen as shown in the dialog box in Figure 4 then the output for Example 2 would appear as shown in Figure 7.

Smoothed Line Curve

Figure 7 – Smoothed Line Curve

The optimum number of bins

What is the ideal number of bins or bin width? Historically, Sturgis’s Rule was used to answer this question.

Sturgis' Rule

This approach works pretty well for small n (up to about 200). Better approaches were supplied by Scott, namely

Scott's Rule

where s = the standard deviation, or Freedman and Diaconis, namely

Freedman-Diaconis Rule

where IQR is the interquartile range. The latter is probably more appropriate for data that is not normally distributed.

For Example 1, since n = 22, s = 22.43 and IQR = 21, the appropriate bin width is 28 per Scott or 15 per Freedman and Diaconis; or about 5.5 bins per Sturgis.

Normal Curve Overlay

The histogram (or straight-line chart or smoothed line curve) produced by this data analysis tool can also be overlaid with a normal curve to help determine whether the data is normally distributed. See Histogram with Normal Curve Overlay for more details.

You can also produce a frequency table and histogram of the type described in Example 3 of Discrete Probability Distributions (i.e. without specifying any bins) via the Histogram with Normal Curve Overlay data analysis tool. In this case, you would leave the Bin size field blank in the dialog box shown in Figure 4.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Wikipedia (2012) Histogram
https://en.wikipedia.org/wiki/Histogram

Hyndman, R. J. (1995) The problem with Sturges’ rule for constructing histograms
https://robjhyndman.com/papers/sturges.pdf

Wand, M. P. (1997) Data-based choice of histogram bin width. The American Statistician, Vol. 51, No. 1, pp. 59-64
https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/WandBinWidth.pdf

Scott, D. W. (1979) On optimal and data-based histograms. Biometrika, Vol. 66, No. 3., pp. 605-610.
https://www.stat.cmu.edu/~rnugent/PCMI2016/papers/ScottBandwidth.pdf

26 thoughts on “Histograms”

  1. I noticed that you are using Excels Countif function to generate the frequency values. I believe this is too slow compared to the built in Frequency Array Function. I have tested the histogram on a set of 33300 wind speed measurements and the histogram was quite slow. By contrast it is reasonably fast with the frequency array function, which can be inserted with VBA as well.

    It would be nice to have an overlay with other distribution fits as well, not only the normal distribution. E.g. the wind speed data follow approximately a Weibull distribution. At present I do this manually by calculating the probability density function and scaling it with the factor (sample size)*(bin size) based on the observation that the area under the histogram is Sum(ni * (bin size))=Sum(ni)*(bin size)=(sample size)*(bin size).

    Reply
    • I have to correct my statement. The histogram function is not too slow. It only slows down when I let the program select the bin size for the 33.300 wind speed measurements by itself. If I select a meaningful bin size such as 0.2 m/s the histogram generation is quite fast.

      Reply
  2. Hello Dr. Zaiontz,
    I have a group of patients who completed a questionnaire scoring from zero to 100 points, zero being a valid score.
    The Resource Pack (7.0.6, Excel 2010, Win) builds a histogram, that looks similar to the native Excel histogram.
    However, with the Normal Curve Overlay switched on, the first column (score 0) moves to the negative side of the X axis and I cannot find a way to correct this. The design of the chart also changes (greatly improves:) ).
    Please advise.
    Svetoslav

    Reply
    • Dr. Zaiontz,
      I understood why the design changes. I believe it is a great idea.
      I cannot solve the other issue, though.
      Best regards,
      Svetoslav

      Reply
        • Dr. Zaiontz,
          I have a group of patients who completed a questionnaire and some of them had the minimum score of zero points which is a valid score.
          My problem is the following:
          When I try to build a histogram with normal curve overlay, the first column of the histogram that represents the group of patients with zero points is situated to the left of the Y axis, e.g. from -10 to zero for bin size of 10.
          With the basic histogram (without the normal curve overlay), this does not happen, i.e. all the columns of the histogram are to the right of the Y axis.
          Thank you for your time and patience!
          Svetoslav

          Reply
          • Hello Svetoslav,
            Are you saying that you prefer that the first bin goes from zero to 10?
            If you email me an Excel file with your data and results, I will try to help you resolve this issue.
            Charles

  3. Hi,

    How can I show % of frequency count in the histogram, in labels
    Not just changing the count to % but showing percentage-wise frequency distribution?

    I hope I have framed the question properly

    thanks

    Reply
  4. Ups sorry I though the last messages were at the end and I have repeat the question… Thanks. I will wait for the book

    Reply
  5. Thanks very much Charles,

    I am using your package almost every day for my phd… Were I could buy your book? I like real books (printed books)

    Thanks

    Reply
    • Hi Gabriel,
      Glad that you are getting value from the Real Statistics software.
      The book is still not published. I keep getting distracted and so have not yet finished proof-reading it. I hope to finish it later this year.
      Charles

      Reply
  6. I am very happy with you app. It would be fantastic creating kernell density plot in excel

    Thanks very much Charles

    Reply
  7. Hello my name is Gabriel Delgado,

    I am from Granada. I would like to say that I am very happy with Real Statistic using excel. Now I am creating confidence ellipses with this App. It would be fantastic to crete Kernell Density 2D plot if possible…

    Thanks very much Charles

    Reply
  8. Is the upper limit of a bin included in the bin? For example, if my first bin’s upper limit is 39 and I have 39 in my data, will Excel show this point in the first bin/vertical bar OR will it include this in the next bin/vertical bar?

    Reply
    • Azfer,
      The upper limit of a bin is included in the bin. In your example 39 will be counted in the first bin.
      I suggest that you simply create some data and use the Histogram data analysis tool to see that this is the case.
      Charles

      Reply

Leave a Comment