KDE Example

Example

Example 1: Create a Kernel Density Estimation (KDE) chart for the data in range A3:A9 of Figure 1 based on the Gaussian kernel and bandwidth of 1.5.

Kernel Density Estimation

Figure 1 – Creating a KDE chart

We will assume that the chart is based on a scatter plot with smoothed lines formed from 51 equally spaced points (i.e. 50 intervals as shown in cell D6 of Figure 1) from x = -6 (cell D4) to x = 10 (cell D5).  This results in spacing of size .32 as calculated by =(D5-D4)/D6 in cell D8.

We also note that the sample size is 6 as calculated by =COUNT(A4:A9) in cell D3 and bandwidth h = 1.5 (cell D7).

The x values of the 51 points are shown in column F (only the first 12 points are shown in Figure 1). Here cell F3 contains the formula =D4 and cell F4 contains the formula =F3+$D$8. Highlighting the range F4:F52 and pressing the key sequence Ctrl-D fills in the other x values.

The corresponding y values are the values of the kernel density estimates f(x) as shown in column G. These are calculated by summing the partial estimates for each of the sample data values, as shown in range G2:M2 which is calculated by the array formula =TRANSPOSE(A4:A9). E.g. f(-6) = 0.000839 (cell G3) is calculated by the formula =SUM(H3:M3)/($D$3*$D$7). The value in cell H3, for example, is calculated in turn by the formula =NORM.S.DIST(($F3-H$2)/$D$7, FALSE) since we are using a Gaussian kernel. We next fill in the remaining cells of Figure 1 by highlighting range H3:M3 and pressing Ctrl-R and then highlighting range G3:M53 and pressing Ctrl-D.

KDE Chart

To create the KDE chart we now highlight the range F2:G53 and then select Insert > Charts|Scatter (choosing the Scatter with Smooth Lines option). The resulting chart is displayed in Figure 2.

KDE chart

Figure 2 – KDE chart at h = 1.5

The chart is highly susceptible to the choice of bandwidth. If you use h = .5 we obtain the KDE chart shown on the left side of Figure 3 while if we use h = 2.734 (which is the Silverman estimate described in KDE Basic Concepts) we get the KDE chart shown on the right side of Figure 3.

KDE chart comparison

Figure 3 – KDE charts

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Silverman, B. W. (1986) Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability, London: Chapman and Hall
https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf

Zucchini, W. (2003) Applied smoothing techniques. Part 1: Kernel density estimation
http://staff.ustc.edu.cn/~zwp/teach/Math-Stat/kernel.pdf

Helwig, N. E. (2017) Density and distribution estimation
http://users.stat.umn.edu/~helwig/notes/den-Notes.pdf

6 thoughts on “KDE Example”

  1. Hello Charles! Thank you so much for your great work! I tried to understand the subject myself for a very long time until I discovered your wonderful site with examples in Excel! On sheet KDE3 (Kernel Diversity Estimation) I got an error, so I corrected the formula for IQR: cell D13 = QUARTILE(A4:A9;3) minus QUARTILE(A4:A9;1). Hope, it’s correct.
    Thanks a lot, with you I felt a little more confident in my ability to master the essence!

    Reply
    • Hello Svea,
      1. I just tested the IQR function and found that it works. Note, however, that this is not a standard Excel function. If you install the free Real Statistics add-in to Excel, then IQR works just like any of the standard Excel functions.
      2. Your workaround is perfect. You could also replace QUARTILE by QUARTILE.EXC.
      Charles

      Reply
    • Hello Purnur,
      You could use Solver, but you need to tell Solver what your goal is (e.g. maximizing some variable). What would your goal be?
      Charles

      Reply

Leave a Comment