Kernel Density Estimation

Basic Concepts

A kernel is a probability density function (pdf) f(x) which is symmetric around the y axis, i.e. f(-x) = f(x).

Kernel density estimation (KDE) is a non-parametric method for estimating the pdf of a random variable from a random sample, using a kernel K and a smoothing parameter (aka bandwidth) h > 0.

Let {x₁, x₂, …, x_n} be a random sample from some distribution whose pdf f(x) is not known. We estimate f(x) as follows:

f̂(x) = 1/(nh) · Σ K((x – x_i)/h), where the sum is taken over i = 1, …, n
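A minimal Python sketch of the estimator may help make the definition concrete. It assumes the standard form of the kernel density estimate, in which a copy of the kernel scaled by h is centered on each sample value and the copies are averaged. The Gaussian kernel, the function names and the sample values are illustrative choices, not taken from this page.

```python
import math

def gaussian_kernel(u):
    """Standard normal pdf, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Estimate the density at x as (1/(n*h)) * sum of K((x - x_i)/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Illustrative data (an assumption, not from the article).
sample = [1.2, 1.9, 2.1, 2.4, 3.7, 4.0, 4.3]
density_at_2 = kde(2.0, sample, h=0.5)
```

Because each scaled kernel integrates to 1 and the estimate is their average, the estimate is itself a pdf.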

The results are sensitive to the value chosen for h. Rules for choosing an optimum value for h are complex, but the following are some simple guidelines:

You should use a larger bandwidth value when the sample size is small and the data are sparse. A larger h spreads each kernel more widely (for the Gaussian kernel, h acts as its standard deviation), so the estimate places more weight on the neighboring data values.
You can use a smaller bandwidth value when the sample size is large and the data are densely packed. A smaller h concentrates each kernel (a smaller standard deviation in the Gaussian case), so the estimate places more weight on the specific data value and less on the neighboring data values.

Bandwidths that are too small result in a pdf that is too spiky, while bandwidths that are too large result in a pdf that is over-smoothed.
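One rough way to see this effect is to count the local maxima of the estimate for different bandwidths. The sketch below assumes a Gaussian kernel and an evenly spaced evaluation grid. The sample, the grid limits and the two bandwidth values are made up for illustration.

```python
import math

def kde_gauss(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def count_modes(sample, h, lo, hi, steps=2000):
    """Count strict local maxima of the estimate on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [kde_gauss(x, sample, h) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Illustrative data (an assumption, not from the article).
sample = [1.0, 1.5, 3.0, 3.2, 5.0, 7.5, 8.0]
modes_small = count_modes(sample, h=0.05, lo=-2, hi=11)  # very small h: spiky
modes_large = count_modes(sample, h=3.0, lo=-2, hi=11)   # very large h: smooth
```

With the very small bandwidth the estimate has a separate spike near almost every data value, while the very large bandwidth merges everything into a single broad bump.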

If f(x) follows a normal distribution then an optimal estimate for h is

h = (4/(3n))^(1/5) · s ≈ 1.06 · s · n^(-1/5)

where s is the standard deviation of the sample.

Silverman’s optimum estimate of h is

h = 0.9 · s* · n^(-1/5)

where s* = min(s, IQR/1.34) and IQR is the interquartile range of the sample data.
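Both rules are easy to compute. The sketch below assumes their usual textbook forms: h = (4/(3n))^(1/5) · s for the normal-reference rule (about 1.06 · s · n^(-1/5)) and h = 0.9 · s* · n^(-1/5) for Silverman's rule. The function names are hypothetical, and the quartiles use the default "exclusive" method of Python's `statistics.quantiles`, which can differ slightly from the quartiles Excel reports.

```python
import statistics

def normal_reference_bandwidth(sample):
    """h = (4/(3n))^(1/5) * s, roughly 1.06 * s * n^(-1/5)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (4.0 / (3.0 * n)) ** 0.2 * s

def silverman_bandwidth(sample):
    """h = 0.9 * s_star * n^(-1/5), with s_star = min(s, IQR/1.34)."""
    n = len(sample)
    s = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile convention is an assumption
    s_star = min(s, (q3 - q1) / 1.34)
    return 0.9 * s_star * n ** (-0.2)

# Illustrative data (an assumption, not from the article).
sample = [1.2, 1.9, 2.1, 2.4, 3.7, 4.0, 4.3, 5.1, 6.8, 9.5]
h_normal = normal_reference_bandwidth(sample)
h_silverman = silverman_bandwidth(sample)
```

Since s* never exceeds s and 0.9 is smaller than 1.06, Silverman's value is always the smaller of the two, which makes it less prone to over-smoothing skewed or multimodal data.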

Commonly used kernels

Some commonly used kernels are listed in Figure 1. Note that seven of the kernels restrict the domain to values |u| ≤ 1. The Epanechnikov kernel is the most efficient in the sense that it minimizes the asymptotic mean integrated squared error of the estimate; we won’t go into the details here. The efficiency column in the figure displays the efficiency of each kernel as a percentage of the efficiency of the Epanechnikov kernel.

Kernel name	Kernel pdf	Restriction	Efficiency
uniform	K(u) = 1/2	|u| ≤ 1	92.9%
triangular	K(u) = 1 – |u|	|u| ≤ 1	98.6%
biweight	K(u) = 15(1–u²)²/16	|u| ≤ 1	99.4%
triweight	K(u) = 35(1–u²)³/32	|u| ≤ 1	98.7%
tricube	K(u) = 70(1–|u|³)³/81	|u| ≤ 1	99.8%
Epanechnikov	K(u) = 3(1–u²)/4	|u| ≤ 1	100%
cosine	K(u) = π·cos(π·u/2)/4	|u| ≤ 1	99.9%
Gaussian	K(u) = exp(-u²/2)/√(2π)		95.1%
logistic	K(u) = 1/(e^u + e^-u + 2)		88.7%
sigmoid	K(u) = 2/[π(e^u + e^-u)]		84.3%
Silverman	K(u) = exp(-|u|/√2) · sin(|u|/√2 + π/4)/2		N/A

Figure 1 – Kernels
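The kernels and the efficiency column can be checked numerically. The sketch below codes several of the kernels, confirms that each integrates to 1, and computes an efficiency relative to the Epanechnikov kernel. It assumes the efficiency is based on the quantity sqrt(∫u²K(u)du) · ∫K(u)²du, where smaller is better. This definition is an assumption about what the figure uses, although it does reproduce the percentages for the kernels checked. The cosine kernel is coded as π·cos(π·u/2)/4, the helper names are hypothetical, and the Gaussian kernel is truncated to |u| ≤ 8 for the integration.

```python
import math

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1 else 0.0

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0

def tricube(u):
    return 70.0 * (1.0 - abs(u) ** 3) ** 3 / 81.0 if abs(u) <= 1 else 0.0

def cosine(u):
    return math.pi / 4.0 * math.cos(math.pi * u / 2.0) if abs(u) <= 1 else 0.0

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Kernel name -> (function, half-width of the integration interval).
KERNELS = {
    "uniform": (uniform, 1.0),
    "triangular": (triangular, 1.0),
    "Epanechnikov": (epanechnikov, 1.0),
    "tricube": (tricube, 1.0),
    "cosine": (cosine, 1.0),
    "Gaussian": (gaussian, 8.0),  # tails beyond 8 are negligible
}

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

def cost(kernel, bound):
    """sqrt(second moment) times integral of K squared (smaller is better)."""
    second_moment = integrate(lambda u: u * u * kernel(u), -bound, bound)
    roughness = integrate(lambda u: kernel(u) ** 2, -bound, bound)
    return math.sqrt(second_moment) * roughness

def efficiency(name):
    """Efficiency of the named kernel relative to the Epanechnikov kernel."""
    kernel, bound = KERNELS[name]
    return cost(*KERNELS["Epanechnikov"]) / cost(kernel, bound)

areas = {name: integrate(k, -b, b) for name, (k, b) in KERNELS.items()}
efficiencies = {name: efficiency(name) for name in KERNELS}
```

Printing `efficiencies` as percentages gives a quick cross-check of the corresponding rows in Figure 1.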

References

Silverman, B. W. (1986) Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability, London: Chapman and Hall
https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf

Zucchini, W. (2003) Applied smoothing techniques. Part 1: Kernel density estimation
http://staff.ustc.edu.cn/~zwp/teach/Math-Stat/kernel.pdf

Helwig, N. E. (2017) Density and distribution estimation
http://users.stat.umn.edu/~helwig/notes/den-Notes.pdf

3 thoughts on “Kernel Density Estimation”

Munir Morad (Prof)

April 7, 2022 at 8:13 pm

Many thanks. Most helpful!

Munir
(Prof) M Morad

April 4, 2022 at 7:23 pm

Thanks for these excellent web pages. For the removal of ambiguity, I would be grateful for your help in clarifying a couple of points:

(1)
Re the web page http://www.real-statistics.com/distribution-fitting/kernel-density-estimation
Is the variable u (eg, in k (u)) a reference to the standardised z used in general statistics (ie, z = (x – mean)/standard deviation)?

(2)
Re the web page http://www.real-statistics.com/distribution-fitting/kernel-density-estimation/kde-example
Where exactly can one download the Excel example/sheet. It does not seem to be bundled with other downloadable material.

Once again, many thanks. I should like to refer postgraduate researchers I am in contact with to these pages.
- Charles
  
  April 5, 2022 at 10:37 am
  
  (1) u can be any variable. In some cases, the only restriction is |u| <= 1. (2) It is on the Distribution worksheet (see Distribution Fitting). You can download it from https://www.real-statistics.com/free-download/real-statistics-examples-workbook/
  Charles
