LOESS Regression

Basic Concepts

LOESS (locally estimated scatterplot smoothing) regression combines aspects of weighted moving average smoothing with weighted linear or polynomial regression. LOESS is also called LOWESS, which stands for locally weighted scatterplot smoothing.

The parameters that determine this type of regression are (1) the degree of the polynomial (usually linear, sometimes quadratic), (2) the span (equivalent to the number of lags in weighted moving average smoothing), and (3) the weighting function. Usually, the tricube function is used as the weighting function, namely

w(x) = (1 - |x|^3)^3

for -1 < x < 1. We can define w(x) = 0 outside this range.
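
As a quick illustration, the tricube weighting function can be written in a few lines of Python. This is a minimal sketch; the function name tricube is our own choice and does not come from any particular library.

```python
def tricube(x):
    """Tricube weight: (1 - |x|^3)^3 for -1 < x < 1, and 0 outside that range."""
    a = abs(x)
    return (1 - a**3) ** 3 if a < 1 else 0.0


# The weight is 1 at the center and falls smoothly to 0 at |x| = 1.
for x in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, tricube(x))
```

For instance, tricube(0.5) = (1 - 0.125)^3 = 0.669921875, so a point halfway to the edge of the range still receives about two-thirds of the full weight.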

The span (local subinterval size) takes an integer value. For each observed data pair, a regression model is fitted using a span number of points. The higher the value, the more smoothing. Often a smoothing parameter q between 0 and 1 is used instead. In this case, the span is equal to nq (rounded up to the next integer when nq is not an integer), where n is the sample size.

Since a polynomial of degree deg requires at least deg+1 points for a fit, the smoothing parameter must take a value between (deg+1)/n and 1. Usually, a value between 0.25 and 0.50 is used as the smoothing parameter.
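
The conversion from the smoothing parameter q to a span can be sketched as follows. The function name span_from_q and its argument names are hypothetical, chosen only for this illustration; the rounding rule assumed here is rounding up, as described above.

```python
import math


def span_from_q(q, n, deg=1):
    """Convert a smoothing parameter q into a span (a number of points).

    q must lie between (deg + 1) / n and 1 so that each local fit
    has at least deg + 1 points.
    """
    q_min = (deg + 1) / n
    if not q_min <= q <= 1:
        raise ValueError(f"q must be between {q_min:.4f} and 1")
    # Round n*q up to the next integer when it is not already an integer.
    return math.ceil(q * n)


print(span_from_q(0.33, 21))  # 0.33 * 21 = 6.93, which rounds up to 7
```

For example, with n = 21 data points, q = 0.33 corresponds to a span of 7 points.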

Methodology

The algorithm works as follows:

1.  Sort the data points p1 = (x1,y1), …, pn = (xn,yn) in ascending order based on the x values.

2. We use the following distance measure for each pair of data points

d(pi, pj) = |xi - xj|

3. For each data point pi starting with p1, identify the m contiguous points surrounding pi that are closest to pi (based on this distance measure) where m = the span.

4. For each of the m points pj in the local subinterval around pi, you now calculate the scaled distance measure as follows where the denominator contains the largest distance from pi of points pk in the local subinterval.

d*(pj) = d(pi, pj) / max_k d(pi, pk)

5. For each point in the local subinterval around pi, you now calculate its weight as follows

w(pj) = (1 - |d*(pj)|^3)^3

6. You now perform a weighted linear regression using the points in the subinterval and the weights calculated above.

7. The resulting LOESS regression value at xi is the y value at xi from this regression. Thus, if the regression parameters are b0 (intercept) and b1 (slope), then the LOESS regression value at xi is b0 + b1⋅xi. If you have set the degree to 2 (quadratic regression), then the LOESS regression value at xi is b0 + b1⋅xi + b2⋅xi^2.

Note that if you have n data points, then this algorithm will produce n weighted linear regressions, one for each point. There are iterative versions of LOESS and multivariate versions, but we won’t consider those here.
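
The seven steps above can be sketched in plain Python for the linear (degree 1) case. This is an illustrative sketch rather than a reference implementation: the function names tricube and loess_linear are our own, the weighted regression uses the standard closed-form formulas for a weighted straight-line fit, and the fallback for subintervals with fewer than two distinct positively weighted x values is an assumption made here so that the code never divides by zero. A quadratic fit would need a general weighted least-squares solver instead.

```python
def tricube(u):
    """Tricube weight (1 - |u|^3)^3 for |u| < 1, zero elsewhere."""
    a = abs(u)
    return (1 - a**3) ** 3 if a < 1 else 0.0


def loess_linear(xs, ys, span):
    """Return (sorted x values, fitted y values) for a degree-1 LOESS fit."""
    # Step 1: sort the data points by x.
    pts = sorted(zip(xs, ys))
    if not 2 <= span <= len(pts):
        raise ValueError("span must be between 2 and the number of points")

    fitted = []
    for xi, _ in pts:
        # Steps 2-3: keep the `span` points closest to xi under d = |xi - xj|.
        local = sorted(pts, key=lambda p: abs(p[0] - xi))[:span]

        # Step 4: scale each distance by the largest distance in the subinterval.
        d_max = max(abs(xj - xi) for xj, _ in local)

        # Step 5: tricube weights (assumption: equal weights if all x values coincide).
        w = [tricube(abs(xj - xi) / d_max) if d_max > 0 else 1.0
             for xj, _ in local]

        # Step 6: weighted linear regression in closed form.
        sw = sum(w)  # at least 1, since the point at xi itself has weight 1
        xbar = sum(wj * xj for wj, (xj, _) in zip(w, local)) / sw
        ybar = sum(wj * yj for wj, (_, yj) in zip(w, local)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, (xj, _) in zip(w, local))
        sxy = sum(wj * (xj - xbar) * (yj - ybar)
                  for wj, (xj, yj) in zip(w, local))
        # Assumption: fall back to a flat line when the slope is undetermined.
        b1 = sxy / sxx if sxx > 0 else 0.0
        b0 = ybar - b1 * xbar

        # Step 7: the LOESS value at xi is the fitted line evaluated at xi.
        fitted.append(b0 + b1 * xi)

    return [x for x, _ in pts], fitted
```

Note that the farthest point in each subinterval has a scaled distance of 1 and therefore a tricube weight of 0, so it does not influence the local fit. A simple sanity check is to pass in data that lie exactly on a straight line, in which case the fitted values should reproduce the original y values up to rounding error.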

Example

Example 1: Create LOESS regression for the data in range B2:C22 of Figure 1. This is the same data as used in the NIST example (see references below).

Figure 1 – LOESS Regression

The fitted y values for the 21 data points are shown in column E based on LOESS regression with a span of 7 points. The right side of the figure displays a chart containing the observed data points (in blue) along with the fitted values (in red). In LOESS Regression using Excel, we show you how to calculate these fitted values. We also show how to create the chart on the right side of the figure.

References

NIST (2012) LOESS (aka LOWESS)
https://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm

Cleveland, W.S. (1979) Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, Vol. 74, pp. 829-836.
https://sites.stat.washington.edu/courses/stat527/s13/readings/Cleveland_JASA_1979.pdf

NIST (2012) Example of LOESS computations
https://www.itl.nist.gov/div898/handbook/pmd/section1/dep/dep144.htm
