Kendall-Theil-Sen Regression

Basic Concepts

The Kendall-Theil-Sen estimator is a non-parametric method for fitting a line to a set of points (x1, y1), …, (xn, yn). It is robust in that, when the data contain outliers, it yields a better-fitting line than ordinary least-squares regression. It also does not require the residuals to be normally distributed. The key assumption is that the relationship between the x and y data is linear.

The approach consists of two parts. First, the slope is estimated and then the intercept is estimated based on this slope. The approach is similar to that described in Sen’s Slope for time series. In fact, if the time series is y1, y2, …, yn, then Sen’s slope is the Kendall-Theil-Sen slope estimator for the points (1, y1), …, (n, yn).

Methodology

The Kendall-Theil-Sen estimate of the slope m is defined as

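$$m = \operatorname{median}(S), \qquad S = \left\{ \frac{y_j - y_i}{x_j - x_i} : 1 \le i < j \le n,\ x_i \ne x_j \right\}$$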
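As a minimal sketch of the slope estimate, assuming S is the set of slopes over all pairs of points with distinct x values, the calculation can be written in a few lines of Python. The function names `pairwise_slopes` and `kts_slope` are illustrative only, and the data are made up (they are not the data from Figure 1).

```python
from itertools import combinations
from statistics import median

def pairwise_slopes(x, y):
    """Return the sorted slopes over all pairs of points with distinct x values."""
    points = list(zip(x, y))
    return sorted(
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi  # pairs with equal x values have no defined slope
    )

def kts_slope(x, y):
    """Kendall-Theil-Sen slope: the median of the pairwise slopes."""
    return median(pairwise_slopes(x, y))

# Made-up data: four points on the line y = 2x plus one outlier.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
print(kts_slope(x, y))  # 2.0
```

The outlier contributes four large slopes to S, but the median of the ten pairwise slopes is still 2, which illustrates the robustness described above.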

We provide two approaches for estimating the corresponding intercept b.

Approach 1

$$b = \tilde{y} - m\tilde{x}$$

where $\tilde{x}$ is the median of the x values and $\tilde{y}$ is the median of the y values (Conover).

Approach 2

$$b = \operatorname{median}\{\, y_i - m x_i : 1 \le i \le n \,\}$$
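The two intercept estimates can be sketched as follows. This assumes that Approach 1 is the median of y minus m times the median of x, as the text describes, and that Approach 2 is the median of the values y_i − m·x_i, which is the other commonly used Theil-Sen intercept and is an assumption here. The function names and data are illustrative only.

```python
from itertools import combinations
from statistics import median

def kts_slope(x, y):
    """Median of the pairwise slopes over pairs with distinct x values."""
    return median(
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    )

def intercept_approach_1(x, y, m):
    """Approach 1: median(y) - m * median(x)."""
    return median(y) - m * median(x)

def intercept_approach_2(x, y, m):
    """Approach 2 (assumed form): median of y_i - m * x_i."""
    return median(yi - m * xi for xi, yi in zip(x, y))

# Made-up data: four points on the line y = 2x plus one outlier.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
m = kts_slope(x, y)
b1 = intercept_approach_1(x, y, m)
b2 = intercept_approach_2(x, y, m)
print(m, b1, b2)  # 2.0 0.0 0.0
```

The two estimates happen to coincide for this toy data set; in general they can differ, which is why both are reported in the worksheet example.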

Hypothesis Testing

The test to determine whether the slope coefficient is significantly different from zero is identical to the test to determine whether Kendall’s tau is significantly different from zero.
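A rough sketch of such a test is shown below. It assumes the large-sample normal approximation for Kendall's S statistic with no ties and no continuity correction; the function name `kendall_tau_test` and the data are illustrative, and a production analysis would normally use a library routine that handles ties.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def kendall_tau_test(x, y):
    """Return (tau, z, two-sided p-value), assuming no ties in x or y."""
    s = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        prod = (xj - xi) * (yj - yi)
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    n = len(x)
    tau = s / (n * (n - 1) / 2)
    se = sqrt(n * (n - 1) * (2 * n + 5) / 18)  # s.e. of S without tie correction
    z = s / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return tau, z, p_value

# Made-up data: an increasing trend with one outlier at the end.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 60.0]
tau, z, p_value = kendall_tau_test(x, y)
print(tau, p_value < 0.05)  # 1.0 True
```

A small p-value leads to rejecting the null hypothesis that the slope is zero.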

Confidence Interval

There are several ways of calculating a confidence interval for the slope:

Approach 1: Let n = the number of data points, N = the number of elements in S, $z_{crit}$ = the 1−α/2 critical value for the standard normal distribution, and define

$$se = \sqrt{\frac{n(n-1)(2n+5)}{18}}, \qquad k = se \cdot z_{crit}$$

Then the lower and upper bounds of the slope coefficient are

$$m_{(N-k)/2} \quad \text{and} \quad m_{(N+k)/2+1}$$

where $m_h$ = the hth smallest element in S.

This closely resembles the way the confidence interval is estimated for Sen’s slope of a time series. In fact, we can refine the above estimate by using the tie correction for the standard error that is employed for Sen’s slope (and the Mann-Kendall test), based on either the x or y values.
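Approach 1 can be sketched in Python as follows. The sketch assumes the usual standard error without tie correction, se = sqrt(n(n−1)(2n+5)/18), with k = se·zcrit and ranks (N − k)/2 and (N + k)/2 + 1 for the lower and upper bounds. Rounding fractional ranks to the nearest integer is a simplification chosen for this sketch, not necessarily what the worksheet does. The function name and data are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def slope_confidence_interval(x, y, alpha=0.05):
    """Rank-based confidence interval for the slope (no tie correction)."""
    slopes = sorted(
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    )
    n, big_n = len(x), len(slopes)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(n * (n - 1) * (2 * n + 5) / 18)
    k = se * z_crit
    # Assumption: round fractional ranks and clamp them to the range 1..N.
    lower_rank = min(max(round((big_n - k) / 2), 1), big_n)
    upper_rank = min(max(round((big_n + k) / 2 + 1), 1), big_n)
    return slopes[lower_rank - 1], slopes[upper_rank - 1]

# Made-up data: slope near 2 with one outlier at the end.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 60.0]
lower, upper = slope_confidence_interval(x, y)
print(lower, upper)
```

Because the bounds are order statistics of S, the nine large slopes produced by the outlier sit above the upper rank and do not affect the interval for this data set.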

Approach 2: We can use a bootstrap approach to find a confidence interval based on the values in S.

Approach 3: We can use a bootstrap approach to find the standard error based on the values in S. Then we use m ± se · zcrit as the confidence interval.

Approaches 4 and 5: We can use jackknifing instead of bootstrapping in approaches 2 and 3.
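The bootstrap variants (approaches 2 and 3) might look roughly like the sketch below, which resamples the slopes in S with replacement and takes the median of each resample. The percentile rule, the number of resamples, the seed, and the function name are all assumptions made for illustration. Note also that the slopes in S are not independent of one another, so this is only one simple reading of the approach; the jackknife variants are not shown.

```python
import random
from statistics import NormalDist, median, stdev

def bootstrap_slope_intervals(slopes, alpha=0.05, n_boot=2000, seed=0):
    """Return (percentile CI, normal-theory CI) from bootstrapped medians of S."""
    rng = random.Random(seed)
    m = median(slopes)
    boot = sorted(
        median(rng.choices(slopes, k=len(slopes))) for _ in range(n_boot)
    )
    # Approach 2: percentile interval of the bootstrapped medians.
    lower_idx = int((alpha / 2) * n_boot)
    upper_idx = round((1 - alpha / 2) * n_boot) - 1
    percentile_ci = (boot[lower_idx], boot[upper_idx])
    # Approach 3: bootstrap standard error, then m +/- se * z_crit.
    se = stdev(boot)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    normal_ci = (m - se * z_crit, m + se * z_crit)
    return percentile_ci, normal_ci

# Made-up set of pairwise slopes, including one extreme value.
slopes = [1.7, 1.8, 1.9, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 7.0]
percentile_ci, normal_ci = bootstrap_slope_intervals(slopes)
print(percentile_ci, normal_ci)
```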

Example

Example 1: Create a linear regression model for the data in columns A and B of Figure 1 using the Kendall-Theil-Sen estimate.

Figure 1 also displays all the pairwise slopes. E.g. cell F5 contains the formula

=IF(ROW(F5)-ROW(F$4)>COLUMN(F5)-COLUMN($F5),IF($D5<>F$2,($E5-F$3)/($D5-F$2),""),"")

Figure 1 – Slope for the regression model

Figure 2 displays the regression model using the formulas described above. We see that the estimate for the slope coefficient is -.62791 (cell W12). A 95% confidence interval for this estimate is (-1.1667, -.26316), as shown in W13:W14. A slightly different estimate is shown in W17:W18 using Real Statistics’ SMALLExact function.

Figure 2 – Regression model

Figure 2 also provides two estimates for the intercept (cells AB12 and AB16). We also provide 95% confidence intervals using the two approaches. We see that the regression model is

y = -.62791x + 86.93023

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following function for the x data in Rx and y data in Ry.

KTSReg(Rx, Ry, lab, mopt, bopt, alpha): returns a column array with the values: slope, s.e. for the slope, intercept, and the confidence intervals of the slope and intercept

Rx and Ry must be column arrays or ranges with the same number of rows and no missing data or headings. If lab = TRUE (default FALSE) a column of labels is appended to the output. alpha is the significance level (default .05). 

If mopt = TRUE (default), then SMALL is used to estimate the confidence interval for the slope; otherwise, SMALLExact is used. If bopt = TRUE (default), then the intercept is estimated using Approach 1; otherwise, Approach 2 is used.

We can use this function to get the results for Example 1, as shown in Figure 3.

Figure 3 – Regression using KTSReg function

The formulas used to generate these three versions, from left to right, are

=KTSReg(A4:A18,B4:B18,TRUE)

=KTSReg(A4:A18,B4:B18,TRUE,,FALSE)

=KTSReg(A4:A18,B4:B18,TRUE,FALSE,FALSE)

References

Kvam, P. H., Vidakovic, B. (2007) Nonparametric statistics with applications to science and engineering. Wiley
https://maktab-sms.ir/Uploads/Ebooks/9339037c-8955-4792-afb7-b003c6df9ce7.pdf

Sen, P. K. (1968) Estimates of the regression coefficient based on Kendall’s tau
https://www.pacificclimate.org/~wernera/zyp/Sen%201968%20JASA.pdf

Granato, GE (2006) Kendall-Theil Robust Line (KTRLine—version 1.0)—A Visual Basic program for calculating and graphing robust nonparametric estimates of linear-regression coefficients between two continuous variables
https://pubs.usgs.gov/tm/2006/tm4a7/

Farooqi, A. (2019) A comparative study of Kendall-Theil Sen, Siegel vs quantile regression with outliers
https://digitalcommons.wayne.edu/cgi/viewcontent.cgi?article=3351&context=oa_dissertations

Wikipedia (2023) Theil-Sen estimator
https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator

6 thoughts on “Kendall-Theil-Sen Regression”

  1. Hello Charles,

    Thanks for all your explanations! You really help me so much. Just one small thing about this Kendall regression exercise: in the Excel sheet, the formula that estimates the upper bound of the interval with Approach 1 does not have the + 1 in the denominator, but your formula has it.

    Thanks for everything!
