Ordinary least squares (OLS) regression produces regression coefficients that are unbiased estimators of the corresponding population coefficients and that have the smallest variance among all linear unbiased estimators. This is the Gauss-Markov Theorem. In most situations, this is exactly what we want.
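In standard matrix notation, the model is $y = X\beta + \varepsilon$ with $E[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2 I$, and the OLS estimator is

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$$

The Gauss-Markov Theorem states that among all linear unbiased estimators of $\beta$, $\hat{\beta}_{OLS}$ has the smallest variance.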
However, there may be an estimator with lower variance, at the cost of some added bias. Accepting a small amount of bias in exchange for a large reduction in variance can be desirable, for example, when the independent variables are highly correlated. This occurs in the following situations:
- There are many independent variables, especially when there are more variables than observations.
- The data exhibit near multicollinearity, in which case small changes in X can produce large changes in the regression coefficients (since $X^T X$ is nearly singular, its inverse is numerically unstable).
In these situations, the OLS model over-fits the data: it captures the observed data well but forecasts poorly from new data. Here Ridge and LASSO regression can produce better models by reducing the variance at the expense of some added bias, as the sketch below illustrates.
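The following is a minimal NumPy sketch (not the Real Statistics implementation) of the standard closed-form ridge estimator

$$\hat{\beta}_{ridge} = (X^T X + \lambda I)^{-1} X^T y$$

applied to synthetic, nearly collinear data. The data, the random seed, and the function name `ridge_coefficients` are illustrative assumptions, not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data in which x2 is almost a copy of x1 (near multicollinearity)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([np.ones(n), x1, x2])          # first column = intercept
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'y.

    lam = 0 reduces to ordinary least squares. By convention the
    intercept is not penalized, so its diagonal entry is zeroed out.
    """
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# The OLS coefficients of x1 and x2 swing wildly because X'X is nearly
# singular; even a modest lambda pulls them back toward stable values.
print("OLS   (lambda = 0):", ridge_coefficients(X, y, 0.0))
print("Ridge (lambda = 1):", ridge_coefficients(X, y, 1.0))
```

In practice, the independent variables are usually standardized before the penalty is applied, so that $\lambda$ treats all coefficients on a comparable scale; how to choose $\lambda$ itself is covered in Estimating Lambda below.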
Topics
- Ridge Regression Basic Concepts
- Ridge Regression Example
- Estimating Lambda
- Real Statistics Data Analysis Tool
- Ridge Regression Predictions
- LASSO Regression