Weighted least squares (WLS) regression is an extension of ordinary (OLS) least-squares regression by the use of weights. Generally, weighted least squares regression is used when the homogeneous variance assumption of OLS regression is not met (aka heteroscedasticity or heteroskedasticity).
Topics
- Basic concepts of weighted regression
- Using weighted regression to deal with heteroscedasticity
- Real Statistics support for weighted regression
- Weighted regression based on OLS regression through the origin
Reference
Wikipedia (2015) Weighted least squares
https://en.wikipedia.org/wiki/Weighted_least_squares
Hey Charles,
Thanks for all the hard work you’ve put in! Is it possible to weight by the independent variables as opposed to each row of data points? For example, if I wanted to assign weight of .2, .3, and .5, to variables x1, x2, and x3, respectively, would I be able to do so using your add-in? Regardless, thanks a bunch for your continuous effort and kindness!
Kind regards,
Alvin
Hi Alvin,
Thanks for your support and kind words.
I would think that in this case you would multiple all your x1 data by .2, all your x2 data by .3 and all your x3 data by .5.
Charles
Hello Charles,
I just wanted to let you know that your tool has been a lifesaver! I’m in college right now taking an advanced lab class that requires me to analyze data using many of the functions provided by your RealStats tool. I really didn’t want to have to pick up python again to have to code in my own weighted least squares/linear regression, but your add-in is so easy to use and does it all perfectly.
Thanks!
Paige
Thank you very much Paige. Glad I could help.
Charles
Thank you for this website and the add-in module. I find your explanations and the formulas very useful.
I am trying to use the Weighted Linear Regression, but the WRegCoeff formula seems to have a limit of 65536 observations. Is that correct? Are you looking to increase the number of rows we can use in a weighted regression.
Thanks.
Hello Harry,
I am pleased that you are finding the Real Statistics website and add-in to be useful.
Yes, the current limit is 65,536 observations. Until now, no one has asked to support more than this number. What number of observations do you need?
Charles
Hi Charles,
A fantastic site, thank you. I was hoping you may ne able to point me in the direction of any journal articles/references that detail the best approach to calculate ‘Weights’
In particular, the rationale behind the following statement you make
We could use the reciprocals of the squared residuals from column W as our weights, but we obtain better results by first regressing the absolute values of the residuals on the Ad spend and using the predicted values instead of the values in column W to calculate the weights.
Many thanks,
Nick
Hi Nick,
Thanks for your very kind remarks.
Just google heteroskedasticity and weighted linear regression. This is a lot of information about this topic online.
Charles
Hello is there somebody who will count for me weighted linear regresion? Ill pay
Hello Simona,
What exactly do you need?
Charles
Hi
Could you introduce some references for weighted regression?
Hello Kamia,
Here are some references:
https://en.wikipedia.org/wiki/Weighted_least_squares
https://www.stat.cmu.edu/~cshalizi/350/lectures/18/lecture-18.pdf
https://www.itl.nist.gov/div898/handbook/pmd/section1/pmd143.htm
Many of the references regarding regression in the Bibliography also describe weighted regression. You can find more by just googling weighted regression.
Charles
Hi, Charles,
I tried to follow the example shown in your “Weighted Regression Basics” in Excel. I realised there is an user defined function “Design” in the spreadsheet. For example “=MMULT(MINVERSE(MMULT(TRANSPOSE(DESIGN(A7:A13)),C7:C13*
DESIGN(A7:A13))),MMULT(TRANSPOSE(DESIGN(A7:A13)),C7:C13*B7:B13))”. Could you please tell me how can I re-produce the example? Should I donwload some files from your Website? Would it be possible to show me the details how to do this? Many thanks, James Wang
DESIGN(A7:A13) is a Real Statistics formula that outputs a 7 x 2 array whose first column consists of all ones and whose second column consists of the elements in A7:A13. Thus if you, for example, place insert 1 in cell Q7, the formula =A7 in cell R7, highlight the range Q7:R13 and press Crtl-D, the range Q7:R13 can be used in the formula you listed in place of DESIGN(A7:A13).
In any case, you can access the DESGN function, as well as all the other functions and data analysis tools in the Real Statistics software by going to the following webpage for a free download:
https://real-statistics.com/free-download/real-statistics-resource-pack/
Charles
Hi Charles,
Could you clarify what the correct degrees of freedom is for Total, Regression and Residuals for Weighted Linear Regression?
The green table does not match the output from the Excel add-in.
I.e. is the Regression dof equal to the number of independent variables (i.e. 1) or equal to the number of independent variables plus one (i.e. 2)?
Thanks very much
Luke,
If your X data has k (independent) variables and n samples, then dfReg = k+1, dfTot = n and dfRes = dfTot – dfReg. Here the k+1 is used for dfReg since there is also data for the weights. See Figure 4.
Charles
Thanks Charles, much appreciated.
This, then, disagrees with the ‘green’ table towards the top of the page, where the dof is written as ‘k’.
Moreover, I am a little confused with formula for calculating the SS in weighted regression as provided in the add-in. This also does not seem to conform to the green table.
E.g. if I regression a data sample of y on x without weights, I arrive at different values for the squared sums than when I use the exact same dataset, but use the weighted regression and set all weights to 1.
Thanks for any help.
Best regards
Luke
Luke,
There isn’t just one approach to performing weighted regression. One of these approaches uses regression through the origin. I think this is the approach that I chose to use. I have just started looking into this topic again (using a textbook by Weisberg) and will eventually try to clarify these issues. I do believe, however, the following: (1) The value I use for R-Square is not correct and I need to fix it (2) to obtain another version of weighted regression you can simply use dfReg = k and dfTot = n-1; also I believe that SAS uses different values for SSReg and SSTot as well, although SSRes is the same.
Thanks for bringing this issue to my attention, so that I can delve into it further.
Charles
Hi,
Is there any Confidence/Predition Interval chart tool for WLSR?
/Per
Hi Per,
I have not yet included confidence/prediction support for weighted linear regression.
Charles
Hello Charles –
Thank you for the explanations. I have a question around cell G (std dev) and how to compute it for a simple data set. Suppose we have for a planned regression of Y on X the following points. X=(1,2,3,4,5) and Y=(1.1,2.5,3.4,3.8,7). How would I calculate the standard deviation for each data point (in trying to determine the weights for each data point) in order to transform the model?
Once I get this I understand how to arrive at the variance and hence the weights.
Thank you and regards
Abel,
If each “group” consists of one data point, then the standard deviation would be undefined (division by zero) and so you couldn’t calculate weights in this way.
Charles
Figure 4 confuses me. I would expect only Ln(mean) but not Weight. Weight is not the intercept, right?
Is there an option for robust std. errors with weighted linear regression?
Wytek,
Weight is one of the independent variables and not an intercept.
Currently there is no robust errors option for the Weighted Linear Regression data analysis tool. There is a robust errors option for the Multiple Regression data analysis tool.
Charles
I am still confused. Figure 3 shows that the independent variables (X) are D4:D12 while the Weights are H4:H12 and are required to be assigned to a separate input box.
I have imported the excel data into another tool (gretl) and the output indicates that the Weights is indeed the intercept.
Here are the results:
coefficient std. error t-ratio p-value
——————————————————–
const −100.846 53.2965 −1.892 0.1073
Lnmean 126.845 11.8291 10.72 3.88e-05 ***
Wytek,
The way that I implemented weighted regression I turned the problem into one with linear regression without intercept. I guess gretl implemented this in a different way. From what I understand, the results are the same.
Charles
Charles,
Without intercept gretl produces this:
coefficient std. error t-ratio p-value
——————————————————–
Lnmean 105.860 4.81325 21.99 1.01e-07 ***
You may want to add an intercept checkbox to the dialog box above to see the difference.
As you have correctly indicated above, the weights are a multiplier matrix W. Replace this W matrix with the identity matrix I and you get plain OLS.
Wytek,
Thanks for your suggestion. I’ll consider doing this at some point in the future.
Charles
When I run Weighted Linear regression I get R and Rsq values that exceed 9.5. When I run linear regression, I get R values of the order of .3 . I would expect similar R values since when I run weighted correlation coefficients and unweighted correlation coefficients there is a small difference.
BP,
If you send me an Excel file with your data and calculations, I can try to figure out what is happening.
Charles
I cannot get the “Weighted Linear Regression dialog box” when i press CTRL+m?
Goita,
Click on the Reg tab.
Charles
Hi Charles. I’m having this same problem. I click on CTRL+m and only get the beep from Excel.
Robert,
When you press Alt-TI, do you see RealStats and Solver on the list of addins with check marks next to them?
What do you see when you enter the formula =VER() in any cell?
Charles
Charles,
Yes, RealStats and Solver are present and ticked on the list of addins, and when I enter =VER() in any cell I get the #NAME? after pressing enter.
Robert
Robert,
Since =VER() is not working, Excel doesn’t recognize the RealStats software. Are you using Excel 2007? If so, check to make sure that you have stored the Real Statistics file in the correct place. See the instructions on the webpage from which you downloaded the Real Statistics file.
Charles
I thought it would be something like that, but I’m slightly confused. As both RealStats and Solver are on the Active Application Add-ins list. Also, the location I’ve used is the one stated on the webpage.
Forgot to mention, I’m using Excel 2013.
Robert,
This is strange. If you have tried all the troubleshooting steps, then all I can suggest is that you delete the RealStats file and start all over. Close Excel. Open Excel and remove RealStats from the list of addins (to do this press Alt-TI and uncheck RealStats. If you get a message to delete RealStats from the list of addins, agree. If not recheck RealStats and hopefully you will get this message.
Once you have completed these steps download the software from the website and repeat the installation