When performing multiple linear regression using the data in a sample of size n, we have n error terms, called residuals, defined by ei = yi – ŷi. One of the assumptions of linear regression is that there is no autocorrelation between the residuals, i.e. for all i ≠ j, cov(ei, ej) = 0.
Topics
- Introduction
- Detecting Autocorrelation
- Addressing Autocorrelation
- Real Statistics data analysis tools
- Cochrane-Orcutt Regression (includes Durbin-Watson test)
- Breusch-Godfrey and Newey-West
Dw:1.95 dl ile du kaç Alpha = .01
Sorry, but I don’t understand your question?
Charles
Charles,
I would like to bring your attention on the ANOVA table presented in Figure 4 under the section FGLS Method for Antocorrelation. Since we are using the error terms and one-lagged error terms, we end up loosing one observation count.
Therefore, the total observations count = 10. And the dfs for regression and total terms in the ANOVA should be 8 and 9 instead of 9 and 10, respectively. Because of these errors, the subsequent summary statistics are not accurate right now.
-Sun
Hello Sun,
The df are 9 and 10 since this is regression through the origin.
Charles
Actually the second regression is not through the origin, but the Prais-Winsten estimate is used and so we don’t lose the data for 2000.
Charles
Charles,
The output contents presented in Figure 5 under of the FGLS Method for Autocorrelation should be replaced with the multiple linear regression model output based on the generalized difference equation. Currently, the contents are exactly the same as the contents presented in Figure 4.
-Sun
Hello Sun,
Yes, the wrong image is given in Figure 5. I have just corrected this on the webpage.
Thank you very much for your help in identifying this and other mistakes on the website.
Charles
Charles,
I have a question on how the residual δs calculated in Figure 1 under the “FGLS Method using the Durbin-Watson coefficient” section.
In this example, you simply substracted e2 from e1 for δ1 (e1 – e2 = δ1). I wonder whether this should be changed to e1 – ρ*e2 = δ1 to be accurate to apply the GLS method.
Please advise.
-Sun
Hello Sun,
Yes, e1 – ρ*e2 = δ1. I believe that this is what was done, although the formula in cell N5 is =M5-M4*J$9. I have now corrected this on the webpage. Thanks for bringing this issue to my attention.
Charles
Charles,
There is a typo in the modified version of the Breusch-Godfrey test statistic.
For the first part of the statistic, the numerator and denominator should be swapped:
ie., it should be LM*={(n-p-k-1)/p}* R^2/(1-R^2).
Thanks,
-Sun
Hi Sun,
Yes, you are correct. I have now changed this on the webpage. As always, thanks for your help.
Charles
hi charles, my sample is 518
DW 2,014
DU 1,54
DL ?
4-DU 2,46
i have 1 dependent and 6 independent
Hi Fika,
If DU and DL represent the critical values, then for n = 518, k = 6 (# of independent variables) and alpha = .05, then DU = 1.879 and DL = 1.832. These values come from using the Real Statistics DUpperCRIT and DLowerCRIT functions.
Charles
Hi, Charles. May I get your help?
What is the difference between k and k’?
I got an example:
GDP=f (x1, x2, x3)
Thank you.
Hello Chiann,
There is no reference to k or k’ on the webpage. Which webpage are you referring to?
Charles
Hi Charles!
Thank you for all the nice explanations you have here in your website. They are all very helpful!
I have some issues regarding a detecting autocorrelation using the DW test. The sample size I am using is more than 5000 daily index returns. I have found out that the DW critical values are based only on sample sizes up to 2000. In the GRETL statistical software, when you test for autocorrelation it uses the Breusch-Godfrey test.
My question is since we don’t have the DW tablef for n>2000, can we test for autocorrelation using the B-G test?
I would appreciate it if you help me regarding this question.
Best regards,
Leonard
Leonard,
Yes, you can use the B-G test to detect autocorrelation. See
https://real-statistics.com/multiple-regression/autocorrelation/breusch-godfrey-test/
Charles
I have a sample N=438 with 2 dependent variable and I have seen the table of dw as your recommendation. But, all the sample in that table contain of 430,440, 450 and so on. How I can interpret the data?
Rauzatul Jannah,
You need to interpolate the value between N=430 and N=440. If you use the DLowerCRIT and DUpperCRIT functions this interpolation will be done automatically for you. See the following webpage re interpolation:
Interpolation
Charles
Hi Charles,
i have one dependent and 4 independent variables. all are observed as time series (daily stock prices). the n= 3857 days. kindly help me with critical values of DWlower and DWupper.
thanks in advance
Deepak
Deepak,
The DLowerCRIT and DUpperCRIT Real Statistics functions handle values of n much larger than those that appear in the table — e.g. =DLowerCRIT(2000,5,.05), but they don’t handle n = 3857. You may be able to extrapolate to this value, but I don’t know how accurate the result will be.
Charles
I have serial correlation problem. When I add ar(1) ar(2) and ar(3) the DW statistics become near 2. Is there a problem about adding ar() s to remove autocorellation? Best,
Orthun,
I don’t understand what you mean by adding ar(1), etc. Can you give me an example of how you did this adding?
Charles
model is acceptable or not when durbin watson value is 1.4555. if yes plz give some reference
Ashish,
As described on the referenced webpage, it depends on the values alpha, n and k. Also see
https://real-statistics.com/statistics-tables/durbin-watson-table/
Charles
Hello Dr. Zaiontz,
I have a question.
What are the null and alternative hypotheses for the presence of autocorrelation? Are they the same as you’ve shown above? Can you also tell me the reasons why they are the null and alternative hypotheses for autocorrelation?
Thank you
Hyeongsin
Hyeongsin,
I believe that what is shown on the webpage are the correct null and alternative hypotheses. Do you have a different opinion?
Charles
Thank you for ur reply.
I have a question that what are the reasons that they are the null and alternative autocorrelation. I want to know why these two (null hypothesis H0: the autocorrelation ρ ≤ 0 and the alternative hypothesis H1: ρ > 0) are the hypotheses of autocorrelation.
Thank you
Hyeongsin
Hyeongsin,
Admittedly, I have not inked the null and alternative hypotheses to the test using d-L and d-H. To keep things simple, I would just use the test and not look at the exact statements of the null and alternative hypotheses, which are not used anyway.
Charles
Hello Dr. Zaiontz,
this is my problem: I have performed some experimental tests (tomographies) on 40 samples. These samples are made by four different manufacturers (each manufacturer sent me 10 samples). I have performed an ANOVA test to check if the manufacturer is a significant factor. Then I have checked the normality assumption of the standardized residuals (SRES) with a Anderson-Darling’s test. Now I want to check if there is some dependency among the SRES. To do so I usually use the autocorrelation function in Minitab (Stat – Time Series – Autocorrelation). But now the problem is that I have performed the experimental tests with a set of 4 samples (so 10 experimental tests in total). How can I check the dependency?
Best regards
Davide
Davide,
Are you worried about experimentwise error since you are performing multiple test (or am I missing the point of your question)?
Charles
Hello Dr. Zaiontz,
Question about stationary time series using a school population example.
Five years of annual kindergarten class size is a non-stationary time series.
Five years of the ratio of kindergarten class size to the entire school’s population each year – is that considered a stationary time series?
Thanks very much.
David
David,
I would need to see the data to answer your question.
Charles
Thanks for your reply Dr. Zaiontz.
I know my data pretty well. I think one variable is at least causing the behavior of the other. No one believes this.
I chose the word “interaction” because I wondered if I’m actually observing a sort of bi-variate push me – pull me scenario. Think low d statistic, very high correlation, linear relationship.
Let me know what you think and thank you again.
David
David,
I have nothing else to add. Perhaps you are correct, but I don’t see the evidence.
Charles
Hello Dr. Zaiontz
Thank you for your informative web site.
I have two monthly time series that I have updated continuously since early 2011.
The r-squared of the two series > .98
The scatterplot looks fairly evenly distributed about the OLS line.
The Durbin Watson statistic hovers about 1.03 for sample sizes of 49 or 66, depending on how many periods are in the regression.
My simple question is this – could this indication of autocorrelation actually be indicating interaction between these variables?
Thank you again.
Dave
Dave,
What do you mean by “interaction between the variables”?
Given the high correlation between the variables, if one demonstrates autocorrelation, I am not surprised if the other does too.
Charles
Dear Charles,
I have three time series of Soil water data from three different Hillslopes. I am checking whether there are any significant differences among the means of soil water data collected from these three hillslopes (the hillslopes are the treatments here). They were measured in two-day interval for a one-year period. I checked those methods to see if there is any serial correlation within each group of observation. According to DW test, and plotting the residuals and also PACF showed that there is serial correlation. Serial correlation of lag one was to be around 0.8-0.9 for each data set.
Before going to the next step and running ANOVA test to check the variances and means of the three data sets, I need to remove the serial correlation. The AR (1) model which normally used to account for serial correlation in regression analysis did not work in my case. I also tried the sub-sampling technique which did not work either.
I hope you can help me with a simple method to account for serial coorelation when running ANOVA and comparing the means for the three treatments.
Thank you so much,
Elyas
Elyas,
This seems to be some sort of repeated measures ANOVA. Correlation between time elements is common. You need to account for sphericity in the ANOVA model or use MANOVA.
Charles
Charles,
If my DW-stat = 1.977, my k’=3 and n=10, what would be the conclusion and how you would draw the DW chart?
Thanks,
Luca
5% level of significance as well. I am sorry
Luca,
Based on the DW Table, dL = .525 and dU = 2.016. Since d = 1.977 is between these two values, the test is inconclusive (i.e. no conclusion can be made one way or the other).
Charles
I want to know if the terms autocorrelation and durbin-watson are statistics that apply exclusively when time is involved, that is to time series analysis? If there is no time involved, can you still involve autocorrelation and the durbin watson statistics? Please elucidate me on this issue.
Hector,
Autocorrection (and Durbin-Watson) can occur with non-time series data. This issue is more likely to occur with time series data.
Charles
Thanks Charles, so kind of you to help people in questions about statistics. If you would permit me, I am going expand a little bit more on my research. You see there are established tables of atmospheric standards as density vs temperature used in aeronautics. It happens that I came across with the table that gives the density as function of temperature and when you apply experimental design, the residual graphics (the residual fits and orders) are highly correlated, and the D-W statistics value was of .25, even though the R2, s, PRESS, etc. seemed to be OK. So what I did, was to manually correct the variation using the graph of the ln density vs. temperature and positioning the points manually right in the regression line. By running again the regression model, all the objective and subjective responses improved dramatically. If my logic is right, the use of the standard atmospheric values for aeronautical purposes would be questionable.
Thank you very much for your valuable attention to my questions and comments.
Best regards
Hector A. Quevedo (Ph.D.)
what can we do if autocorrelation exist, teacher?
Syikin,
I will adding some information about what to do when there is autocorrelation shortly.
Charles
Hi Charles…
Thank you for the greetings and also for support.
I had already noticed this option on the dialog box of the multiple regression rotine, but when I performed this routine I did not recognize the p-value on outputs. I need of the p-value (numerical value between 0 – 1) associated Durbin Watson statistic. Is it possible to calculate the p-value by some procedure/function for I to compare with significance level (5%)? Congratulations for your Add-in !!!
Hi Charles,
I am from Brasil and I’m really liking the Add-in that you have programmed. Is there any function that is possible to calculate the p-value of stastistic of durbin Watson? If No, is possible development?
I’m would like apologize by my words.
best regards,
Weidson
Weidson,
Greetings to you in Brasil. To access the Durbin-Watson data analysis tool, choose Regression from the main menu. Then choose Multiple Linear Regression on the dialog box that appears. Finally choose the Durbin-Watson option on the subsequent dialog box.
Charles
Hello dr. Charles
Could you help me with my doubt bellow?
Thanks vary much.
Weidson
Weidson,
I currently only test for first order autocorrelation using a table of critical values for Durbin-Watson’s d statistic. Sorry, but I don’t yet calculate a p-value.
For large samples (of size n), you can use the p-value of the normal distribution based on the fact that Durbin-Watson’s d is approximately normally distributed with mean 2 and variance 4/n.
Charles
Dr. Charles
Thanks vary much by your fast response.
Weidson.
Charles:
Are DURBIN formula and LEVERAGE formula in Real Statistics sensible to the number of observations or the number of independent variables?
I suppose that because when I use Real Statistics Data Analysis Tool 4.1 (Multiple Regression option) for 34 observations, 4 independent variables and 1 dependent variable, I obtain correct results. But for another case, 15157 observations, 49 independent variables and 1 dependent variable I obtain two errors:
1. #VALUE! for all the data in Durbin-Watson Table .
2. #VALUE! for all the data in Leverage column (Cook’s D Table), and all the data in the columns related to Leverage (Mod MSE, RStudent, T-Test, Cook’s D, DFFITS).
3. The rest of the results seems to be OK.
Is it related to the matrix operations been executed by Excel when matrices are so big? Or is it possible that my data is forming bad-conditioned matrices? If so, ¿Why errors are present only in results related to DURBIN and LEVERAGE formulas, but not in the rest of the regression tables?
Thank you.
William Agurto.
Charles:
It seems that my data is OK (15157 observations, 49 independent variables and 1 dependent variable): when I used Factor Analysis in Real Statistics Data Analysis Tool I have not gotten errors. So, the problema is only present in DURBIN and LEVERAGE formulas.
Thank you.
William Agurto.
William,
Can you send me and Excel file with your data so that I can try to figure out what the problem is?
Charles
Charles:
I sent the Excel file with the data to your both e-mail accounts:
czaiontz@gmail.com
info@real-statistics.com
I hope you can find the problem. It seems like my data is OK.
Please, confirm when you receive the file (more than 10 Mb).
I will be waiting for your answer.
Thank you.
William Agurto.
William,
I have received your email.
Charles
Charles:
I received your e-mail. I answered that today: the data that I sent you has no missing values. The character “-” is a zero value. It is seen because of the Excel format for the variable X5 (I used that format because of the magnitud of that variable). If you change the format of that column (to “general” format, for example) you can see that “-” is really a zero value.
Please, review the data again to verify the functionality of DURBIN and LEVERAGE formulas.
Thank you.
William Agurto.
William,
Yes despite seeing a “-” I see that the value is a zero.
In any case, the problem with DURBIN is that the values for n and k exceed the size of the values in the Durbin-Watson table. I am going to explore using a normal approximation in this case.
The problem with LEVERAGE is that the number of data items exceeds 2178. In this case when the hat matrix is evaluated it looks like the size becomes too large. For the next release I plan to perform the calculation in a different way to avoid this problem.
Thanks for finding these problems. Stay tuned.
Charles
how to calculate power of durbin watson test?
is there any mathematical structure of power for durbin watson test?
I have not tried to calculate the power of the Durbin-Watson test, but I did come across an article which may be useful to you
http://publicationslist.org/data/abcjar/ref-41/ABCJAR%206.7.pdf
Charles
Hi,
I am trying to run a regression analysis in which i have 50 time periods, one dep variable and 4 independent variables. I havr conducted data transformations. The minimum I got is dw 1.0. Is this acceptable? My objective is not to forecast but to find the contributing variable.
If n = 50, k = 4 and alpha = .05, then the DW bounds are 1.206 and 1.537. I am not sure what you mean by the minimum value of DW, but if DW is 1.0, then since 1.0 < 1.206 you need to reject the null hypothesis that rho <= 0. Charles
Dear Charles ,
I wanna know what is Lower critical value dL and Upper critical value dU.
I’ve been found it, but I can’t find it still.
Please let me know its equation.
*I’m not good at English. If I had wrong grammer, please understand me, Thankyou 🙂
Thanks.
Tarra
Tarra,
Perhaps the following articles will be helpful
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2279559
http://wps.aw.com/wps/media/objects/2387/2445250/PPTs/c11lectr15.ppt
Charles
Hi Charles,
Mentioned K as # of dependent variables and after the example K as # of Independent variables. Could you please check and clarify where K used in this test.
Thanks
Anusha
Hi Anusha,
k = # of independent variables. Thanks for identifying this typo. I have corrected the referenced webpage.
k is used twice in the test. It is used to calculate the predicted y values and it is used in the Durbin-Watson table of critcial values.
Charles
Thank you very much for your presentation.
Nevertheless, I need further information about the Dl identifcation.
Reporting to the Durbin Watson table (for alpha 0.05, n=11, k=2) the tests value are .0.519 / 1.297.
However, the calculated test shows the 0.75798 value.
Where did I have missed something ?
Thank you for your reply
JeanMarc,
Since the test statistic is between the two critical values, the test is inconclusive.
I am not sure I know what you mean by “Dl identification”. Is this simply the results of the Durbin Watson test?
Charles
So, if you are able to generate the Durbin-Watson stat, how do you use Excel to generate a significance for that D stat from a given α, n and k? Aside from looking it up in tables someone else figured out already…
Kai,
Significance testing for Durbin-Watson is not included in the latest release of the Real Statistics software. I plan to add this shortly, probably in the next release.
Charles
Kai,
I have now included the Durbin-Watson table on the website. In the next release of the software (due out in the next few days) I will provide a function that gives the critical values for sample sizes up to 5,000 elements and up to 20 independent variables. I will also provide a function that carries out the significance test.
Charles
Thanks, I’m looking forward to seeing it!
Significance testing of the Durbin-Watson stat seems to be missing many programs/add-Ins…your offerings will be uncommonly complete there!
I’d like to fill in some gaps in something I’m working on in Python too, so I’m curious – what is the math formula you’re using find the cummulative significance values for D?
Kai,
Significance testing for Durbin-Watson is available in Release 3.3 or Rel 3.3.1 of the software. The testing is done using a table lookup.
Charles
Luan,
Your teacher is correct. While I can think of situations where serial correlation could be used for non-time-series data, in practice it is used with time-series data.
Charles
Thank you very much Mr. Charles, your reply help me so much. Now I am clear on this point 🙂
Luan
Dear Mr. Charles,
My teacher said that Durbin Watson test can only be used for time series data, not for cross sectional data.
When I read this post I very wonder if my teacher was seriously wrong. Can you help me explain this more details ?
Best Regards,
Luan