Autocorrelation

When performing multiple linear regression using the data in a sample of size n, we have n error terms, called residuals, defined by ei = yi – ŷi. One of the assumptions of linear regression is that there is no autocorrelation between the residuals, i.e. for all i ≠ j, cov(ei, ej) = 0.
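One way to check this assumption is to measure how strongly each residual is related to the one before it. The minimal sketch below assumes the residuals are already available as a Python list in time order. It estimates the lag-1 autocorrelation r of the residuals and computes the Durbin-Watson statistic d = Σ(e_t − e_{t−1})² / Σ e_t², which is roughly 2(1 − r). Values of d near 2 therefore suggest little lag-1 autocorrelation. The function names are illustrative, not part of any library.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of OLS residuals.

    Assumes the residuals have mean zero (true for OLS with an intercept),
    so no mean is subtracted.
    """
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals that alternate in sign (negative autocorrelation).
e = [1.0, -1.0, 1.0, -1.0]
print(lag1_autocorrelation(e))  # -0.75
print(durbin_watson(e))         # 3.0
```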


80 thoughts on “Autocorrelation”

  1. Charles,
    I would like to bring to your attention the ANOVA table presented in Figure 4 in the section FGLS Method for Autocorrelation. Since we are using the error terms and the one-period-lagged error terms, we lose one observation.

    Therefore, the total observation count is 10, and the degrees of freedom for the regression and total terms in the ANOVA table should be 8 and 9 instead of 9 and 10, respectively. Because of these errors, the subsequent summary statistics are currently inaccurate.

    -Sun

  2. Charles,
    The output presented in Figure 5 of the FGLS Method for Autocorrelation section should be replaced with the multiple linear regression output based on the generalized difference equation. Currently, its contents are exactly the same as those presented in Figure 4.

    -Sun

    • Hello Sun,
      Yes, the wrong image is given in Figure 5. I have just corrected this on the webpage.
      Thank you very much for your help in identifying this and other mistakes on the website.
      Charles

  3. Charles,
    I have a question about how the residuals δ are calculated in Figure 1 of the “FGLS Method using the Durbin-Watson coefficient” section.

    In this example, you simply subtracted e2 from e1 to get δ1 (e1 – e2 = δ1). I wonder whether this should be changed to e1 – ρ*e2 = δ1 to apply the GLS method correctly.

    Please advise.
    -Sun

    • Hello Sun,
      Yes, e1 – ρ*e2 = δ1. I believe that this is what was done, although the formula in cell N5 is =M5-M4*J$9. I have now corrected this on the webpage. Thanks for bringing this issue to my attention.
      Charles
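The transform discussed in this exchange can be sketched as follows. The sketch assumes ρ has already been estimated, for example via the common approximation ρ ≈ 1 − d/2 from the Durbin-Watson d. Each output value is v_t − ρ·v_{t−1}, the same calculation as the cell formula =M5-M4*J$9. The first observation has no predecessor, so n values become n − 1, which is also why one observation is lost in the FGLS regression. In FGLS the transform is typically applied to y and to each x before rerunning OLS. The helper names are hypothetical.

```python
def rho_from_durbin_watson(d):
    """Approximate lag-1 autocorrelation from the Durbin-Watson statistic: rho ~ 1 - d/2."""
    return 1 - d / 2


def generalized_difference(values, rho):
    """Return values[t] - rho * values[t-1] for t = 1..n-1.

    The first observation has no lagged partner, so the result has n - 1 items.
    """
    return [values[t] - rho * values[t - 1] for t in range(1, len(values))]


rho = rho_from_durbin_watson(1.0)                     # 0.5
print(generalized_difference([1.0, 2.0, 4.0], rho))  # [1.5, 3.0]
```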

  4. Charles,
    There is a typo in the modified version of the Breusch-Godfrey test statistic.

    For the first part of the statistic, the numerator and denominator should be swapped:
    i.e., it should be LM* = {(n-p-k-1)/p} * R^2/(1-R^2).

    Thanks,
    -Sun
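With Sun's correction applied, the modified statistic is easy to compute once you know the R² of the auxiliary regression, in which the residuals are regressed on the k independent variables plus p lagged residuals. A minimal sketch follows, where n is the sample size and the function name is illustrative. The statistic in this form is commonly compared with an F distribution with p and n − p − k − 1 degrees of freedom.

```python
def breusch_godfrey_modified(r2, n, k, p):
    """Modified Breusch-Godfrey statistic LM* = ((n - p - k - 1) / p) * R^2 / (1 - R^2).

    r2: R-squared of the auxiliary regression of the residuals on the
        k independent variables plus p lagged residuals
    n:  sample size
    k:  number of independent variables
    p:  number of lags tested
    """
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must be in [0, 1)")
    return (n - p - k - 1) / p * r2 / (1 - r2)


print(breusch_godfrey_modified(0.2, n=30, k=2, p=1))  # approximately 6.5
```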

    • Hi Fika,
      If DU and DL represent the critical values, then for n = 518, k = 6 (# of independent variables) and alpha = .05, DU = 1.879 and DL = 1.832. These values come from the Real Statistics DUpperCRIT and DLowerCRIT functions.
      Charles

  5. Hi, Charles. May I get your help?
    What is the difference between k and k’?
    Here is an example:
    GDP=f (x1, x2, x3)

    Thank you.

  6. Hi Charles!

    Thank you for all the nice explanations you have here in your website. They are all very helpful!

    I have some issues with detecting autocorrelation using the DW test. My sample consists of more than 5000 daily index returns, but I have found that the DW critical values are only available for sample sizes up to 2000. The GRETL statistical software tests for autocorrelation using the Breusch-Godfrey test.

    My question is: since we don’t have the DW table for n > 2000, can we test for autocorrelation using the B-G test?

    I would appreciate it if you could help me with this question.

    Best regards,

    Leonard

    I have a sample of N = 438 with 2 dependent variables, and I have looked at the DW table as you recommended. However, the table only lists sample sizes of 430, 440, 450 and so on. How should I interpret the data?

    • Rauzatul Jannah,
      You need to interpolate the value between N=430 and N=440. If you use the DLowerCRIT and DUpperCRIT functions this interpolation will be done automatically for you. See the following webpage re interpolation:
      Interpolation
      Charles
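As an illustration of the interpolation step, here is a plain linear interpolation between two tabulated sample sizes. The critical values in the example are placeholders, not actual Durbin-Watson table entries. The DLowerCRIT and DUpperCRIT functions may use a different interpolation scheme, so treat this only as a sketch.

```python
def interpolate_critical_value(n, n_lo, value_lo, n_hi, value_hi):
    """Linearly interpolate a critical value for sample size n between two table rows."""
    if not n_lo <= n <= n_hi:
        raise ValueError("n must lie between the two tabulated sample sizes")
    weight = (n - n_lo) / (n_hi - n_lo)
    return value_lo + weight * (value_hi - value_lo)


# Placeholder table entries for N = 430 and N = 440 (NOT real dL values).
d_lower_430, d_lower_440 = 1.700, 1.710
print(interpolate_critical_value(438, 430, d_lower_430, 440, d_lower_440))  # about 1.708
```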

  8. Hi Charles,
    I have one dependent and 4 independent variables. All are observed as time series (daily stock prices), and n = 3857 days. Kindly help me with the critical values of DWlower and DWupper.
    Thanks in advance,
    Deepak

    • Deepak,
      The DLowerCRIT and DUpperCRIT Real Statistics functions handle values of n much larger than those that appear in the table — e.g. =DLowerCRIT(2000,5,.05), but they don’t handle n = 3857. You may be able to extrapolate to this value, but I don’t know how accurate the result will be.
      Charles

  9. I have a serial correlation problem. When I add AR(1), AR(2) and AR(3) terms, the DW statistic becomes close to 2. Is there a problem with adding AR() terms to remove autocorrelation? Best,

  10. Hello Dr. Zaiontz,

    I have a question.
    What are the null and alternative hypotheses for the presence of autocorrelation? Are they the same as you’ve shown above? Can you also tell me the reasons why they are the null and alternative hypotheses for autocorrelation?

    Thank you
    Hyeongsin

    • Hyeongsin,
      I believe that the null and alternative hypotheses shown on the webpage are correct. Do you have a different opinion?
      Charles

      • Thank you for your reply.

        My question is about the reasoning: why are these two (null hypothesis H0: the autocorrelation ρ ≤ 0 and alternative hypothesis H1: ρ > 0) the hypotheses for autocorrelation?

        Thank you
        Hyeongsin

        • Hyeongsin,
          Admittedly, I have not linked the null and alternative hypotheses to the test using d-L and d-H. To keep things simple, I would just use the test and not look at the exact statements of the null and alternative hypotheses, which are not used anyway.
          Charles

  11. Hello Dr. Zaiontz,
    This is my problem: I have performed some experimental tests (tomographies) on 40 samples. These samples are made by four different manufacturers (each manufacturer sent me 10 samples). I have performed an ANOVA test to check whether the manufacturer is a significant factor. Then I checked the normality assumption of the standardized residuals (SRES) with an Anderson-Darling test. Now I want to check whether there is any dependency among the SRES. To do so, I usually use the autocorrelation function in Minitab (Stat – Time Series – Autocorrelation). The problem is that I performed the experimental tests on sets of 4 samples (so 10 experimental tests in total). How can I check the dependency?
    Best regards
    Davide

    • Davide,
      Are you worried about experimentwise error since you are performing multiple tests (or am I missing the point of your question)?
      Charles

  12. Hello Dr. Zaiontz,
    Question about stationary time series using a school population example.
    Five years of annual kindergarten class size is a non-stationary time series.
    Five years of the ratio of kindergarten class size to the entire school’s population each year – is that considered a stationary time series?
    Thanks very much.
    David

  13. Thanks for your reply Dr. Zaiontz.
    I know my data pretty well. I think one variable is at least causing the behavior of the other. No one believes this.
    I chose the word “interaction” because I wondered if I’m actually observing a sort of bi-variate push me – pull me scenario. Think low d statistic, very high correlation, linear relationship.
    Let me know what you think and thank you again.
    David

  14. Hello Dr. Zaiontz
    Thank you for your informative web site.
    I have two monthly time series that I have updated continuously since early 2011.
    The r-squared of the two series > .98
    The scatterplot looks fairly evenly distributed about the OLS line.
    The Durbin Watson statistic hovers about 1.03 for sample sizes of 49 or 66, depending on how many periods are in the regression.
    My simple question is this – could this indication of autocorrelation actually be indicating interaction between these variables?
    Thank you again.
    Dave

    • Dave,
      What do you mean by “interaction between the variables”?
      Given the high correlation between the variables, if one demonstrates autocorrelation, I am not surprised if the other does too.
      Charles

  15. Dear Charles,
    I have three time series of soil water data from three different hillslopes. I am checking whether there are any significant differences among the means of the soil water data collected from these three hillslopes (the hillslopes are the treatments here). The data were measured at two-day intervals over a one-year period. I used several methods to check whether there is any serial correlation within each group of observations: the DW test, a plot of the residuals, and the PACF all showed that there is serial correlation. The lag-one serial correlation was around 0.8-0.9 for each data set.
    Before going to the next step and running an ANOVA test to compare the variances and means of the three data sets, I need to remove the serial correlation. The AR(1) model, which is normally used to account for serial correlation in regression analysis, did not work in my case. I also tried the sub-sampling technique, which did not work either.
    I hope you can suggest a simple method to account for serial correlation when running ANOVA and comparing the means of the three treatments.

    Thank you so much,
    Elyas

    • Elyas,
      This seems to be some sort of repeated measures ANOVA. Correlation between time elements is common. You need to account for sphericity in the ANOVA model or use MANOVA.
      Charles

  16. Charles,

    If my DW-stat = 1.977, my k’=3 and n=10, what would be the conclusion and how you would draw the DW chart?

    Thanks,

    Luca

  17. I want to know whether autocorrelation and the Durbin-Watson statistic apply exclusively when time is involved, that is, to time series analysis. If there is no time involved, can you still use autocorrelation and the Durbin-Watson statistic? Please enlighten me on this issue.

    • Hector,
      Autocorrelation (and the Durbin-Watson test) can apply to non-time-series data, although this issue is more likely to occur with time series data.
      Charles

      • Thanks Charles, it is very kind of you to help people with their statistics questions. If you will permit me, I am going to expand a little more on my research. There are established tables of atmospheric standards, such as density vs. temperature, used in aeronautics. I came across the table that gives density as a function of temperature, and when I applied experimental design, the residual graphics (residuals vs. fits and vs. order) were highly correlated and the D-W statistic was 0.25, even though R2, s, PRESS, etc. seemed to be OK. So what I did was to manually correct the variation using the graph of ln density vs. temperature, positioning the points manually right on the regression line. When I ran the regression model again, all the objective and subjective responses improved dramatically. If my logic is right, the use of the standard atmospheric values for aeronautical purposes would be questionable.
        Thank you very much for your valuable attention to my questions and comments.
        Best regards
        Hector A. Quevedo (Ph.D.)

  18. Hi Charles…

    Thank you for the greetings and also for the support.

    I had already noticed this option in the dialog box of the multiple regression routine, but when I ran this routine I did not see a p-value in the output. I need the p-value (a numerical value between 0 and 1) associated with the Durbin-Watson statistic. Is it possible to calculate the p-value with some procedure or function so that I can compare it with the significance level (5%)? Congratulations on your add-in!

  19. Hi Charles,

    I am from Brasil and I really like the add-in that you have programmed. Is there a function that calculates the p-value of the Durbin-Watson statistic? If not, could one be developed?

    I apologize for my wording.

    Best regards,

    Weidson

    • Weidson,
      Greetings to you in Brasil. To access the Durbin-Watson data analysis tool, choose Regression from the main menu. Then choose Multiple Linear Regression on the dialog box that appears. Finally choose the Durbin-Watson option on the subsequent dialog box.
      Charles

        • Weidson,

          I currently only test for first order autocorrelation using a table of critical values for Durbin-Watson’s d statistic. Sorry, but I don’t yet calculate a p-value.

          For large samples (of size n), you can use the p-value of the normal distribution based on the fact that Durbin-Watson’s d is approximately normally distributed with mean 2 and variance 4/n.

          Charles
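The large-sample approximation described here can be turned into a p-value using only the Python standard library. This is a minimal sketch that assumes the one-sided alternative ρ > 0 used on this page. Small values of d point to positive autocorrelation, so the lower tail is used. For small n the table-based test remains preferable, and the function name is illustrative.

```python
import math


def durbin_watson_normal_pvalue(d, n):
    """Approximate one-sided p-value for H1: rho > 0.

    Uses the large-sample approximation d ~ Normal(mean 2, variance 4/n).
    """
    z = (d - 2) / math.sqrt(4 / n)
    # Lower-tail probability P(D <= d) = Phi(z), written via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))


print(durbin_watson_normal_pvalue(2.0, 100))    # 0.5
print(durbin_watson_normal_pvalue(1.608, 100))  # about 0.025
```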

  20. Charles:

    Are the DURBIN and LEVERAGE functions in Real Statistics sensitive to the number of observations or the number of independent variables?
    I ask because when I use the Real Statistics Data Analysis Tool 4.1 (Multiple Regression option) with 34 observations, 4 independent variables and 1 dependent variable, I obtain correct results. But in another case, with 15157 observations, 49 independent variables and 1 dependent variable, I obtain two errors:
    1. #VALUE! for all the data in the Durbin-Watson table.
    2. #VALUE! for all the data in the Leverage column (Cook’s D table), and for all the data in the columns related to Leverage (Mod MSE, RStudent, T-Test, Cook’s D, DFFITS).
    3. The rest of the results seem to be OK.
    Is this related to the matrix operations being executed by Excel when the matrices are so big? Or is it possible that my data is forming ill-conditioned matrices? If so, why are errors present only in the results related to the DURBIN and LEVERAGE functions, but not in the rest of the regression tables?

    Thank you.

    William Agurto.

    • Charles:

      It seems that my data is OK (15157 observations, 49 independent variables and 1 dependent variable): when I used Factor Analysis in the Real Statistics Data Analysis Tool, I did not get any errors. So the problem is only present in the DURBIN and LEVERAGE functions.

      Thank you.

      William Agurto.

          • Charles:

            I received your e-mail and replied today: the data that I sent you has no missing values. The character “-” is a zero value. It appears because of the Excel format for the variable X5 (I used that format because of the magnitude of that variable). If you change the format of that column (to the “general” format, for example), you can see that “-” is really a zero value.
            Please review the data again to check the DURBIN and LEVERAGE functions.

            Thank you.

            William Agurto.

          • William,

            Yes, despite seeing a “-”, I see that the value is zero.

            In any case, the problem with DURBIN is that the values for n and k exceed the size of the values in the Durbin-Watson table. I am going to explore using a normal approximation in this case.

            The problem with LEVERAGE is that the number of data items exceeds 2178. In this case when the hat matrix is evaluated it looks like the size becomes too large. For the next release I plan to perform the calculation in a different way to avoid this problem.

            Thanks for finding these problems. Stay tuned.

            Charles
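One way to avoid an oversized hat matrix is to compute only its diagonal. Each leverage is h_ii = x_iᵀ(XᵀX)⁻¹x_i, which needs only the (k+1)×(k+1) matrix XᵀX rather than the n×n hat matrix. The sketch below illustrates that idea in plain Python. It is not necessarily how the Real Statistics LEVERAGE function is or will be implemented, and the helper names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    size = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        # Choose the row with the largest pivot to limit rounding error.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or ill-conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def leverages(rows):
    """Diagonal of the hat matrix, h_ii = x_i' (X'X)^-1 x_i, without forming the n x n matrix.

    Each item of `rows` holds one observation's predictor values;
    an intercept column of 1s is prepended.
    """
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(p)] for i in range(p)]
    xtx_inv = invert(xtx)  # only (k+1) x (k+1)
    result = []
    for x in design:
        tmp = [sum(xtx_inv[i][j] * x[j] for j in range(p)) for i in range(p)]
        result.append(sum(x[i] * tmp[i] for i in range(p)))
    return result


print(leverages([[1], [2], [3], [4]]))  # approximately [0.7, 0.3, 0.3, 0.7]
```

The leverages sum to k + 1, the number of model coefficients, which is a quick sanity check.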

  21. How can the power of the Durbin-Watson test be calculated?
    Is there a mathematical expression for the power of the Durbin-Watson test?

  22. Hi,

    I am trying to run a regression analysis in which I have 50 time periods, one dependent variable and 4 independent variables. I have applied data transformations. The lowest DW value I got is 1.0. Is this acceptable? My objective is not to forecast but to find the contributing variables.

    Reply
    • If n = 50, k = 4 and alpha = .05, then the DW bounds are 1.206 and 1.537. I am not sure what you mean by the minimum value of DW, but if DW is 1.0, then since 1.0 < 1.206 you need to reject the null hypothesis that rho <= 0. Charles
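The three-way decision rule of the Durbin-Watson bounds test can be summarized in a few lines. This is a minimal sketch for H0: ρ ≤ 0 vs H1: ρ > 0, assuming dL and dU have already been looked up for the given n, k and α. The function name is illustrative. The usage lines reuse the numbers from the comments on this page.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Bounds test for H0: rho <= 0 against H1: rho > 0."""
    if d < d_lower:
        return "reject H0"        # evidence of positive autocorrelation
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"         # d falls between the two critical values


print(durbin_watson_decision(1.0, 1.206, 1.537))      # reject H0
print(durbin_watson_decision(0.75798, 0.519, 1.297))  # inconclusive
```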

  23. Dear Charles ,

    I want to know what the lower critical value dL and the upper critical value dU are.
    I have been searching for them, but I still can’t find them.
    Please let me know their equations.

    *I’m not good at English. If my grammar is wrong, please bear with me. Thank you 🙂

    Thanks.
    Tarra

  24. Hi Charles,

    The page first describes k as the # of dependent variables and, after the example, as the # of independent variables. Could you please check this and clarify where k is used in this test?

    Thanks
    Anusha

    • Hi Anusha,
      k = # of independent variables. Thanks for identifying this typo. I have corrected the referenced webpage.
      k is used twice in the test. It is used to calculate the predicted y values and it is used in the Durbin-Watson table of critical values.
      Charles

  25. Thank you very much for your presentation.

    Nevertheless, I need further information about the Dl identification.

    Referring to the Durbin-Watson table (for alpha = 0.05, n = 11, k = 2), the critical values are 0.519 / 1.297.

    However, the calculated test statistic is 0.75798.

    What have I missed?

    Thank you for your reply

    • JeanMarc,
      Since the test statistic is between the two critical values, the test is inconclusive.
      I am not sure I know what you mean by “Dl identification”. Is this simply the results of the Durbin Watson test?
      Charles

  26. So, if you are able to generate the Durbin-Watson statistic, how do you use Excel to determine its significance for a given α, n and k, aside from looking it up in tables someone else has already worked out?

    • Kai,
      Significance testing for Durbin-Watson is not included in the latest release of the Real Statistics software. I plan to add this shortly, probably in the next release.
      Charles

    • Kai,
      I have now included the Durbin-Watson table on the website. In the next release of the software (due out in the next few days) I will provide a function that gives the critical values for sample sizes up to 5,000 elements and up to 20 independent variables. I will also provide a function that carries out the significance test.
      Charles

      • Thanks, I’m looking forward to seeing it!

        Significance testing of the Durbin-Watson stat seems to be missing from many programs/add-ins... your offerings will be uncommonly complete there!

        I’d like to fill in some gaps in something I’m working on in Python too, so I’m curious: what is the math formula you’re using to find the cumulative significance values for D?

        • Kai,
          Significance testing for Durbin-Watson is available in Release 3.3 or Rel 3.3.1 of the software. The testing is done using a table lookup.
          Charles

  27. Luan,
    Your teacher is correct. While I can think of situations where serial correlation could be used for non-time-series data, in practice it is used with time-series data.
    Charles

  28. Dear Mr. Charles,

    My teacher said that Durbin Watson test can only be used for time series data, not for cross sectional data.
    When I read this post, I wondered whether my teacher was seriously wrong. Can you help explain this in more detail?

    Best Regards,
    Luan

