ARIMA Differencing

Basic Concepts

In order to create a stationary process, differencing may be necessary. For example, the graph of closing Dow Jones indices for October 2015 (see Example 1 of Stationary Process), as shown in Figure 1, clearly shows an increasing trend. Differencing is a way to eliminate such trends.


Figure 1 – Dow Jones Indices for October 2015

First-order differencing addresses linear trends and employs the transformation z_i = y_i – y_{i-1}. Second-order differencing addresses quadratic trends and employs a first-order difference of a first-order difference, namely z_i = (y_i – y_{i-1}) – (y_{i-1} – y_{i-2}), which is equivalent to z_i = y_i – 2y_{i-1} + y_{i-2}.
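As a quick check of these two transformations, here is a minimal Python sketch using NumPy's diff function. The two series are made-up illustrations (they are not the Dow Jones data): one has an exactly linear trend and the other an exactly quadratic trend.

```python
import numpy as np

i = np.arange(10)
linear = 3.0 + 2.0 * i           # made-up series with a linear trend
quadratic = 1.0 + 0.5 * i ** 2   # made-up series with a quadratic trend

# First-order differences: z_i = y_i - y_{i-1}
first_diff = np.diff(linear)

# One round of differencing is not enough for a quadratic trend:
# the result still grows linearly
still_trending = np.diff(quadratic)

# Second-order differences: z_i = y_i - 2*y_{i-1} + y_{i-2}
second_diff = np.diff(quadratic, n=2)

print(first_diff)      # every value is 2.0, so the linear trend is gone
print(still_trending)  # 0.5, 1.5, 2.5, ... (still increasing)
print(second_diff)     # every value is 1.0, so the quadratic trend is gone
```

Note that each round of differencing shortens the series by one element, so the second-order differences of a series of size 10 have size 8.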

Taking first-order differences for the data in Figure 1 results in the chart on the right. The trend seems to have been eliminated.

ARIMA

An autoregressive integrated moving average (ARIMA) process (aka a Box-Jenkins process) adds differencing to an ARMA process. An ARMA(p,q) process with d-order differencing is called an ARIMA(p,d,q) process. Thus, for example, an ARIMA(2,1,0) process is an AR(2) process with first-order differencing.
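To make the ARIMA(2,1,0) case concrete, the following Python sketch simulates a stationary AR(2) series, accumulates it to obtain a series that needs one round of differencing, and confirms that first-order differencing recovers the AR(2) series. The coefficients, seed and sample size are arbitrary illustrative choices, not values taken from this webpage.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (arbitrary) for reproducibility
n = 500
phi1, phi2 = 0.5, -0.3           # illustrative stationary AR(2) coefficients

# AR(2) process: x_i = phi1*x_{i-1} + phi2*x_{i-2} + e_i
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]

# Accumulating (integrating) x once gives a series y whose
# first-order differences follow the AR(2) process, i.e. ARIMA(2,1,0)
y = np.cumsum(x)

# First-order differencing undoes the accumulation
z = np.diff(y)
print(np.allclose(z, x[1:]))     # True
```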

It is important not to over-difference since this can cause you to use an incorrect model. Some rules-of-thumb indicating that you may have differenced too many times are:

  • The lag-1 autocorrelation of the differenced series is -0.5 or lower
  • Differencing increases the variance
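Both warning signs can be seen by differencing a series that is already stationary. The Python sketch below does this for simulated white noise; the helper name lag1_autocorr, the seed and the sample size are assumptions made for illustration. In theory, differencing white noise gives a lag-1 autocorrelation of -0.5, the borderline value for the first rule, and doubles the variance.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return float(np.sum(dev[1:] * dev[:-1]) / np.sum(dev * dev))

rng = np.random.default_rng(1)    # fixed seed (arbitrary) for reproducibility
noise = rng.normal(size=5000)     # white noise: already stationary
over = np.diff(noise)             # an unnecessary round of differencing

r1 = lag1_autocorr(over)
var_ratio = over.var() / noise.var()

print(f"lag-1 autocorrelation after differencing: {r1:.3f}")
print(f"variance after / variance before: {var_ratio:.3f}")
```

With this many observations, the printed autocorrelation should land near -0.5 and the variance ratio near 2, although the exact figures depend on the random draw.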

Rules-of-thumb

An AR(p) or MA(q) process has a unit root if the sum of the non-constant coefficients is 1.

Additional rules-of-thumb:

  • If an AR(p) process has a unit root then the level of differencing should be increased
  • If an MA(q) process has a unit root then the level of differencing should be decreased

Assume that column A contains a time series of size n starting in cell A1. Suppose we want to place the 1st-order differences (of size n-1) in column B starting in cell B2, the 2nd-order differences (of size n-2) in column C starting in cell C3, and so on, up to the 7th-order differences (of size n-7) in column H starting in cell H8. The formulas to enter in these starting cells are as follows:

  • B2: A2-A1
  • C3: A3-2*A2+A1
  • D4: A4-3*A3+3*A2-A1
  • E5: A5-4*A4+6*A3-4*A2+A1
  • F6: A6-5*A5+10*A4-10*A3+5*A2-A1
  • G7: A7-6*A6+15*A5-20*A4+15*A3-6*A2+A1
  • H8: A8-7*A7+21*A6-35*A5+35*A4-21*A3+7*A2-A1

For each column, you need to highlight the range down to the nth cell in that column and press Ctrl-D to get the other values. Note that the coefficients are binomial coefficients with alternating signs. For the 7th-order difference, for example, their absolute values are C(7,0) = 1, C(7,1) = 7, C(7,2) = 21, etc.
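The pattern in these formulas can be checked with a short Python sketch. The function names and the data below are made up for illustration. The sketch builds the d-th order coefficients as (-1)^k C(d,k) and confirms that applying them directly gives the same values as differencing the series d times in succession.

```python
from math import comb

def difference_coefficients(d):
    """Coefficients of y_i, y_{i-1}, ..., y_{i-d} in the d-th order difference."""
    return [(-1) ** k * comb(d, k) for k in range(d + 1)]

def difference_direct(series, d):
    """d-th order differences computed in one step from the coefficients."""
    coeffs = difference_coefficients(d)
    return [
        sum(c * series[i - k] for k, c in enumerate(coeffs))
        for i in range(d, len(series))
    ]

def difference_repeated(series, d):
    """d-th order differences computed by differencing d times in succession."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

data = [3, 7, 4, 9, 12, 8, 15, 11, 20, 18]   # made-up time series

print(difference_coefficients(7))   # [1, -7, 21, -35, 35, -21, 7, -1]
print(difference_direct(data, 3) == difference_repeated(data, 3))   # True
```

The coefficients printed for d = 7 match those in the formula for cell H8, and each result has size n-d, as with the spreadsheet columns.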

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following array function.

ADIFF(R1, d) – takes the time series in the n × 1 range R1 and outputs an (n–d) × 1 range containing the data in R1 differenced d times

Example 1: Find the 1st, 2nd, 3rd, and 4th differences for the data in column A of Figure 1.


Figure 1 – Differencing

Here cell B4 contains the formula =A5-A4, cell C4 contains the formula =B5-B4 (or A6-2*A5+A4), cell D4 contains the formula =C5-C4 (or A7-3*A6+3*A5-A4) and cell E4 contains the formula =D5-D4 (or A8-4*A7+6*A6-4*A5+A4).

Range G4:G22 contains the array formula =ADIFF($A$4:$A$23,G3). If we highlight the range G4:J22 and press Ctrl-R, we get the result shown on the right side of Figure 1.


References

Nau, R. (2020) Identifying the order of differencing in an ARIMA model
https://people.duke.edu/~rnau/411arim2.htm

12 thoughts on “ARIMA Differencing”

  1. Hello.

    Thank you for this amazing explanation, it has really helped me. I wanted to find out on how one can conduct a test for stationarity using an equation such as this one: Yt = -0.48Yt-1 + Ut + 0.72Ut-1.

    Thank you.

  2. “Thus, for example, an ARIMA(2,0,1) process is an AR(2) process with first-order differencing.”

    I thought an ARIMA(2,0,1) process was an AR(2) and MA(1) process, and 0 order/degree of differencing is needed for the series to be stationary.

    If you wrote “an AR(2) process with first-order differencing.” , then isn’t it just ARIMA(2,1,0) ???

    Sorry I am a bit confused, please help. Thank you

    • Hello Eason,
      Yes, you are correct. I used ARIMA(p,q,d) instead of ARIMA(p,d,q). I have now changed this on this webpage and a few other webpages. Thanks for catching this mistake.
      Charles

  3. Hi,

    In your statement above, you mentioned:
    “zi = (yi – yi-1) – (yi-1 – yi-2), which is equivalent to zi = yi – yi-2”

    Shouldn’t it be:
    zi = yi – 2yi-1 + yi-2?

    Kindly correct me if I’m wrong.

    Thanks,
    Jiawei

  4. When I checked in Excel, range G4:G22, which contains the array formula =ADIFF($A$4:$A$23,G3), returns 2.690575 for every value.

    The values do not come out as shown below:
    2.690575
    -0.901482
    1.212705
    2.015852
    1.209541
    -0.197216
    -1.046158
    2.044927
    0.89878
    -0.92262
    0.05102
    0.64771
    -0.73725
    0.26199
    -0.653847
    1.513367
    1.44655
    0.75421
    -1.17511

  5. Thanks for your clear explanation. It is really helpful to me as a student desiring to fully understand the way of implementing the ARIMA model using toy data, without exploiting other commercial solvers such as STATA, R and so on. Because I wasn’t sure whether I properly formulated ARIMA models using obtained coefficients or not, thus, I looked forward to observing the detailed steps for forecasting something as presented in your website. Thank you !

  6. Hi Charles, thanks for your tools and explanations; I have been looking for this for a long time. Some questions about differencing:
    Your example uses n-th order differencing for data with a trend. If a data set has both trend and seasonality, I think k-step differencing may be needed as well, but how can I identify the right differencing using your ARIMA model?

    • Chenny,
      The webpage describes n-step differencing for ARIMA, but I have not yet included seasonality. I have described seasonality for linear regression and for Holt-Winters.
      Charles
