Basic Concepts
In order to create a stationary process, differencing may be necessary. For example, the graph of closing Dow Jones indices for October 2015 (see Example 1 of Stationary Process), as shown in Figure 1, clearly shows an increasing trend. Differencing is a way to eliminate such trends.
Figure 1 – Dow Jones Indices for October 2015
First-order differencing addresses linear trends and employs the transformation $z_i = y_i - y_{i-1}$. Second-order differencing addresses quadratic trends and employs a first-order difference of a first-order difference, namely $z_i = (y_i - y_{i-1}) - (y_{i-1} - y_{i-2})$, which is equivalent to $z_i = y_i - 2y_{i-1} + y_{i-2}$.
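The two transformations can be checked on small made-up series. The following is a minimal Python sketch, not part of the Real Statistics software; the helper name `difference` and the two toy series are assumptions chosen for illustration.

```python
def difference(y):
    """Return the first-order differences z_i = y_i - y_{i-1}."""
    return [curr - prev for prev, curr in zip(y, y[1:])]

# Hypothetical toy data: one linear trend and one quadratic trend.
linear = [3 + 2 * i for i in range(6)]     # 3, 5, 7, 9, 11, 13
quadratic = [i * i for i in range(6)]      # 0, 1, 4, 9, 16, 25

first_linear = difference(linear)          # constant, so the linear trend is gone
first_quad = difference(quadratic)         # still trending after one difference
second_quad = difference(first_quad)       # constant after a second difference

# The second difference computed directly as y_i - 2*y_{i-1} + y_{i-2}.
direct = [quadratic[i] - 2 * quadratic[i - 1] + quadratic[i - 2]
          for i in range(2, len(quadratic))]

print(first_linear)            # [2, 2, 2, 2, 2]
print(first_quad)              # [1, 3, 5, 7, 9]
print(second_quad)             # [2, 2, 2, 2]
print(direct == second_quad)   # True
```

One difference flattens the linear series, while the quadratic series needs two, and the direct formula agrees with differencing twice.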
Taking first-order differences for the data in Figure 1 results in the chart on the right. The trend seems to have been eliminated.
ARIMA
An autoregressive integrated moving average (ARIMA) process (aka a Box-Jenkins process) adds differencing to an ARMA process. An ARMA(p,q) process with d-order differencing is called an ARIMA(p,d,q) process. Thus, for example, an ARIMA(2,1,0) process is an AR(2) process with first-order differencing.
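To make the ARIMA(2,1,0) example concrete, here is a small Python sketch that builds such a series by accumulating a simulated AR(2) series and then confirms that first-order differencing recovers the AR(2) values. The function name, the coefficients 0.5 and 0.3, the zero start-up values and the seed are all assumptions made for illustration.

```python
import random

def simulate_arima_2_1_0(n, phi1, phi2, seed=0):
    """Simulate n steps of a zero-mean AR(2) series z and integrate it once."""
    rng = random.Random(seed)
    z = [0.0, 0.0]                      # assumed start-up values
    for _ in range(n):
        z.append(phi1 * z[-1] + phi2 * z[-2] + rng.gauss(0, 1))
    z = z[2:]                           # drop the start-up values
    # Integrate once (d = 1): y_i = y_{i-1} + z_i, starting from y_0 = 0.
    y = [0.0]
    for step in z:
        y.append(y[-1] + step)
    return y, z

y, z = simulate_arima_2_1_0(200, phi1=0.5, phi2=0.3)

# First-order differencing of y gives back the AR(2) series z.
recovered = [b - a for a, b in zip(y, y[1:])]
matches = all(abs(r - s) < 1e-9 for r, s in zip(recovered, z))
print(matches)
```

The series `y` is not stationary, but its first differences follow the AR(2) process, which is what the middle index d = 1 expresses.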
It is important not to over-difference since this can cause you to use an incorrect model. Some rules-of-thumb indicating that you may have differenced too many times are:
- The lag-1 autocorrelation of the differenced series is less than -0.5
- Differencing increases the variance of the series instead of reducing it
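Both warning signs can be computed directly. The sketch below, in Python, applies them to a simulated random walk, which needs exactly one difference; the helper names, the seed and the series length are assumptions, and the variance and autocorrelation formulas are the usual sample versions.

```python
import random

def difference(y):
    return [b - a for a, b in zip(y, y[1:])]

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def lag1_autocorrelation(x):
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical data: a random walk built from standard normal steps.
rng = random.Random(1)
y = [0.0]
for _ in range(2000):
    y.append(y[-1] + rng.gauss(0, 1))

results = []
series = y
for d in range(1, 4):
    previous_variance = variance(series)
    series = difference(series)
    results.append((d,
                    lag1_autocorrelation(series),
                    variance(series) > previous_variance))

for d, r1, variance_increased in results:
    print(d, round(r1, 3), variance_increased)
```

For this series the first difference lowers the variance, whereas the second and third differences raise it and push the lag-1 autocorrelation strongly negative, which is the pattern the rules of thumb flag.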
Rules-of-thumb
An AR(p) or MA(q) process has a unit root if the sum of the non-constant coefficients is 1.
Additional rules-of-thumb:
- If an AR(p) process has a unit root then the level of differencing should be increased
- If an MA(q) process has a unit root then the level of differencing should be decreased
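The unit-root rule stated above amounts to a one-line check. The Python sketch below assumes the coefficients are passed without the constant term; the function name, the tolerance parameter and the example coefficients are hypothetical. With estimated coefficients the sum is rarely exactly 1, so a looser tolerance may be more appropriate.

```python
def has_unit_root(coefficients, tol=1e-8):
    """Rule of thumb: unit root if the non-constant coefficients sum to 1.

    `tol` is an assumed tolerance; loosen it for estimated coefficients.
    """
    return abs(sum(coefficients) - 1.0) <= tol

def suggested_change(process_type, coefficients, tol=1e-8):
    """Apply the two rules of thumb: 'AR' -> increase d, 'MA' -> decrease d."""
    if not has_unit_root(coefficients, tol):
        return "keep"
    return "increase" if process_type == "AR" else "decrease"

print(has_unit_root([0.6, 0.4]))                     # True
print(has_unit_root([0.5, 0.3]))                     # False
print(suggested_change("AR", [0.6, 0.4]))            # increase
print(suggested_change("MA", [0.7, 0.3]))            # decrease
print(suggested_change("AR", [0.58, 0.4], tol=0.05)) # increase
```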
Assume that column A contains a time series of size n starting in cell A1. Suppose that we want to place the 1st order differences (of size n-1) in column B starting in cell B2, the 2nd order differences (of size n-2) in column C starting in cell C3, and so on, up to the 7th order differences (of size n-7) in column H starting in cell H8. The formulas to use in these starting cells are as follows:
- B2: A2-A1
- C3: A3-2*A2+A1
- D4: A4-3*A3+3*A2-A1
- E5: A5-4*A4+6*A3-4*A2+A1
- F6: A6-5*A5+10*A4-10*A3+5*A2-A1
- G7: A7-6*A6+15*A5-20*A4+15*A3-6*A2+A1
- H8: A8-7*A7+21*A6-35*A5+35*A4-21*A3+7*A2-A1
For each column, you need to highlight the range down to the nth cell in that column and press Ctrl-D to get the other values. Note that for the 7th order difference, the coefficients used are, with alternating signs, the binomial coefficients C(7,0) = 1, C(7,1) = 7, C(7,2) = 21, etc.
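The pattern in these formulas can be generated for any order. The following Python sketch uses `math.comb` to produce the signed binomial coefficients and applies them to a series; the function names and the cubic test series are assumptions for illustration.

```python
from math import comb

def difference_coefficients(d):
    """Coefficients of y_i, y_{i-1}, ..., y_{i-d} in the d-th order difference."""
    return [(-1) ** k * comb(d, k) for k in range(d + 1)]

def nth_difference(y, d):
    """d-th order differences of y, computed directly from the coefficients."""
    coeffs = difference_coefficients(d)
    return [sum(c * y[i - k] for k, c in enumerate(coeffs))
            for i in range(d, len(y))]

print(difference_coefficients(2))   # [1, -2, 1]
print(difference_coefficients(7))   # [1, -7, 21, -35, 35, -21, 7, -1]

# Hypothetical check: third differences of the cubes 0, 1, 8, 27, ... are constant.
cubes = [i ** 3 for i in range(8)]
print(nth_difference(cubes, 3))     # [6, 6, 6, 6, 6]
```

The coefficients for d = 7 match the formula for cell H8, read from A8 down to A1.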
Worksheet Function
Real Statistics Function: The Real Statistics Resource Pack provides the following array function.
ADIFF(R1, d) – takes the time series in the n × 1 range R1 and outputs an (n–d) × 1 range containing the data in R1 differenced d times
Example 1: Find the 1st, 2nd, 3rd, and 4th differences for the data in column A of Figure 1.
Figure 1 – Differencing
Here cell B4 contains the formula =A5-A4, cell C4 contains the formula =B5-B4 (or A6-2*A5+A4), cell D4 contains the formula =C5-C4 (or A7-3*A6+3*A5-A4) and cell E4 contains the formula =D5-D4 (or A8-4*A7+6*A6-4*A5+A4).
Range G4:G22 contains the array formula =ADIFF($A$4:$A$23,G3). If we highlight the range G4:J22 and press Ctrl-R, we get the result shown on the right side of Figure 1.
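For readers working outside Excel, the same table of differences can be sketched in Python. The function `adiff` below is only an analogue of what ADIFF is described as returning (a series of length n–d), not the Real Statistics implementation; the input data and the error handling for out-of-range d are assumptions.

```python
def adiff(series, d):
    """Difference `series` d times; the result has len(series) - d values."""
    if d < 0 or d >= len(series):
        raise ValueError("d must satisfy 0 <= d < len(series)")
    result = list(series)
    for _ in range(d):
        # Each pass is a first-order difference of the previous result.
        result = [b - a for a, b in zip(result, result[1:])]
    return result

# Hypothetical data, standing in for column A of the worksheet.
data = [5, 7, 10, 14, 19, 25, 32, 40]

columns = {d: adiff(data, d) for d in range(1, 5)}
for d, values in columns.items():
    print(d, values)
# 1 [2, 3, 4, 5, 6, 7, 8]
# 2 [1, 1, 1, 1, 1, 1]
# 3 [0, 0, 0, 0, 0]
# 4 [0, 0, 0, 0]
```

Each column is one value shorter than the previous one, mirroring the way the worksheet columns shrink as the order of differencing grows.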
References
Nau, R. (2020) Identifying the order of differencing in an ARIMA model
https://people.duke.edu/~rnau/411arim2.htm
Hello.
Thank you for this amazing explanation; it has really helped me. I wanted to find out how one can conduct a test for stationarity using an equation such as this one: $Y_t = -0.48Y_{t-1} + U_t + 0.72U_{t-1}$.
Thank you.
Hello Rebecca,
What is the relationship between Yt and Ut in this equation?
Charles
Or is Ut the error term?
Charles
“Thus, for example, an ARIMA(2,0,1) process is an AR(2) process with first-order differencing.”
I thought an ARIMA(2,0,1) process was an AR(2) and MA(1) process, and 0 order/degree of differencing is needed for the series to be stationary.
If you wrote “an AR(2) process with first-order differencing”, then isn’t it just ARIMA(2,1,0)?
Sorry I am a bit confused, please help. Thank you
Hello Eason,
Yes, you are correct. I used ARIMA(p,q,d) instead of ARIMA(p,d,q). I have now changed this on this webpage and a few other webpages. Thanks for catching this mistake.
Charles
Hi,
In your statement above, you mentioned:
“zi = (yi – yi-1) – (yi-1 – yi-2), which is equivalent to zi = yi – yi-2”
Shouldn’t it be:
zi = yi – 2yi-1 + yi-2?
Kindly correct me if I’m wrong.
Thanks,
Jiawei
Jiawei,
You are correct. I have now corrected the webpage.
Thank you very much for identifying this error.
Charles
When I checked in Excel, with range G4:G22 containing the array formula =ADIFF($A$4:$A$23,G3), it returned 2.690575 for all the values.
The values do not come out as below:
2.690575
-0.901482
1.212705
2.015852
1.209541
-0.197216
-1.046158
2.044927
0.89878
-0.92262
0.05102
0.64771
-0.73725
0.26199
-0.653847
1.513367
1.44655
0.75421
-1.17511
Venkat,
This is because ADIFF is an array function and so needs to be handled slightly differently from other functions. See the following webpage for an explanation:
Array Formulas and Functions
Charles
Thanks for your clear explanation. It is really helpful to me as a student who wants to fully understand how to implement the ARIMA model using toy data, without relying on other commercial solvers such as STATA, R and so on. Because I wasn’t sure whether I had properly formulated ARIMA models using the obtained coefficients, I was looking for detailed forecasting steps like those presented on your website. Thank you!
Hi Charles, thanks for your tools and explanations; I have been looking for something like this for a long time. I have a question about differencing:
Your example uses N-order differencing for data with a trend. If a data set has both trend and seasonality, I think K-step differencing may be needed as well, but how can I identify the differencing required using your ARIMA model?
Chenny,
The webpage describes n-step differencing for ARIMA, but I have not yet included seasonality. I have described seasonality for linear regression and for Holt-Winters.
Charles