Basic Concepts
In order to create a stationary process, differencing may be necessary. For example, the graph of closing Dow Jones indices for October 2015 (see Example 1 of Stationary Process), as shown in Figure 1, clearly shows an increasing trend. Differencing is a way to eliminate such trends.
Figure 1 – Dow Jones Indices for October 2015
First-order differencing addresses linear trends and employs the transformation $z_i = y_i - y_{i-1}$. Second-order differencing addresses quadratic trends and employs a first-order difference of a first-order difference, namely $z_i = (y_i - y_{i-1}) - (y_{i-1} - y_{i-2})$, which is equivalent to $z_i = y_i - 2y_{i-1} + y_{i-2}$.
Taking first-order differences for the data in Figure 1 results in the chart on the right. The trend seems to have been eliminated.
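The effect of the two transformations can be checked on toy data. The following is a minimal Python sketch, not part of the Real Statistics software; the helper name `difference` and the sample series are illustrative assumptions. It shows that first-order differencing turns a linear trend into a constant, and second-order differencing does the same for a quadratic trend.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times (hypothetical helper)."""
    result = list(series)
    for _ in range(order):
        # z_i = y_i - y_{i-1}; each pass shortens the series by one element
        result = [b - a for a, b in zip(result, result[1:])]
    return result


linear = [3 + 2 * i for i in range(6)]   # linear trend: 3, 5, 7, 9, 11, 13
quadratic = [i * i for i in range(6)]    # quadratic trend: 0, 1, 4, 9, 16, 25

first_of_linear = difference(linear, 1)
second_of_quadratic = difference(quadratic, 2)

print(first_of_linear)       # [2, 2, 2, 2, 2]
print(second_of_quadratic)   # [2, 2, 2, 2]
```

Each pass of differencing removes one observation, so a series of size n differenced d times has size n-d.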
ARIMA
An autoregressive integrated moving average (ARIMA) process (aka a Box-Jenkins process) adds differencing to an ARMA process. An ARMA(p,q) process with d-order differencing is called an ARIMA(p,d,q) process. Thus, for example, an ARIMA(2,1,0) process is an AR(2) process with first-order differencing.
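To make the ARIMA(2,1,0) example concrete, here is a small Python sketch under stated assumptions: the function name, the coefficients 0.5 and 0.2, and the standard normal noise are all illustrative choices and are not taken from this webpage. It builds a series whose first differences follow an AR(2) process and then confirms that first-order differencing recovers that AR(2) series.

```python
import random


def simulate_arima_2_1_0(n, phi1, phi2, seed=0):
    """Simulate n values of an ARIMA(2,1,0) series (illustrative sketch).

    z follows the AR(2) recursion z_t = phi1*z_{t-1} + phi2*z_{t-2} + e_t,
    and y is the running sum of z, so differencing y once gives back z.
    """
    rng = random.Random(seed)
    z = [0.0, 0.0]                       # start-up values, dropped below
    for _ in range(n):
        z.append(phi1 * z[-1] + phi2 * z[-2] + rng.gauss(0, 1))
    z = z[2:]

    y, level = [], 0.0
    for step in z:
        level += step                    # "integration" = cumulative sum
        y.append(level)
    return y, z


y, z = simulate_arima_2_1_0(50, 0.5, 0.2)
recovered = [b - a for a, b in zip(y, y[1:])]   # first-order differences of y

# The differences match the AR(2) series (from its second value onward)
print(all(abs(r - s) < 1e-9 for r, s in zip(recovered, z[1:])))
```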
It is important not to over-difference since this can cause you to use an incorrect model. Some rules-of-thumb indicating that you may have differenced too many times are:
- The lag-1 autocorrelation of the differenced series is less than -0.5
- Differencing increases the variance
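The two checks above can be sketched in Python as follows. This is only a rough illustration: the function names are hypothetical, the autocorrelation is the usual sample estimate at lag 1, and the population variance is used for the comparison (a choice made here, not stated on this webpage).

```python
from statistics import mean, pvariance


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1 (returns 0.0 for a constant series)."""
    m = mean(series)
    den = sum((x - m) ** 2 for x in series)
    if den == 0:
        return 0.0
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / den


def overdifferencing_flags(series):
    """Difference `series` once more and report both rule-of-thumb flags."""
    diffed = [b - a for a, b in zip(series, series[1:])]
    return {
        "lag1_autocorr_below_minus_half": lag1_autocorr(diffed) < -0.5,
        "variance_increased": pvariance(diffed) > pvariance(series),
    }


# An alternating series has no trend, so differencing it is not needed
no_trend = [1, -1] * 5
# A series with a clear upward trend, where differencing is appropriate
trending = [0, 1, 3, 6, 8, 9, 11, 14, 16, 17]

print(overdifferencing_flags(no_trend))
print(overdifferencing_flags(trending))
```

For the alternating series both flags are raised, suggesting that the extra difference was one too many; for the trending series neither flag is raised.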
Rules-of-thumb
An AR(p) or MA(q) process has a unit root if the sum of the non-constant coefficients is 1.
Additional rules-of-thumb:
- If an AR(p) process has a unit root then the level of differencing should be increased
- If an MA(q) process has a unit root then the level of differencing should be decreased
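These rules can be expressed as a short Python sketch. The function names and the tolerance of 0.05 are assumptions made for illustration. Estimated coefficients rarely sum to exactly 1, so some tolerance is needed, but the webpage does not specify one. The sketch also assumes the coefficient sign convention used on this webpage; other software may write MA coefficients with the opposite sign.

```python
def has_unit_root(coefficients, tol=0.05):
    """Rule of thumb: the non-constant coefficients sum to (about) 1.

    `tol` is an illustrative assumption, not a value from the source.
    """
    return abs(sum(coefficients) - 1.0) <= tol


def differencing_hint(ar_coeffs=(), ma_coeffs=()):
    """Return 'increase', 'decrease' or 'keep' for the order of differencing."""
    if ar_coeffs and has_unit_root(ar_coeffs):
        return "increase"   # AR unit root: difference once more
    if ma_coeffs and has_unit_root(ma_coeffs):
        return "decrease"   # MA unit root: difference once less
    return "keep"


print(differencing_hint(ar_coeffs=[0.6, 0.4]))   # increase
print(differencing_hint(ma_coeffs=[0.98]))       # decrease
print(differencing_hint(ar_coeffs=[0.5, 0.2]))   # keep
```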
Assume that column A contains a time series of size n starting in cell A1. Suppose we want to place the 1st order differences (of size n-1) in column B starting in cell B2, the 2nd order differences (of size n-2) in column C starting in cell C3, and so on up to the 7th order differences (of size n-7) in column H starting in cell H8. The formulas to use in these starting cells are as follows:
- B2: A2-A1
- C3: A3-2*A2+A1
- D4: A4-3*A3+3*A2-A1
- E5: A5-4*A4+6*A3-4*A2+A1
- F6: A6-5*A5+10*A4-10*A3+5*A2-A1
- G7: A7-6*A6+15*A5-20*A4+15*A3-6*A2+A1
- H8: A8-7*A7+21*A6-35*A5+35*A4-21*A3+7*A2-A1
For each column, highlight the range from the starting cell down to the nth cell in that column and press Ctrl-D to fill in the other values. Note that for the 7th order difference, the coefficients are the binomial coefficients C(7,0) = 1, C(7,1) = 7, C(7,2) = 21, etc., with alternating signs.
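The pattern in the spreadsheet formulas above can be reproduced outside Excel. The following Python sketch (the function names are hypothetical, and `math.comb` requires Python 3.8 or later) generates the signed binomial coefficients for any order d and applies them directly to the series.

```python
from math import comb


def difference_coefficients(d):
    """Coefficients applied to y[i], y[i-1], ..., y[i-d] for order d."""
    return [(-1) ** k * comb(d, k) for k in range(d + 1)]


def difference_direct(series, d):
    """d-th order differences computed in one step from the coefficients."""
    coeffs = difference_coefficients(d)
    return [
        sum(c * series[i - k] for k, c in enumerate(coeffs))
        for i in range(d, len(series))
    ]


print(difference_coefficients(3))   # [1, -3, 3, -1], as in cell D4
print(difference_coefficients(7))   # [1, -7, 21, -35, 35, -21, 7, -1], as in cell H8

cubes = [i ** 3 for i in range(6)]  # 0, 1, 8, 27, 64, 125
print(difference_direct(cubes, 3))  # [6, 6, 6]
```

The result agrees with applying first-order differencing d times in succession, and the output has size n-d.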
Worksheet Function
Real Statistics Function: The Real Statistics Resource Pack provides the following array function.
ADIFF(R1, d) – takes the time series in the n × 1 range R1 and outputs an (n–d) × 1 range containing the data in R1 differenced d times
Example 1: Find the 1st, 2nd, 3rd, and 4th differences for the data in column A of Figure 1.
Figure 1 – Differencing
Here cell B4 contains the formula =A5-A4, cell C4 contains the formula =B5-B4 (or A6-2*A5+A4), cell D4 contains the formula =C5-C4 (or A7-3*A6+3*A5-A4) and cell E4 contains the formula =D5-D4 (or A8-4*A7+6*A6-4*A5+A4).
Range G4:G22 contains the array formula =ADIFF($A$4:$A$23,G3). If we highlight the range G4:J22 and press Ctrl-R, we get the result shown on the right side of Figure 1.
References
Nau, R. (2020) Identifying the order of differencing in an ARIMA model
https://people.duke.edu/~rnau/411arim2.htm
Hello.
Thank you for this amazing explanation; it has really helped me. I wanted to find out how one can test for stationarity given an equation such as this one: Yt = -0.48Yt-1 + Ut + 0.72Ut-1.
Thank you.
Hello Rebecca,
What is the relationship between Yt and Ut in this equation?
Charles
Or is Ut the error term?
Charles
“Thus, for example, an ARIMA(2,0,1) process is an AR(2) process with first-order differencing.”
I thought an ARIMA(2,0,1) process was an AR(2) and MA(1) process, and 0 order/degree of differencing is needed for the series to be stationary.
If you wrote “an AR(2) process with first-order differencing.” , then isn’t it just ARIMA(2,1,0) ???
Sorry I am a bit confused, please help. Thank you
Hello Eason,
Yes, you are correct. I used ARIMA(p,q,d) instead of ARIMA(p,d,q). I have now changed this on this webpage and a few other webpages. Thanks for catching this mistake.
Charles
Hi,
In your statement above, you mentioned:
“zi = (yi – yi-1) – (yi-1 – yi-2), which is equivalent to zi = yi – yi-2”
Shouldn’t it be:
zi = yi – 2yi-1 + yi-2?
Kindly correct me if I’m wrong.
Thanks,
Jiawei
Jiawei,
You are correct. I have now corrected the webpage.
Thank you very much for identifying this error.
Charles
When I checked in Excel, range G4:G22 contains the array formula =ADIFF($A$4:$A$23,G3), but it returns 2.690575 for every value.
The values do not come out as shown below:
2.690575
-0.901482
1.212705
2.015852
1.209541
-0.197216
-1.046158
2.044927
0.89878
-0.92262
0.05102
0.64771
-0.73725
0.26199
-0.653847
1.513367
1.44655
0.75421
-1.17511
Venkat,
This is because ADIFF is an array function and so needs to be handled slightly differently from other functions. See the following webpage for an explanation:
Array Formulas and Functions
Charles
Thanks for your clear explanation. It is really helpful to me as a student who wants to fully understand how to implement the ARIMA model using toy data, without relying on other commercial solvers such as STATA, R and so on. Because I wasn’t sure whether I had properly formulated ARIMA models using the obtained coefficients, I was looking for detailed forecasting steps like those presented on your website. Thank you!
Hi Charles, thanks for your tools and explanations; I have been looking for something like this for a long time. I have a question about differencing:
Your example uses n-th order differencing for data with a trend. If a data set has both trend and seasonality, I think K-step differencing may be needed as well, but how can I identify that differencing using your ARIMA model?
Chenny,
The webpage describes n-step differencing for ARIMA, but I have not yet included seasonality. I have described seasonality for linear regression and for Holt-Winters.
Charles