We explore various methods for forecasting (i.e. predicting) the next value(s) in a time series. A time series is a sequence of observations y1, …, yn. We usually think of the subscripts as representing evenly spaced time intervals (seconds, minutes, months, seasons, years, etc.).
 Topics
- Forecasting Accuracy
- Basic Forecasting Methods
- Stochastic Process
- Autoregressive Processes
- Moving Average Processes
- Autoregressive Moving Average Processes (ARMA)
- Autoregressive Integrated Moving Average Processes (ARIMA)
- Seasonal ARIMA (SARIMA)
- Miscellaneous Topics
References
Greene, W. H. (2002) Econometric analysis. 5th Ed. Prentice-Hall
https://www.scirp.org/(S(351jmbntvnsjt1aadkposzje))/reference/referencespapers.aspx?referenceid=1243286
Gujarati, D. & Porter, D. (2009) Basic econometrics. 5th Ed. McGraw Hill
http://www.uop.edu.pk/ocontents/gujarati_book.pdf
Hamilton, J. D. (1994) Time series analysis. Princeton University Press
https://press.princeton.edu/books/hardcover/9780691042893/time-series-analysis
Wooldridge, J. M. (2009) Introductory econometrics, a modern approach. 5th Ed. South-Western, Cengage Learning
https://cbpbu.ac.in/userfiles/file/2020/STUDY_MAT/ECO/2.pdf
Hi Charles,
I am trying to perform regression among time-series.
If I am not mistaken, I *must* difference them or subtract the time trend.
Is this included in the section?
I only want to find the coefficients, but I hear the assumptions of regression
change a bit. Where (your site, a book or a link) can one find a working example that includes assumption tests?
Thanx in advance,
Savvas
Hello Savvas,
There are two places on the Real Statistics website that are relevant to these issues: (1) Regression and (2) Time Series Analysis
Which specific place to look on the website depends on the specific model you choose to use.
(1) If you use regression, see the following webpages:
https://real-statistics.com/multiple-regression/autocorrelation/
https://real-statistics.com/multiple-regression/multiple-regression-analysis/seasonal-regression-forecasts/
(2) If you use Time Series Analysis, then there are lots of choices depending on the specific technique you use (Holt’s Linear Trend, ARIMA, etc.). Differencing is employed to obtain a stationary time series (which most techniques require). See the following webpages:
https://real-statistics.com/time-series-analysis/stochastic-processes/stationary-process/
https://real-statistics.com/time-series-analysis/arima-processes/arima-differencing/
There are lots of techniques for time series analysis, and so I suggest that you choose from the menu on the following webpage:
https://real-statistics.com/time-series-analysis/
Charles
OK, these are a good start. I shall try them out. Before I start, I need to clarify one more thing. Some of the literature (I have only seen it in an econometrics book that points to Wooldridge’s work) states that two time series (such as GDP and UFO sightings) might show a high correlation with each other for the wrong reason, namely that each is correlated with time.
The book mentioned that you can difference or detrend them, but I do not have a complete example, especially one that includes the assumptions (i.e. regression must be BLUE), which change a bit to handle such problems. Remember, I seek to do a simple regression to calculate coefficients, but with the twist I mentioned.
Are such topics covered in the links you provided?
Thanx Charles. Your site rocks.
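A minimal sketch in Python of the issue raised in this thread: two series that share nothing but a time trend can show a high correlation, and first differencing removes most of it. The data are simulated (the "gdp" and "ufo" series are hypothetical stand-ins), and the helper names are illustrative and not taken from the Real Statistics website.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def difference(x):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]

random.seed(0)
n = 200
# Two unrelated simulated series: each is a linear trend plus independent noise.
gdp = [100 + 2.0 * t + random.gauss(0, 5) for t in range(n)]
ufo = [10 + 0.5 * t + random.gauss(0, 3) for t in range(n)]

r_levels = pearson(gdp, ufo)                          # dominated by the shared trend
r_diffs = pearson(difference(gdp), difference(ufo))   # trend removed
print(f"correlation of levels:      {r_levels:.3f}")
print(f"correlation of differences: {r_diffs:.3f}")
```

The same idea carries over to regression: the coefficients are estimated from the differenced (or detrended) series rather than from the levels. Whether differencing or detrending is the better choice depends on whether the trend is stochastic or deterministic.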
Hi Charles
At present, I am working on QSR restaurant forecasting. So basically I have to forecast the quantity of each product required in each store.
Can you please help me with this
Regards,
chetan
Hi Chetan,
I am happy to answer questions that you have to help you create such a forecast.
Charles
Hello Charles, I stumbled across your website while I was searching for methodology on time-series analysis. I have 3 years’ worth of electric power consumption data that I wish to forecast using statistical models. I have a basic statistics background and no “machine learning” background. Could you recommend how best to work through your material? I would like to apply the methods and models to the sample data. Thank you very much,
Anson
Hello Anson,
I suggest that you start by graphing the 3 years worth of data. Look for any patterns: seasonality, increasing/decreasing trends, randomness, etc. Based on what you observe, you then need to choose a model (ARIMA, Holt-Winters, etc.).
Charles
Hello Charles, Thank you for the reply. The time-series data I have in hand is a power consumption data taken from a commercial building consisting of 2 measurements: the accumulated Energy Consumption (Watt Hours), and the Power Consumption (Watts) at the timestamp of the reading
The sample data looks like below:
Timestamp | Total Accumulated Energy (Watt Hours)| Total Power Consumed (Watts) |
Jan 01, 2019 01:02:00 AM | 415,457,280 | 32,683 |
Jan 02, 2019 01:04:00 AM | 415,629,888 | 25,982 |
…
Jan 31, 2019 01:02:00 AM | 424,123,538 | 31,857 |
Jan 31, 2019 01:04:00 AM | 424,345,242 | 28,735 |
====================================================
When I plot the time-series of the total accumulated energy over the period of 3 years – it is showing a linear upward trend. When I plot the time-series of the energy consumed on fixed units (i.e. days) it shows a seasonality of high energy consumption Monday thru Friday then a lower energy consumption over Saturday and Sunday. Also the energy consumption is low on Public Holidays. The same pattern exhibits over the plotting of Total Power Consumed.
====================================================
My questions are:
1. Given that the readings are taken unevenly (sometimes every 2 minutes, sometimes every 5 minutes, and so on), do I normalize the dataset to a common unit (e.g. day) if the objective is to forecast the next day’s energy consumption?
2. Do I need to further roll up and normalize the data to months if I want to forecast the energy consumption in the next month?
3. What do I have to do to enable a forecast of more days / weeks / months ahead?
4. How do I take into account public holidays / weekdays / weekends (i.e. low energy consumption)? Would it imply different handling if I were to forecast the next “Day”, “Week” or “Month”?
Thank you very much,
Anson
Also Charles, I would like to ask if I wanted to forecast in the unit of days, but yet wanted to take into the account of seasonality over a year, what should I be following to deal with the leap year issue? Thanks, Anson
Hello Anson,
I would like to know if you have solved this problem, as in my case, the data is also distributed on a daily and annual basis. I’m not sure if SARIMA can predict data with such a large number of cycles.
Hello Anson,
I don’t know how to deal with time series with unequal time intervals. The following references might be helpful:
https://stats.stackexchange.com/questions/33796/is-there-any-gold-standard-for-modeling-irregularly-spaced-time-series
http://eckner.com/papers/unevenly_spaced_time_series_analysis.pdf
https://www.researchgate.net/publication/220907163_Statistical_Models_for_Unequally_Spaced_Time_Series
Charles
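One possible way to put unevenly spaced meter readings on an even grid is sketched below. It assumes the accumulated energy column is a running meter total, so that linearly interpolating the cumulative readings onto a regular grid and then differencing gives the consumption per interval. The readings are hypothetical and the function name is illustrative; this is not a method from the references above.

```python
def interpolate(t_obs, v_obs, t_new):
    """Linearly interpolate (t_obs, v_obs) at each time in t_new.

    t_obs must be strictly increasing, t_new must be non-decreasing,
    and every value in t_new must lie within the range of t_obs.
    """
    out = []
    j = 0
    for t in t_new:
        # Advance to the observation interval [t_obs[j], t_obs[j + 1]] containing t.
        while j < len(t_obs) - 2 and t_obs[j + 1] < t:
            j += 1
        t0, t1 = t_obs[j], t_obs[j + 1]
        v0, v1 = v_obs[j], v_obs[j + 1]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

# Hypothetical readings: time in hours since the start, cumulative energy in Wh.
t_obs = [0.0, 0.4, 1.1, 1.9, 3.0]
wh_obs = [1000.0, 1040.0, 1110.0, 1190.0, 1300.0]

grid = [0.0, 1.0, 2.0, 3.0]                  # evenly spaced hourly grid
wh_grid = interpolate(t_obs, wh_obs, grid)   # cumulative energy on the grid
hourly_use = [b - a for a, b in zip(wh_grid, wh_grid[1:])]
print(hourly_use)                            # energy used in each hour
```

The same grid could be daily instead of hourly. Linear interpolation assumes a roughly constant rate of use between readings, which is reasonable only when the gaps are short compared with the grid spacing.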
Thank you Charles, I am able to find a solution to construct my dataset (3 years) in the unit of Hours. May I know what I’d need to do to make forecast in the units of a larger unit such as Days, Weeks, or even month?
Thanks,
Anson
Hello Anson,
I would add 24 forecasted hours to make one day, and similarly for weeks and months. I don’t know whether this is the recommended approach, but it makes sense to me.
Charles
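The aggregation suggested above can be sketched in Python as follows. The hourly forecasts are hypothetical placeholder values, and the function name is illustrative.

```python
def aggregate(forecasts, block):
    """Sum consecutive groups of `block` forecasts.

    Any incomplete final group is dropped.
    """
    n_full = len(forecasts) // block
    return [sum(forecasts[i * block:(i + 1) * block]) for i in range(n_full)]

# Hypothetical hourly forecasts (kWh) for two days: a flat 2.0, then a flat 3.0.
hourly = [2.0] * 24 + [3.0] * 24
daily = aggregate(hourly, 24)
print(daily)  # [48.0, 72.0]
```

Point forecasts can be added in this way, but the prediction intervals of the individual hours cannot simply be added, since the hourly forecast errors are generally correlated.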
Hi Charles,
I have followed your website for a long time; thanks for your work on it. I have found it very useful. I have a question, though I don’t know if it can be considered time series analysis, because most of the discussion is about finance and forecasting.
I have random data, see below (wave data sampled every 0.1 s). If I want to calculate the height of the wave on the rise and on the fall, and also the duration between the peaks, what techniques should I use?
Thanks
Gunawan
See the data below
10.0 -0.031
10.1 -0.151
10.2 -0.266
10.3 -0.371
10.4 -0.464
10.5 -0.546
10.6 -0.620
10.7 -0.689
10.8 -0.758
10.9 -0.832
11.0 -0.912
11.1 -0.999
11.2 -1.091
11.3 -1.182
11.4 -1.267
11.5 -1.338
11.6 -1.391
11.7 -1.421
11.8 -1.425
11.9 -1.404
12.0 -1.359
12.1 -1.296
12.2 -1.219
12.3 -1.133
12.4 -1.042
12.5 -0.949
12.6 -0.855
12.7 -0.761
12.8 -0.665
12.9 -0.566
13.0 -0.464
13.1 -0.358
13.2 -0.250
13.3 -0.140
13.4 -0.034
13.5 0.068
13.6 0.162
13.7 0.246
13.8 0.322
13.9 0.388
14.0 0.449
14.1 0.506
14.2 0.562
14.3 0.618
14.4 0.675
14.5 0.732
14.6 0.785
14.7 0.832
14.8 0.869
14.9 0.893
15.0 0.902
15.1 0.896
15.2 0.878
15.3 0.852
15.4 0.821
15.5 0.792
15.6 0.767
15.7 0.748
15.8 0.737
15.9 0.730
16.0 0.722
16.1 0.708
16.2 0.682
16.3 0.638
16.4 0.575
16.5 0.491
16.6 0.391
16.7 0.279
16.8 0.164
16.9 0.054
17.0 -0.042
17.1 -0.117
17.2 -0.169
17.3 -0.195
17.4 -0.198
17.5 -0.183
17.6 -0.154
17.7 -0.120
17.8 -0.085
17.9 -0.055
18.0 -0.031
18.1 -0.015
18.2 -0.004
18.3 0.003
18.4 0.010
18.5 0.020
18.6 0.034
18.7 0.054
18.8 0.077
18.9 0.104
19.0 0.131
19.1 0.156
19.2 0.178
19.3 0.195
19.4 0.207
19.5 0.213
19.6 0.213
19.7 0.208
19.8 0.196
19.9 0.175
20.0 0.142
20.1 0.094
20.2 0.029
20.3 -0.055
20.4 -0.158
20.5 -0.278
20.6 -0.411
20.7 -0.550
20.8 -0.687
20.9 -0.816
21.0 -0.929
21.1 -1.020
21.2 -1.088
21.3 -1.132
21.4 -1.155
21.5 -1.164
21.6 -1.163
21.7 -1.159
21.8 -1.158
21.9 -1.160
22.0 -1.166
22.1 -1.172
22.2 -1.173
22.3 -1.161
22.4 -1.129
22.5 -1.072
22.6 -0.986
22.7 -0.870
22.8 -0.726
22.9 -0.561
23.0 -0.383
23.1 -0.198
23.2 -0.017
23.3 0.155
23.4 0.312
23.5 0.452
23.6 0.575
23.7 0.683
23.8 0.778
23.9 0.864
24.0 0.944
24.1 1.020
24.2 1.092
24.3 1.160
24.4 1.222
24.5 1.276
24.6 1.320
24.7 1.354
24.8 1.376
24.9 1.386
25.0 1.385
25.1 1.372
25.2 1.347
25.3 1.309
25.4 1.256
25.5 1.187
25.6 1.100
25.7 0.994
25.8 0.870
25.9 0.730
26.0 0.578
26.1 0.422
26.2 0.270
26.3 0.130
26.4 0.011
26.5 -0.083
26.6 -0.146
26.7 -0.181
26.8 -0.190
26.9 -0.179
27.0 -0.157
Hello Gunawan,
I am not sure that this is a statistics problem, but the first thing that I would recommend is that you graph your data. This can be done in Excel by using a Scatter Plot. See https://www.real-statistics.com/excel-environment/excel-charts/.
When I have done this, I see at least 3 upward peaks and 3 downward peaks. They seem to have different heights and different separation distances.
Charles
Thanks Charles. Yes, it is random wave elevation, and the data is quite massive; I need to get the upward and downward heights. In the data there could be thousands of them, so I thought I could do it with time series analysis. Thanks for your time.
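For a large data set, the peaks and troughs can be located programmatically rather than read off a chart. The sketch below is one simple approach, not a Real Statistics function: it marks a point as a peak (trough) when it is strictly higher (lower) than both neighbours. To keep the listing short it uses every fifth reading (0.5 s spacing) from the data posted above.

```python
def turning_points(t, y):
    """Return (peaks, troughs) as lists of (time, value) for strict local extrema."""
    peaks, troughs = [], []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            peaks.append((t[i], y[i]))
        elif y[i - 1] > y[i] < y[i + 1]:
            troughs.append((t[i], y[i]))
    return peaks, troughs

# Every fifth reading from the posted data: t = 10.0, 10.5, ..., 27.0
t = [10.0 + 0.5 * k for k in range(35)]
y = [-0.031, -0.546, -0.912, -1.338, -1.359, -0.949, -0.464,
     0.068, 0.449, 0.732, 0.902, 0.792, 0.722, 0.491,
     -0.042, -0.183, -0.031, 0.020, 0.131, 0.213, 0.142,
     -0.278, -0.929, -1.164, -1.166, -1.072, -0.383, 0.452,
     0.944, 1.276, 1.385, 1.187, 0.578, -0.083, -0.157]

peaks, troughs = turning_points(t, y)

# Change in elevation between neighbouring extrema, labelled by the later time:
# a positive value is a rise (trough to peak), a negative value is a fall.
extrema = sorted(peaks + troughs)
swings = [(t1, round(v1 - v0, 3)) for (t0, v0), (t1, v1) in zip(extrema, extrema[1:])]

# Time between successive peaks.
peak_gaps = [round(b[0] - a[0], 1) for a, b in zip(peaks, peaks[1:])]

print("peaks:    ", peaks)
print("troughs:  ", troughs)
print("swings:   ", swings)
print("peak gaps:", peak_gaps)
```

On the full 0.1 s data, flat spots (such as the two equal readings at 19.5 and 19.6) and small wiggles would need extra handling, for example light smoothing or a minimum height threshold before a swing is counted.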
Hi Charles
I am very new to statistics. I wanted to do a multiple regression analysis to predict what drives crop expansion but I only have data for 12 years which is not a sufficient number of observations. Is there an alternative approach I can take to test the drivers?
TIA
Michelle
Michelle,
Perhaps 12 years of observations is not the best, but if it is all that you have then I would go with that. You can also look at the prediction interval that will give you some idea of the accuracy of the forecasts obtained.
Charles
Hi Charles,
All your work has been so helpful.
I am trying to make a Markov Regime Switching model of the stock market in Excel.
Currently, I am using the Korean stock market index and trying to apply EM to estimate the parameters.
But I am already stuck in there.
Is there any advice for me?
I am a beginner and got no clue how to start it.
I know this is very excuse. But, I got no one to ask.
Thank you. Have a great day.
Sorry Taewoo, but I am not familiar with Markov Regime Switching.
Charles
It is alright!
every work of yours is so helpful.
Thank you.
Sorry that I am not able to help more, but I am swamped with work right now and am not familiar with the topic that you are looking for help for.
Charles
Can we do time series regression analysis for multiple parameters? E.g. forecast multiple y’s based on time/dates/months etc.?
Yes. See https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/arimax-model-and-forecast/
Charles
Hi, I have historical data which I believe don’t have a pattern. Can you suggest what method should I use to forecast? Thank you.
Hard to say without more information.
Charles
Hi Charles,
I want to work on time series dataset and as I am beginner, want to follow the step by step strategy to start this. I have started the work on simple monthly mean of Sunspot dataset (from the year 1749 to 2022) having only the attributes (Date and monthly mean) :
Date Monthly Mean Total Sunspot Number
1749-01-31 96.7
1749-02-28 104.3
1749-03-31 116.7
1749-04-30 92.8
1749-05-31 141.7
1749-06-30 139.2
1749-07-31 158
1749-08-31 110.5
1749-09-30 126.5
1749-10-31 125.8
1749-11-30 264.3
1749-12-31 142
1750-01-31 122.2
1750-02-28 126.5
1750-03-31 148.7
1750-04-30 147.2
1750-05-31 150
:
:
:
Can you please tell me that what methods are required in terms of both statistics and forecasting methods for this dataset?
Anjali,
There isn’t a simple answer to your question. There are many techniques for creating a forecast.
I suggest that you start by plotting the data to see whether there is a pattern. The pattern (or lack of a pattern) will suggest the approaches to use (or try).
Charles
Hi Charles,
Most forecast models consider trend, seasonality and level. Are there any other parameters or factors that I should consider if I am building a custom forecasting model?
There are many options for creating forecasts. For the models that you are alluding to there is also “damping”.
There are other models, including ARIMA, SARIMA, etc.
Charles
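To illustrate what “damping” adds, here is a minimal Python sketch of Holt’s linear trend method with a damping factor phi. The initial level and trend (first observation and first difference) and the parameter values are assumptions made for the example; in practice the parameters are usually estimated from the data.

```python
def damped_holt_forecast(y, alpha, beta, phi, horizon):
    """Holt's linear trend method with damping factor phi (0 < phi <= 1).

    Returns point forecasts for 1..horizon steps after the last observation.
    """
    # Simple initialisation (an assumption): first value and first difference.
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts, damp_sum = [], 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h          # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts

y = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical data with a steady trend
undamped = damped_holt_forecast(y, alpha=0.5, beta=0.3, phi=1.0, horizon=3)
damped = damped_holt_forecast(y, alpha=0.5, beta=0.3, phi=0.9, horizon=3)
print(undamped)
print(damped)
```

With phi = 1 the method reduces to the ordinary Holt linear trend, and the forecasts keep growing by the same amount each step. With phi < 1 each further step adds a smaller increment, so long-range forecasts flatten out instead of extrapolating the trend indefinitely.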
Hi Charles,
Firstly, thanks so much for this resource it is greatly appreciated. I love Excel and it’s fantastic to see what you have been able to make it do.
I was wondering if you could advise a suitable method for my problem. I have quarterly fuel consumption data for a company, spanning the last 10 years, and want to predict the expected consumption by 2050. For example, the data might be:
Date, Consumption (kWh)
2010 Q1, 600000
2010 Q2, 550000
…, …
2020 Q1, 400000
There will of course be a lot of uncertainty in any prediction, but is there any regression method you recommend I use? I’m familiar with Simple/Multiple but understand in this context the assumptions one must make are not quite correct. I have been reading thoroughly into your Time-Series Analysis articles, and there seems to be a lot of methods but I’m struggling to pin-point the one I might need.
I really appreciate it.
Robert,
You should create a chart of your data to see whether there is some pattern (trend, seasonality, etc.). What approach to use depends on what you see. You can try Holt-Trend (or Holt-Winters) or ARIMA. All these approaches are described on the Real Statistics website.
Charles
Mr. Charles,
I’m fresh out of college and now working for a small business where I am the only data analyst here. As a result, I’m stuck with the data I’m working on right now that I believe time series can solve the problem. However, the prediction is not accurate. I’m not sure if you can take a look at what I did and guide me. I much appreciate your help.
I hope to hear from you soon,
Julia
Hello Julia,
If you email me an Excel file with your data and the analysis that you did, I will try to help you.
Charles
Hello Charles, I would like to know which model would be suitable for forecasting air passenger traffic post pandemic.
Hello Nandini,
I am not able to give a simple answer to this question. The model to use depends on a number of factors and a suitable response would require a lot more information.
Charles
Hi Charles, I am trying to create a model that forecasts the demand for a commodity. In this case Manganese. What model would you suggest to do this forecast?
I am not able to answer your question without additional information.
Charles
Hello Charles,
Can interrupted time series analysis be done in Excel?
I am sure that it can, but I don’t know with how much work.
In any case, currently, Real Statistics doesn’t support interrupted time series.
Charles
Hello,
I am analyzing animal data and I’ve never done time series analyses before. I have 12 animals, six in one group and six in another. I measured the time it takes them to get food over the course of six hours. I’m trying to see if Group 1 animals get to the food faster than Group 2 animals. What type of analysis would you recommend?
-May
Hello May,
If the data in each group are normally distributed, then you should be able to use a two independent sample t-test. No time series analysis is needed.
Charles
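A minimal Python sketch of the two independent sample t-test mentioned above, in its equal-variance (pooled) form. It assumes one summary time per animal; the numbers are hypothetical, not May’s data.

```python
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Two independent sample t statistic assuming equal variances.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(sample1) - mean(sample2)) / se, df

# Hypothetical times (minutes) for each animal to reach the food, six per group.
group1 = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
group2 = [18.0, 17.0, 20.0, 16.0, 19.0, 21.0]

t_stat, df = pooled_t(group1, group2)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic is compared with the critical value of the t distribution with df degrees of freedom (about 2.228 for a two-tailed test at the 5% level with df = 10). If each animal was timed repeatedly over the six hours, the repeated readings would first need to be summarized per animal or analyzed with a repeated measures design.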
Hi Charles,
I will start work on estimating wait times in health care services using time series models. With which forecasting model you advise me to work (how to choose it), and do you have an example please.
Thank you in advance
There are many forecasting approaches. Many are described on this website along with examples. Which to choose depends on the details.
Charles
Hello Charles:
I have some raw pressure data which is very choppy. I’m trying to do an interference analysis to look for any offset activity disturbing the current system (which would show as a deflection in the “derivatives? (maybe)” of the original pressure curve being recorded.
However, with a choppy raw pressure curve, plotting derivatives of that raw pressure curve is out of the question. I found some methods called LOESS etc. (which is also listed in Wikipedia under their smoothing functions page) to smooth out the curve but still “maintain” integrity of the data. I did not find any of those smoothing functions in your page for excel. I went through every method on your page, and most methods predict a curve that lies under the original curve (in magnitude) or if they are in the range of the original curve, it is still choppy. Any suggestions? Thank you, Charles. Enjoying your website!
Hello,
There is the SPLINE function which uses a spline curve to connect the points. There is also the Kernel Density Estimation data analysis tool.
Charles
Hello Charles,
I am using your tool to validate an instrument. I am using the Forecast_Error function. Where can I find the meaning of each of the variables that this function computes? Most of them are well known, but I can’t find on your website the meaning of u1 and u2 or the formulas used to compute them.
Thanks very much,
Gabriel Delgado
Hello Gabriel,
You will now find the definitions of these two statistics at:
https://real-statistics.com/time-series-analysis/forecasting-accuracy/time-series-forecast-error/
Thanks for your comment, especially since it identified that I had forgotten to include these definitions previously.
Charles
Thanks very much Charles,
I want to compare two devices that measure angular velocity (one is the device I want to validate and the other is the gold standard). They collect 128 samples per second, so I have two signals that are almost perfectly synchronized. Which measure of validity do you recommend? I am using RMSE…
Best regards,
Gabriel
Hello Gabriel,
I don’t know which is the best statistic, but RMSE seems reasonable.
There are also approaches such as Lin’s CCC and Bland-Altman.
Charles
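The three statistics mentioned in this exchange can be sketched in Python as follows. The two signals are short hypothetical samples, and the function names are illustrative rather than Real Statistics functions. Lin’s CCC is computed with 1/n moments, and the Bland-Altman limits of agreement are taken as the bias plus or minus 1.96 standard deviations of the differences.

```python
from statistics import mean, stdev

def rmse(x, y):
    """Root mean squared error between two equal-length signals."""
    return (sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)) ** 0.5

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    bias, sd = mean(d), stdev(d)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical angular velocity samples (deg/s).
gold = [10.0, 20.0, 30.0, 40.0, 50.0]      # gold standard
device = [11.0, 19.0, 31.0, 41.0, 52.0]    # device being validated

print("RMSE:", rmse(device, gold))
print("Lin's CCC:", lins_ccc(device, gold))
print("Bland-Altman (bias, lower, upper):", bland_altman(device, gold))
```

RMSE summarizes the overall size of the disagreement, the CCC penalizes both scatter and systematic offset, and the Bland-Altman bias shows whether one device reads consistently higher than the other.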
Hello again Charles,
Thanks very much. I have also computed Lin’s CCC and Bland-Altman. One question: in my case the residuals do not fit the normal distribution, so I think Bland-Altman is not adequate. Can I simply plot the residuals to analyze the bias?
Thanks very much,
Gabriel
Bland-Altman does require that the residuals be normally distributed, but if the residuals are not very skewed the results should generally be pretty good.
Charles
Hi Charles,
Thank you so much for your website! It is fantastic!
I’m curious as to why you seem to have skipped over Mincer-Zarnowitz in forecast evaluation? Any particular reason?
Cheers,
Hello Dario,
Glad you like the website
I have not included Mincer-Zarnowitz yet since it is not in the textbooks that I have consulted and no one has requested it.
I am adding capabilities all the time and will include this in the list of potential future enhancements.
Charles
Hi Charles
I’m missing something, I’m trying to use =MK_TEST(J26:J90, TRUE, 2, 0.05) and getting MK-stat as output but nothing else. Shouldn’t I get results like in Figure 2 at https://real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/
No worries, found the problem
What was the problem in this case? I’m also encountering the same problem.
Hi Simon,
MK_TEST is an array function and so it returns a range of cells, but you must use it in a slightly different way. This is a standard feature of Excel. See the following webpage for more details:
Array Formulas and Functions
Charles
Thank you for the fantastic work you have done with Time Series. I would really like to benefit from anything you can publish to help me understand the following:
1. Generalised Methods of Moments (GMM), when to use it etc
2. Tips about various methodologies concerning robustness checks in econometrics
3. Data handling in econometrics
4. Model transformation in the case of heteroscedasticity. The concern here is I understand it is the data that is transformed. So for instance if you have a cross section with two variables X and Y and Y is regressed on X. Assume there is heteroscedasticity. If the values of X are (3,5,6,8,9) and Y are (5,9,6,8,4). Please explain, using the data how the model Y=a +bX +e is transformed in this case.
Again thank you for what you do.
Regards
Abel,
I expect to expand the Time Series part of the website in the future and will take your comment into account.
In addition to this webpage, see the following webpage: https://real-statistics.com/multiple-regression/autocorrelation/
Charles
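As an illustration of the transformation asked about in item 4 above, here is a minimal Python sketch using the five data points from the question. It assumes (this is an assumption for the example, not something stated in the thread) that the error variance is proportional to X². Dividing the model Y = a + bX + e through by X gives Y/X = a(1/X) + b + e/X, whose error term has constant variance, so ordinary least squares can be applied to the transformed data.

```python
def ols(x, y):
    """Simple least squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sxy / sxx
    return my - slope * mx, slope

X = [3, 5, 6, 8, 9]
Y = [5, 9, 6, 8, 4]

# Assumed variance structure: Var(e_i) proportional to X_i**2, so divide through by X_i.
y_star = [y / x for x, y in zip(X, Y)]   # transformed response Y/X
x_star = [1 / x for x in X]              # transformed regressor 1/X

# In the transformed model Y/X = a*(1/X) + b + e/X,
# the slope estimates a and the intercept estimates b.
b_hat, a_hat = ols(x_star, y_star)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

This is equivalent to weighted least squares on the original data with weights 1/X². A different assumed variance structure (for example, variance proportional to X) would call for a different divisor (the square root of X in that case).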
Hi Charles!
Have a quick question, I have three different matrices that have different time series (1938-1944, 1944-1953 and 1953-1965) and I am trying to do a log-linear analysis on it to make sure the results are comparable. Any advice on how to approach this?
Cheers,
Clinton
Clinton,
I am not sure I understand what you have in mind, but perhaps the following webpage can be helpful:
Log-linear models
Charles
Hi Charles,
I am a little confused about plotting rolling statistics. Can you please point me to where this topic is covered in this time series analysis section? As you know, it is a way of checking for stationarity.
Ashish,
Sorry, but this topic has not yet been covered on the Real Statistics website.
Charles
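For readers looking for the idea behind rolling statistics, here is a minimal Python sketch (not a Real Statistics feature): compute the mean and standard deviation over a sliding window and see whether they stay roughly constant over time. The series and window length are hypothetical.

```python
from statistics import mean, stdev

def rolling_stats(y, window):
    """Return a list of (mean, standard deviation) for each full window."""
    return [
        (mean(y[i:i + window]), stdev(y[i:i + window]))
        for i in range(len(y) - window + 1)
    ]

# Hypothetical series with an upward trend plus an alternating +1/-1 component.
series = [float(t) + (1 if t % 2 else -1) for t in range(12)]

stats = rolling_stats(series, 4)
for i, (m, s) in enumerate(stats):
    print(f"window starting at {i}: mean = {m:.2f}, sd = {s:.2f}")
```

In this example the rolling mean drifts steadily upward, which indicates that the series is not stationary. A rolling mean and standard deviation that stay roughly level are consistent with stationarity, but this is only an informal check and is usually complemented by a formal test such as the augmented Dickey-Fuller test.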
Dear Mr Charles,
Is there any way to forecast cash outflow based on data time series. For example I’d like to make a projection of cash outflow in 2018 based on the time series data of cash disbursement from 2014-2017?
Thanks in advance.
Anang,
Yes, you can create such forecast using one of the techniques described on the webpage.
Charles
Hi Charles,
I use your RealStats Add-in for Excel. For school we use a time series analysis book by Rob J Hyndman.
I was comparing the coefficients of RealStats with the coefficients of ARIMA in RStudio. For RStudio, I use the ‘fpp2’ package by Rob J Hyndman.
With the exact same dataset the coefficients are different.
I was wondering why they are different. Is this because RealStats is using Solver in the background and is estimating the coefficients? Or is it because R uses different algorithms?
Also with models like ARIMA(1,1,1) the coefficients are almost the same as the coefficients in R. But with a model like ARIMA(3,1,3) the coefficients are very different.
Greetings,
John
John,
Thanks for identifying this. The Real Statistics add-in uses two approaches for estimating the ARIMA coefficients, one via Solver and another iterative approach. In test examples, the estimates agreed with R.
Can you send me an Excel file with your data and the results you got from R? I will then try to figure out what is going on.
Charles
Hi Charles,
Very nice blog.
I was wondering whether you could help me understand lag removal in time series analysis. I am dealing with a time series data that has multiple parameters. I understand that we need to remove lag before any modeling is performed.
Thanks
Adi
Adi,
You can find information about this topic in the various webpages listed on the reference webpage. In particular, you can start by looking at
https://real-statistics.com/time-series-analysis/stochastic-processes/stationary-process/
Charles
Hi Charles,
Have you published the time series analysis in a book?
Mohammed,
No I haven’t. I expect to publish the first of a series of books shortly. I plan to publish a book on time series analysis as well, but that won’t happen this year.
Charles