Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions, and feedback. We would especially like to hear about any errors in the website, examples, or software. You can contact me at the email address below or by leaving a comment on this page or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
Hi Charles,
Thanks again for your great site.
I have a quick question: I have a data set of 8 data points and I want to test the probability of the 8th data point occurring given the previous 7. I fit an OLS linear model to the 7 points, calculate the residuals, and calculate their SE = 45.26. I then project that linear model to point 8 and calculate its residual based on that linear fit: resid = 888. I then calculate the t-stat as 19.6. I want the magnitude of the difference, so I use a 2-tailed test and get p = 1.13e-06 for df = 6 (as I want to test the probability given the distribution of the 7).
Is this how you would do it, and do you have any comments on how I did the calculation?
Thanks,
Joe
I should clarify: I’m just not certain whether df = 6 or 7 in this case, as I haven’t included the 8th data point in the linear model estimation, which only had 7 points. The residuals on the 7 points are all less than 178 (4 are less than 100), but the residual on the 8th point is 888, which is much greater. I simply want to calculate its probability given the previous 7. I used your t-test for this but cannot enter one data point for the ‘second’ dataset and have to enter 2 (repeating the 8th point). Is this the right way to do it?
Also, do you have any other suggestions for how to estimate the probability of the appearance of that 8th data point given the other 7?
Thanks,
Joe
Joe,
If I understand correctly, you only have one point, namely, point 8. You want to compare the observed value with the predicted value. Since you have only one point, it is difficult to test based on this one point. In fact, it is not clear to me what hypothesis you are trying to test. Perhaps the following webpage is relevant.
https://www.real-statistics.com/multiple-regression/confidence-and-prediction-intervals/
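In outline (a minimal sketch for a simple linear regression, with illustrative cell references: the x-values of the first 7 points in A2:A8, the y-values in B2:B8, and the 8th x-value in A9), the limits of a 95% prediction interval for point 8 are given by
=TREND(B2:B8,A2:A8,A9) ± T.INV.2T(0.05,5)*STEYX(B2:B8,A2:A8)*SQRT(1+1/7+(A9-AVERAGE(A2:A8))^2/DEVSQ(A2:A8))
Here STEYX returns the standard error of the estimate, and the degrees of freedom are n − 2 = 5 for 7 points. If the observed 8th y-value falls outside this interval, it is unusually large (or small) relative to the model fitted to the first 7 points.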
Charles
Hi Charles,
Thanks for the response. What I’m looking for is the probability of a data point as large as the 8th, given the previous 7 that I have.
Yes, I think what I really need to use are prediction intervals. I can calculate those for the 7 points and then see what the probability is for each of them to be as large as the 8th data point by reducing alpha until the interval reaches the 8th data point. The overall probability of any data point being as large as the 8th is then the probability that at least one of them is that large, using the binomial theorem. That makes sense to me anyway!
Regards,
Joe
Joe,
I don’t know how to do something along the lines that you are suggesting.
Charles
Charles –
I expressed this to you personally via email, but I wanted to share the feedback with the broader general public.
The resources, tools, and explanations you provide on this website are invaluable. For someone with deep curiosity and an immense passion to learn, but without formal training in statistics, you and your Real-Statistics website have opened up my eyes to a whole new world and have propelled me to previously unimaginable levels of statistical granularity and insight.
THANK YOU for all that you have done and continue to do. You have empowered me to pursue interests I once thought impossible. THANK YOU.
Andrew
Thank you very much, Andrew.
As I have expressed to you via email, it has been my pleasure to give something back after all that others have done for me over the years. I am pleased and honored by your kind words and support.
Charles
Dear Charles
Thank you for your introduction and for sharing.
Regarding time series forecasting with the Holt-Winters model, I am not sure about the level. As I understand it, three factors are included: Level, Trend, and Seasonality. For the first value of the Level, I saw that in the Excel example it is calculated as Actual − Seasonality.
In the Excel example, the 10 Q4 actual is 25, the seasonality is 10.75, and the trend is 1.875, so the level is 25 − 10.75 = 14.25. But why isn’t the trend considered, i.e., why isn’t the level 25 − 10.75 − 1.875 = 12.375?
Could you share the idea?
Thank you very much
Dear Gary,
Sorry, but I don’t understand your question. What example are you referencing?
Charles
Dear Charles,
This add-in is a powerful tool, so cool! But I got a problem…
I have tried tens of times with the Holt forecast but always run into an error. The Winters forecast works. I also tried your example sheet, and the Holt forecast ran into an error there too. My Excel version is 365 on the Windows platform.
The data set is {8000, 13000, 23000, 34000, 10000, 18000, 23000, 38000, 12000, 13000, 32000, 41000}, # of seasons 4, # of forecast seasons 4, alpha 0.1, beta 0.2, initialize trend yes.
I found something interesting: if I output the result to any cell in the first column or to a new sheet, it will crash. But if I output the result to any other column, it will work.
I have tried to use this data analysis tool on your data and had no problems.
Can you tell me what sort of error you received?
Charles
I have Excel ver 16.57 for Mac and it’s crashing every time I try to use your add-in, any idea?
Hello Rodrigo,
When does it crash? During installation or when you open Excel?
What sort of error message, if any, do you receive?
Have you looked at the Troubleshooting suggestions (on the webpage from which you downloaded the Real Statistics software)?
Charles
Dear Dr. Zaiontz,
I am unable to download the software from the links on your webpage – it opens another window with the same page.
Is there something I am not doing right?
Thanks,
Rajendra
Are you trying to download the software from the following webpage?
https://www.real-statistics.com/free-download/real-statistics-resource-pack/
Charles
Dear Sir,
Can I do the Cochran-Armitage Test for Trend with the Real Statistics Resource Pack? If so, could you please show me how to do it?
Thank you very much.
See https://www.real-statistics.com/chi-square-and-f-distributions/cochran-armitage-test/
Charles
Hello Dr Zaiontz
A simple note of appreciation for your spectacular work here. Easy to use, chock full of useful tools, and illustrative explanation and writing. Truly amazing, thanks again.
Mike Overturf
Thank you very much, Mike.
Charles
Hello
I have Windows 11 and use Excel 2019. I have downloaded the software from your website for Windows and Excel 2019. When I try to run the add-in in Excel, it warns with “Run time error 5”, so I cannot use it.
Best regards
Vico
Hello Vico,
I don’t know what would cause this error. You can google “Run time error 5” or look at the following webpage:
https://support.microsoft.com/it-it/help/966294/run-time-error-5-invalid-procedure-call-or-argument
I am using Windows 11 and Excel 365 and haven’t seen this error message.
Charles
Dear Charles
Is it possible to employ the Durbin-Watson test when a model only contains an intercept and no regressors?
Andrew,
Why would you want to use the Durbin-Watson test in the case where the regression line is y = c where c is a constant?
Charles
Hello Charles.
Yes, the model I am evaluating is: yᵢ = C + eᵢ.
I performed a nonparametric runs test as a check for serial correlation and found none. I then considered evaluating positive serial correlation using the Durbin-Watson test applied to the eᵢ. I have never done so without a regressor, but I thought it a curious exercise because I have never seen DW tables for no regressors.
I have autoregressed the errors using AR(1) and found that rho is insignificant via a t-test. However, the Durbin-Watson value that software reports based on the AR(1) is computed from the errors with the effect of rho removed, even if it is insignificant; this does not reflect the original errors, where the constant is subtracted from the individual observations. As such, the Durbin-Watson value approaches 2 (i.e., 1.90 to 1.97).
Thank you and best regards,
Andrew
Hi Dr.
Is there any function to figure out the best-fit modeling parameters for ARIMA/SARIMA models?
My question is: instead of trying (p, d, q) values and looking at the tests, does the system have any specific function to make a best guess for p, d, and q?
I appreciate it.
Hi Daniel,
There are various heuristics for determining potential parameter values, but I don’t know of a specific function for selecting the parameters automatically.
Charles
Hi Charles,
When we run a post-hoc test after an ANOVA or a non-parametric analysis, is it possible within Real Statistics to automatically assign letters to the different groups?
Thanks for your splendid work.
Hi Luis,
Currently, Real Statistics assigns the values Group 1, Group 2, etc. when no group headings are specified. Is there a problem with this?
You could add the labels A, B, etc. yourself.
Charles
Hi Charles,
I am not referring to the group names, but to the assignment of different letters to groups that, according to the test, are different, as shown below:
Group 1: a
Group 2: ab
Group 3: abc
Group 4: bc
Group 5: bcd
and so on.
Right now what we get is information on all pairs of groups whose means are or are not different (with a cutoff p-value), as shown below:
Group 1 vs Group 2, p = 0.06
Group 1 vs Group 3, p = 0.1
Group 1 vs Group 4, p = 0.02
Group 1 vs Group 5, p = 0.0001
and so on.
Using letters is a friendlier way of presenting the results.
Can you get what you want (or close to it) if you label the groups a, b, c, etc. instead of leaving the column headings blank?
Charles
Hi Charles,
No, it has nothing to do with the group names. What I am talking about is what is called a “compact letter display” or CLD. You can check this function at https://gegznav.github.io/biostat/reference/make_cld.html
Luis
Hi Luis,
Thanks for the clarification. I will consider doing this in a future enhancement.
Charles
Hi Charles, thank you for your previous help. I have another question:
I have a problem in which I have categorical marks of students on a questionnaire, but the questions are scored differently: Q1 from {0,…,3}, Q2 from {0,…,4}, and so on. How can I address this issue using the polytomous Rasch model?
Ciro,
I don’t know how to deal with this situation.
Perhaps you can multiply the scores for Q1 by 4 and the scores for Q2 by 3. This will make the scores on both questions go from 0 to 12.
Charles
Of course!
I will find the least common multiple of the maximum marks of all the questions, and new categories will appear. Every question will have many categories that never appear, but that doesn’t matter.
Thanks.
Hi Charles
This website has helped me a lot. I just had to follow the implementation steps of “Building a Polytomous Model” and in a few hours I had it done. But I have a problem: the first iteration gives me the same answer in each table, including the residuals, but after that the residual table starts oscillating and the total residual starts growing until it blows up. I might have a bug in my code. I wonder if it would be possible to get the tables for the second iteration of this example; this could help me find where the bug is. Thanks 🙂
If you email me the raw data, I will try to figure out what is going on.
Charles
Hi Thanks, I am sending the data by email.
It is just the same data as in the example you give in ….building-a-polytomous-model… My code was written in Python following the steps you show in the Excel example.
Thanks.
It’s ok, I found the bug 🙂
Dear Mr. Charles Zaiontz,
First of all, thank you so much for all the information and the amazing tools. I’ve really learnt a lot following your clear explanations, and the specific examples provided on the website saved me from the originally abstract and deep notions.
I just have a quick question that has been confusing me as I write my paper. The mean difference between my two samples (one with 17 data points, the other with 23) is large, but the result of the Monte Carlo simulation (resampling) turns out to be insignificant. I think this might be related to high variability within each sample’s data. What do you think about this? If this is the case, how should I argue that there is high variability within the dataset? Are there any formulas for me to use?
One more thing: the resampling in the Real Statistics Using Excel software is designed for two independent samples. I am wondering whether it could be improved so that it can deal with three or more independent samples.
I look forward to your reply.
Hello,
I am pleased that you are learning from the Real Statistics website.
1. It depends on how large is large. I would expect that resampling should give good results in this case.
2. There is resampling for ANOVA (3 or more independent samples). See
https://www.real-statistics.com/one-way-analysis-of-variance-anova/resampling-anova/
Charles
Hello Charles,
I’ve employed the resampling for ANOVA to see whether the means of three groups are significantly different. It turned out that the p-value was significant.
You mentioned resampling can still be used for follow-up testing. In this case, when I want to figure out which two groups actually differ significantly, what kind of resampling should I use? The only idea I can think of is to do resampling with independent samples three times, but this is obviously improper. Could you elaborate more on resampling for follow-up testing?
Thank you for any advice.
WANG Minli
Sorry for the delayed response.
Depending on why you used bootstrapping instead of the standard ANOVA, you could simply use the usual Tukey’s HSD or, more likely, the Games-Howell post-hoc test.
Using multiple bootstrap versions of the t-test is another possibility, although it has its limitations.
There are various versions of post-hoc bootstrapping, but I am not familiar with these. The following papers may be useful:
https://www.sciencedirect.com/science/article/pii/S0022000013000731
http://mqala.co.za/veed/Introduction%20to%20Robust%20Estimation%20and%20Hypothesis%20Testing.pdf
http://sro.sussex.ac.uk/id/eprint/68206/3/Field%20%26%20Wilcox%20%282016%29%20robust%20estimation%202017.05.08%20%5Brevision%5D%20%281%29.pdf
Charles
Dear Mr. Charles Zaiontz,
I am not an expert in statistics.
I need some help with my data.
The reviewer suggested using two-way ANOVA for my data but the data is not normally distributed.
In this case, the reviewer suggested using the permutation variant of ANOVA.
Now I am not sure what exactly I need to do, because it is not possible to do a non-parametric two-way ANOVA in GraphPad Prism, which is what is available in my lab.
I would like to know which other test I can perform to convince the reviewer to accept my data.
And would it be possible to use the Real Statistics Using Excel software?
I would really appreciate it if you could suggest something for this.
Please reply,
Kind regards,
Ravi
Hello Ravi,
I am not sure which permutation variant of two-way ANOVA the reviewer is referring to. Perhaps this is a reference to ART ANOVA. You can perform this using the Real Statistics software:
Aligned Rank Transform (ART) ANOVA
Charles
Dear Mr. Charles Zaiontz,
Thank you very much for your quick response.
However, I still have some difficulties and I am not getting the result the reviewers asked for.
I am sending you an Excel sheet explaining my problem. I hope that is okay and that you can help me.
Thank you,
With regards,
Ravi
Hi Charles! First of all, thank you for this amazing tool. I am trying to run a binary logistic regression, and this seems to be the most popular tool for doing so, but when I open my Real Stats dialog box it doesn’t appear as an option. Is there any reason that might be? Thank you!
Hi Amanda,
It is the Logistic and Probit Regression option on the Reg tab.
I want to make you aware of a bug that is being fixed. I suggest that you wait until Rel 7.9, which should be available tomorrow and will fix this bug.
Charles
Dear Mr Zaiontz,
Thank you very much for the helpful and instructive resources you provide.
I have a few questions, which I could not find an answer to, regarding the correction of p-values for multiple testing after ANOVA followed by a post-hoc test.
Suppose I performed ANOVA on a dataset of 1000 parameters analyzed in three groups,
and I receive 1000 p-values indicating whether any of these parameters show significant variance between any of my three groups.
Do I correct these for multiple testing at this point and use only the data which tested significant after correction for my chosen post-hoc test?
If so:
Suppose I have 100 parameters with q-values below my threshold, and I then test these data by multiple comparison of my three groups by way of three t-tests.
Do I use the complete list of 3 × 100 p-values to do a single correction for multiple testing, or are the results of the separate t-tests corrected individually?
Is the correction for multiple testing necessary/recommended after both tests?
Thank you and best regards, Christian
If you perform 1,000 tests, then you need to modify the significance level (alpha) and use this modified significance level when determining which results are significant. Typically, Bonferroni’s correction is used: the revised significance level is alpha/1000. You divide by 1,000, and not by the number of tests that were significant at alpha = .05.
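For example (a minimal sketch with illustrative cell references), if the 1,000 p-values are in A1:A1000:
=0.05/1000 gives the revised significance level 0.00005
=COUNTIF(A1:A1000,"<0.00005") counts how many tests remain significant after the Bonferroni correction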
See https://www.real-statistics.com/hypothesis-testing/familywise-error/
Charles
Dear Mr. Charles Zaiontz,
In the article ‘WLS regression and heteroskedasticity’ the first sentence:
‘Suppose the variances of the residuals of a OLS regression are known, i.e. var(σᵢ²) = εᵢ.’
It seems to me it should look like this:
‘Suppose the variances of the residuals of a OLS regression are known, i.e. var(εᵢ) = σᵢ².’
(given that the residuals are expressed by εᵢ).
Just the opposite, isn’t it?
Thanks,
EMC
Hello Eduard,
Yes, you are correct. I have now made the change on the webpage.
Thank you for identifying this mistake. I appreciate your help in improving the accuracy and usability of the website.
Charles
Hello Charles,
I downloaded the tool pack add-in correctly and was able to use it once. Now I can’t access it at all. I went back through the original steps to access the add-in, and it’s still showing. I unchecked and then checked the X-Stats box again, but I still cannot access the tool. I need to take my final exam for Stats 2 in a few days, so I desperately need this.
Can you help?
Stephanie,
You should still be able to use the software unless something has changed. In any case, see the Troubleshooting section on the following webpage: https://www.real-statistics.com/free-download/real-statistics-resource-pack/
If none of this works, I suggest that you do the following:
1. Delete the file that contains the add-in that you downloaded previously.
2. Open Excel, press Alt-TI, and uncheck the XRealstats.xlam entry.
3. Close and reopen Excel. Press Alt-TI and check the XRealstats.xlam entry. You should receive an error message suggesting that you remove this entry from the list of add-ins. Agree to do this.
4. Download a new copy of the add-in and install it as described on the webpage. Don’t try to open the file containing the add-in. To be absolutely sure that you won’t have a problem, before installing the add-in, rename it to ZRealstats.xlam (so that Excel won’t confuse it with the previous version).
If you rename the add-in file, you can skip step #3.
Charles
Hello,
I have tested some of your functions for the truncated normal distribution. I used the following parameters: μ = 11.1, σ = 4.9, a = 0, b = 20. TNORM_DIST run with TRUE produces the cumulative distribution as expected. TNORM_DIST with FALSE also produces a Gaussian-like shape when plotted, as expected.
However, I do not think TNORM_INV is working correctly. In the table below I apply this function to the p-values in the second column. I expected it to reproduce the x values, but as you can see, it produced some other numbers. Can you please check what is going on?
x      TNORM_DIST(TRUE) = p      TNORM_DIST(FALSE)      TNORM_INV(p)
0 0.000000 0.006562 -2.265306
1 0.008278 0.010204 -2.061224
2 0.020867 0.015220 -1.857143
3 0.039233 0.021776 -1.653061
4 0.064938 0.029884 -1.448980
5 0.099453 0.039339 -1.244898
6 0.143912 0.049672 -1.040816
7 0.198852 0.060162 -0.836735
8 0.263984 0.069894 -0.632653
9 0.338059 0.077887 -0.428571
10 0.418880 0.083254 -0.224490
11 0.503477 0.085361 -0.020408
12 0.588425 0.083951 0.183673
13 0.670259 0.079196 0.387755
14 0.745888 0.071662 0.591837
15 0.812939 0.062200 0.795918
16 0.869971 0.051785 1.000000
17 0.916506 0.041355 1.204082
18 0.952935 0.031678 1.408163
19 0.980291 0.023276 1.612245
20 1.000000 0.016405 1.816327
Hello Robert,
Thank you for bringing this problem to my attention. There is an error in the TNORM_INV function. I have made the correction on the website and flagged that there is a problem. I will repair the error in the software shortly. In the meantime, you can use the following formula in place of TNORM_INV(p,m,s): =NORM.INV(p*(NORM.DIST(b,m,s,TRUE)-NORM.DIST(a,m,s,TRUE))+NORM.DIST(a,m,s,TRUE),m,s)
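For example, using the parameters from your table (μ = 11.1, σ = 4.9, a = 0, b = 20) and the p-value in the x = 11 row:
=NORM.INV(0.503477*(NORM.DIST(20,11.1,4.9,TRUE)-NORM.DIST(0,11.1,4.9,TRUE))+NORM.DIST(0,11.1,4.9,TRUE),11.1,4.9)
returns approximately 11, reproducing the x value as expected.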
I appreciate your help in improving the Real Statistics website and software.
Charles
Hi Charles,
Thank you very much for creating and sharing such a great add-on. Could I have some advice on it? I seem to have encountered an issue with an older version of the Tukey-Kramer test (the Excel 2003 version). After entering the data, there is a column labeled ‘c’ that is greyed out, which is required for the calculations to proceed. Image in link (https://photos.app.goo.gl/CWNYGxXzi7V19d8Y7). What is the ‘c’ value and how do I calculate it?
Hi James,
The c column contains the contrasts. For the Tukey-Kramer test, you need to place the number +1 in this column for one of the groups and -1 for another one of the groups. This will result in a comparison between these two groups.
You can do the same for other group comparisons.
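For example (an illustrative layout with four groups), to compare Group 1 with Group 3 you would fill in the c column as follows:
Group 1   +1
Group 2    0
Group 3   -1
Group 4    0
The groups not involved in the comparison get a contrast coefficient of 0.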
Charles
Ah, so that’s what it was. Thank you for your clarification and assistance.
Hi Charles,
I hope you and your family are well and safe. I have sent you an email with a spreadsheet showing what seem to be incorrect results when using the ROOTS function to calculate the roots of a cubic function, whereas the CUBICROOTS function seems to give different (and apparently correct) results. Please let me know if there is anything wrong on my end or if there is indeed an issue with the ROOTS function. Thank you so much for your help!
Hello Alan,
Thanks for bringing this issue to my attention.
I confirm that ROOTS doesn’t yield the correct answer. I don’t know yet whether this is caused by poor initial values for r and s. I will be checking into it further. I may need to implement a different technique.
Charles
Hi Charles,
Thank you for the prompt reply. I found out that changing the number of iterations to between 100 and 200 (instead of the default 50) seems to yield the right result. Please let me know if you will make any changes. In the meantime I have changed the formulas to use CubicRoots, which does not appear to have any issues.
Best,
Alan
Hi Alan,
That is good to hear. I feared that I would need to change the r and s values; changing the default number of iterations is easier. I will look into this for the next release. If it doesn’t impact easier cases, then I will change the default to 200.
Charles
Hello!
I have downloaded the Real Statistics file per the instructions several times, but it does not show up when I open a blank Excel document and try to follow the directions to install it in the proper folder. I tried all 3 of the suggested troubleshooting tips and also rebooted my laptop, but it doesn’t want to work. I have Excel 2016. My email address is smcglothlin22@gmail.com. If you could assist, I would really appreciate it. I am a senior at Athens State University trying to pass Stats 2, and this would help me tremendously!
Hello Stephanie,
What do you see when you enter the formula =VER() in any cell?
What do you see when you press the key sequence Ctrl-m (i.e. hold the Control key down and press the m key)?
Charles
Dear Dr. Charles,
I am a PhD student and I need to use Lin’s Concordance Correlation Coefficient to check the validity of a device.
I tried to use the LINCCC formula in Excel with the Real Statistics package. But I have a problem: when I use it, only the correlation coefficient appears, without the upper and lower confidence limits.
How can I see the upper and lower confidence interval with this formula?
Thank you for your time,
Best regards,
Aitor.
Dear Aitor,
This is an array function and so you can’t simply press Enter. See
Array Formulas and Functions.
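In brief (illustrative cell references; see that webpage for the exact arguments and output layout): highlight a range of several cells, type a formula such as =LINCCC(A1:A20,B1:B20), and press Ctrl-Shift-Enter instead of Enter. The lower and upper confidence limits then appear in the extra cells of the highlighted range.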
Charles
Hi Dr Charles,
Your resource pack has been a great help in my learning journey! Thank you for sharing with the community such a great tool!
Recently, I started getting an error message that says “Compile error in hidden module: Analysis” (GOF has this issue too). Would you be able to advise on what I could do? I tried reinstalling the Resource Pack, but it did not work.
Thank you.
Regards,
May
See https://www.real-statistics.com/appendix/faqs/compile-error-in-hidden-module/
Charles
Hi Charles,
Thank you for this resource! I used it to run a single-factor ANOVA + Tukey, and one of my p-values appears to be negative. I’m wondering if that’s an error, and if not, what it means.
Thank you!
Hi Meng Yang,
None of the p-values should be negative. The only exception is a very small value near zero, such as -2.34E-15. This is -.00000000000000234 and is essentially zero.
If you are getting another type of negative value, you can email me your results and I will try to figure out what has gone wrong.
Charles
Hello Zaiontz,
Is there any tutorial on how to perform the Giacomini-White test in Excel?
Thanks.
Hello Bernardo,
Sorry, but I am not familiar with this test and don’t know of an Excel implementation.
Charles
I conducted a survey and carried out a one-way ANOVA. The partial eta squared for the analysis comes out to less than .01. Is there any way to increase it?
The sample groups for the analysis are unbalanced in terms of count. Can this also be a reason for the small effect size? Is there a method to counterbalance this?
Hello Divya,
An effect size of .01 is quite small. This should accurately reflect the results based on your data.
Do you have any reason to believe that the result is wrong? Just because you are not happy with the result doesn’t mean that it is wrong.
How unbalanced are your group sizes?
Charles
Why does it ask for a password when I try to open the add-in?
See Password Prompt
Charles
Hi,
I had a group of people rate their satisfaction as poor, average, or excellent when a service is not provided and when a service is provided. I am wondering what test I can use to compare the results for the ordinal data (satisfaction) between service not provided and service provided among the paired samples. Could I use the McNemar-Bowker test for this?
Hi Ker,
As long as you are testing the hypothesis described at the following webpage, you should be ok.
https://www.real-statistics.com/non-parametric-tests/mcnemar-bowker-test/
Charles
Hello Charles
I am having a problem finding the XRealStats functions in Excel. Ctrl-m is not working. It worked once and then stopped working. It is still available in Add-ins. Please provide a solution.
Hello Sumi,
This is very strange, especially since you can still access the software from the AddIns ribbon.
What happens if you close Excel? Does Ctrl-m work after you reopen Excel?
Charles
The same thing is happening to me: Ctrl-m worked for me once and is not working anymore. Is there an alternative? I am trying to perform a Tukey test.
Thanks
What do you see when you enter the formula =VER() in any cell?
Charles
Thank you Dr. Charles
Dear Dr. Charles,
The resources on your website are really great.
I am not well versed in complex statistics. I need your help in understanding the statistical model scientists have used for their study. Can this be done in Excel?
Reference is as below:
https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcla.10043
Thanks once again.
Best Regards,
Shailesh
Hi Shailesh,
Thanks for your kind words about the Real Statistics website.
The statistical model in this paper can be implemented in Excel, but it will take some effort. The various formulas can be expressed in Excel and the maximum likelihood procedure can probably be done using Excel’s Solver. The authors used Newton’s method, which can also be done in Excel, but with more effort. Many of the current analysis tools described on the Real Statistics website and implemented in the Real Statistics software use this procedure.
Charles
Dear Dr. Charles,
Thank you for your response.
I would like to make one for myself. Can you please help me understand this method? It seems complicated to me.
Best Regards,
Shailesh
It is not so easy to do this.
You can learn about Solver and Newton’s method at
Solver
Newton’s Method
An example of how to find parameters to maximize the likelihood function is given at
https://www.real-statistics.com/regression/exponential-regression-models/exponential-regression-newtons-method/
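To give a flavor of the Solver approach (a minimal sketch using a simple normal model for illustration, not the model from the paper; the cell references are arbitrary):
1. Place the data in A2:A51 and initial guesses for the parameters, e.g. the mean and standard deviation, in D1 and D2.
2. In D3 enter the log-likelihood: =SUM(LN(NORM.DIST(A2:A51,D1,D2,FALSE))) (entered with Ctrl-Shift-Enter in versions of Excel without dynamic arrays).
3. Run Solver with Set Objective D3, To Max, By Changing Variable Cells D1:D2, adding a constraint such as D2 >= 0.0001.
The same pattern applies to the paper’s model: express its log-likelihood in one cell as a function of the parameter cells and let Solver maximize it.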
Charles
Dear Prof. Charles,
Thank you very much for these resources and for sharing your knowledge; they have been of great usefulness for my studies.
I’ve been trying to implement in Excel the HAC standard error estimation proposed by Hodrick (1992), “Dividend Yields and Expected Stock Returns: Alternative Procedures for Inference and Measurement”, Review of Financial Studies, vol. 5, no. 3, 357–386. The method is supposedly more efficient than Newey-West and similar methods for dealing with autocorrelation in overlapping-period time series, and it has gained some acknowledgment in the finance field.
However, I have not succeeded so far, so I decided to reach out to consult you on this. Do you know this method for HAC standard error estimation? Do you have any thoughts on how to implement it, using some of the Real Statistics tools or other Excel resources? Might it possibly be included in the Resource Pack in a future update?
My apologies for disturbing you with this; any thought or idea from someone with your know-how would be very much appreciated.
Best regards, Andre.
Andre,
I am unfamiliar with Hodrick’s HAC and so wouldn’t know how to implement it in Excel.
Perhaps the following webpage would be helpful:
https://stats.stackexchange.com/questions/312341/comparison-between-newey-west-1987-and-hansen-hodrick-1980
Charles
Charles,
Suppose we were forecasting revenue in November 2020. Classical decomposition time-series forecasts presumably would have forecast that the Covid trend would continue. Instead, it would have made sense to use the classical method to capture seasonality and then apply the seasonal values to alternate economic scenarios.
Question 1: Is there a relatively easy way to do this with Excel’s forecasting functions?
Question 2: Would we be better off to calculate the seasonal factors using our own formulas and then apply the seasonal variations to each of the alternate scenarios?
Question 3: Would it have made sense to use historical performance in past Januaries to forecast January sales under alternate scenarios, and then do the same for the other months of the year?
Thanks!
Charley
Hello Charley,
Q1. Versions of Excel starting with Excel 2016 offer two approaches for creating time series forecasts with seasonality. The first is a Forecast Sheet capability available from the Data ribbon. The other approach uses functions such as FORECAST.ETS, FORECAST.ETS.SEASONALITY, etc. The Real Statistics website describes these functions at
https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/excel-2016-forecasting-functions/
These models are based on Triple Exponential Smoothing (aka Holt-Winters). If I remember correctly, Excel uses the additive model. There is also a multiplicative model. Both of these models are explained on the Real Statistics website. See
https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-winters-method/
https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-winters-additive/
The Real Statistics software provides data analysis tools to create forecasts using either approach.
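For example (a minimal sketch with illustrative cell references, assuming monthly dates in A2:A37 and revenue in B2:B37):
=FORECAST.ETS(DATE(2020,12,1),B2:B37,A2:A37,12)
forecasts revenue for December 2020 with a seasonal period of 12 months, and =FORECAST.ETS.SEASONALITY(B2:B37,A2:A37) returns the seasonal period that Excel detects automatically.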
Q2. If the functions and tools described in Q1 are sufficient for your purposes, then it is better to use them. If not, you can modify some other approach by adding seasonality. One approach to doing this is described at
https://www.real-statistics.com/multiple-regression/multiple-regression-analysis/seasonal-regression-forecasts/
Q3. Yes. If you have, say, monthly data for a few years, then you would use this data as your time series and specify the period as 12. This is indeed the approach used by Holt-Winters, and it takes care of any linear trend along with the seasonality.
Charles
Hi Charles,
Thank you so much for kindly providing all these useful tools. I have a quick question: I was recently testing out the spline function from the Excel add-in and compared it against the SciPy spline function (cubic spline). Despite using the same inputs, the outputs differed a fair bit. Do you happen to have some idea as to why that might be the case? Is the logic behind the add-in different from that of the function provided in the Python library SciPy?
Thank you in advance!
Joel,
I am not familiar with the version in the Python library. If the Python version is also a cubic spline function, then it is surprising to me that the results would be so dissimilar. I suggest that you create a plot/chart to see which seems to fit the data better and more smoothly.
Charles
Hi Charles,
Thank you so much for your prompt reply. I was actually able to reconcile the difference. For the benefit of other readers who might hit this issue, it turns out that the default setting of the Python function is such that the first and second segments at each curve end are the same polynomial. Just change the bc_type argument of the CubicSpline constructor to ‘natural’ so that the second derivative at the curve ends is zero.
Charles, I’m doing an independent-samples t-test and am getting different output when using the stacked format than when formatting the data the usual way. I have the ages of 1785 participants in a treatment program (531 females & 1254 males). When I use the regular format for the input data, Real Statistics uses all observations for each sex. When I use the stacked format for the data, however, there are problems. The output’s reformatted data shows females in the left column (P4:P534) & males in the right column (Q4:Q1257), plus the label for each sex in row 3. The Summary table correctly counts 531 females, with the formula bar showing =COUNT(P4:P1257). However, the Summary table incorrectly shows that there are also 531 males, with the formula bar showing =COUNT(Q4:Q534). It looks to me like Real Statistics is switching the number of rows it should count for males & females. I looked in the sample workbook, and the independent-samples t-test example using stacked format didn’t appear to have the same problem there. Thank you for your assistance.
You can ignore my previous message – I tried to do the t-test both ways on the same spreadsheet so I could email the file to you if you requested it, & Excel crashed. When I re-opened it & re-ran the t-test both ways, the stacked format worked just fine. I have no idea what happened before, but it appears to be ok now.
Hello sir
I want to use your statistical tools to arrive at the best possible method for my study by comparing different forecasting techniques in a supply chain scenario. I have read that MASE is a measure of forecasting accuracy. How do I convert MAE into MASE? Can you explain with an example? Also, is MASE applicable to ARIMA?
See https://www.real-statistics.com/time-series-analysis/forecasting-accuracy/time-series-forecast-error/
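In brief (a minimal sketch with illustrative cell references, assuming actuals in A2:A13 and forecasts in B2:B13): the MAE is =AVERAGE(ABS(A2:A13-B2:B13)) and the MAE of the naive one-step-ahead forecast is =AVERAGE(ABS(A3:A13-A2:A12)) (both entered with Ctrl-Shift-Enter in versions of Excel without dynamic arrays); MASE is the ratio of the first to the second. Note that the naive MAE is normally computed over the training (in-sample) period. Since MASE depends only on the forecast errors, it applies to forecasts from any model, including ARIMA.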
Charles
Hi Charles,
I am not sure whether what I am experiencing is normal or something is not correct.
My operating environment is Windows 10 + Office 365.
I recently added Real Statistics following the procedure mentioned here.
I was able to use the Real Statistics add-in then.
But when I exit Excel and restart, the add-in is not active.
It looks like the workaround is to uncheck the add-in and then check it again.
That works.
Why do the add-ins, once added, not show up when Excel is restarted?
Is this normal? Or is anything else needed to fix this issue?
Regards, Saradhi.
Saradhi,
You say that “when I exit Excel and restart, the add-in is not active.” When you exit Excel and restart, does the formula =VER() still work?
Charles
I have the same problem. =VER() does not work when I first start the workbook; when I enter it in a cell I get #NAME?. However, if I uncheck the RealStats add-in, hit OK, then re-check the add-in, I get “7.6.1 Excel 2010/2013/2016/2019/365 Windows.”
Perhaps the following FAQ can help (even though the topic is a little different):
https://www.real-statistics.com/appendix/faqs/disappearing-addins-ribbon/
Charles
Very helpful site. It helped me calculate Newey-West standard errors in Excel for analyzing the term structure of bond yields.
Hi Charles
Suppose you have a dependent variable T and a set of 10 independent variables A, B, C, … (all continuous variables). Trying to answer the question “Which of the regressors (A, B, C, or D) have significant effects on T?”, I build the following 10 models by linear regression:
Model I: T = a0 + a1*A
Model II: T = b0 + b1*B
Model III: T = c0 + c1*C
…
After that, I check homoscedasticity and normal distribution of errors (Breusch-Pagan and Shapiro-Wilk tests, done with your great Resource Pack!) and apply Ramsey’s RESET test for model specification errors (adding squares and/or cubes of T-hat). Finally, I score the models taking into account the p-values of the estimated coefficients a1, b1, c1, … under the Benjamini-Hochberg approach (with FDR = 0.05).
a) Is the Benjamini-Hochberg approach valid under these circumstances?
b) If the goal of the modeling is just the identification of relevant factors (I’m not looking for models with high predictive power), is the above procedure valid?
Any help with these questions will be greatly appreciated
Thanks
Gabriel
Gabriel,
The simplest approach seems to be to look at the model T = d + a1*A + b1*B + c1*C and see which of the coefficients are significant based on the p-values. I presume that you don’t want to use that approach, possibly because you want to remove the effect of correlations between A, B, and C. With the approach that you are using, you have the problem of familywise error (one test per model). You can use Benjamini-Hochberg, Bonferroni, or a number of other approaches, depending on which assumptions are acceptable. For more details, see
https://www.real-statistics.com/hypothesis-testing/familywise-error/
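As an illustration of the single-model approach (a minimal sketch with illustrative cell references, assuming T in A2:A31 and A, B, C in B2:D31): highlight a 5-row × 4-column range, enter =LINEST(A2:A31,B2:D31,TRUE,TRUE) with Ctrl-Shift-Enter, and read the coefficients from the first row (in reverse order: C, B, A, intercept) and their standard errors from the second row. Each coefficient’s p-value is then =T.DIST.2T(ABS(coef/se),26), with df = n − k − 1 = 30 − 3 − 1 = 26.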
a) and b) As to whether Benjamini-Hochberg is acceptable, see
https://www.real-statistics.com/benjamini-hochberg-and-benjamini-yekutieli-tests/
Charles
Hi Mr. Zaiontz
Good night from the other side of the equator.
Upon stumbling onto your website and diving into it, any reader can see your hard work: all that time, effort, passion, and consistency poured into the result. It is an amazing sanctuary where seekers of numbers can learn from a true master and grow into better data analysts. So, Mr. Zaiontz, for a newborn calf unafraid of the tiger (a newbie) setting sail on the waves of the stock market to seek a dime in the numbers, which statistical function should I learn first?
Thanks in advance, Mr. Zaiontz; let’s hope and pray I don’t sink too soon in the market.
Good to communicate with someone from the other side of the equator.
What are you trying to accomplish?
Charles
I want to calculate the risk of buying a stock using previous market data and to predict the best uptrends and downtrends, so I can make optimum yield estimates for the several stocks I have invested in.
Thank you for responding.
This is a very broad area that people have written books about. I can’t possibly address this issue in a few sentences. This is also not my area of expertise.
Charles
Hi Charles,
Thank you for your all of this really useful information which I am using for my Data Analyst apprenticeship at the moment.
I have been using the logistic regression calculation in the MS Excel add-in, but the results do not reflect any of the examples I have seen. My classification table has zeros in the top two Suc-Pred cells, and in Fail-Pred I have 5666 and 13767. I have tried different variables and still get zeros returned, so I am a little bit lost as to what I am doing wrong. I would be really grateful if you could share some insight into this for me.
Many Thanks
Hi Lee,
If you email me an Excel file with your data and test results, I will try to figure out what is going wrong.
Charles
Hi Charles,
I have just sent you an email.
Many Thanks,
Lee
Hello Charles,
What is the percentage error of the tool?
Thanks.
I don’t know what percentage of error you are referring to.
Charles
I am referring to the percentage error of the Real Statistics tool, in which you use the principal components method.
Regards.
Eduardo,
I am not sure how a percentage error for Principal Component Analysis would be defined, but it is probably not supported.
What definition of percentage error are you using? Do you have a reference?
Charles
Hi Charles
I’m modeling the behavior of a system by multiple linear regression, where I suppose Y = b0 + b1.A + b2.T + b3.(T^2), with Y, A, and T as continuous variables. The regression using untransformed variables gives high VIF values associated with T and T^2. As a means of reducing the collinearity, I try variable standardization on A and T and repeat the regression assuming Y = b'0 + b'1.A' + b'2.T' + b'3.(T'^2) [where ' indicates a standardized variable]. As a consequence of standardization, the collinearity practically vanishes. Following your post about standardized coefficients in linear regression (excellent post!), I tried to write the corresponding regression equation based on unstandardized coefficients, but I can’t find a way to do that for equations involving quadratic terms like Y = b'0 + b'1.A' + b'2.T' + b'3.(T'^2). How can I do that?
Thanks!
Gabriel
Gabriel,
It is not surprising that there is a high VIF between T and T^2. If you treat T and T^2 as two completely separate variables (ignoring the obvious relationship between them), then I would think that the approach described at https://www.real-statistics.com/multiple-regression/standardized-regression-coefficients/ would work fine.
You are probably better off ignoring the collinearity issue between T and T^2, since it may decrease stability but is not a fatal flaw (i.e., you can still calculate the regression coefficients).
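For what it’s worth, the back-transformation can be done by direct substitution (a sketch, assuming the usual standardization A' = (A − mA)/sA and T' = (T − mT)/sT, where m and s denote the sample mean and standard deviation, and Y is left unstandardized). Substituting these into Y = b'0 + b'1.A' + b'2.T' + b'3.(T'^2) and expanding the square gives
Y = [b'0 − b'1.mA/sA − b'2.mT/sT + b'3.mT^2/sT^2] + (b'1/sA).A + (b'2/sT − 2.b'3.mT/sT^2).T + (b'3/sT^2).T^2
so the unstandardized coefficients can be read off term by term.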
Charles
Hi Charles,
Thanks for your response to my previous question. I have another question regarding binned data. I have 26 data points: 1 at 20.1, 1 at 20.2, 3 at 20.3, 14 at 20.4, and 7 at 20.5.
I tried to fit a distribution to the data, but the p-values were < .05. Can I use a goodness-of-fit statistic instead to select a distribution for the data?
Thanks again
Stan
Stan,
Are you saying that none of the common distributions fits your data? Are you using the chi-square test?
Charles
Dear Charles
I have been given the means and standard deviations from a stability study. I don’t have the original data, but I do know that the data were normal. Is it possible to extrapolate the mean and standard deviation to a future time point?
Thanks
Stan
Stan,
I don’t see how you could, since you don’t have any data to base it on.
Charles
Dear Charles,
I am fascinated by your great work and your efforts to contribute to the teaching of these analysis methods; you have my total admiration and respect.
I am a Civil Engineering graduate, and my thesis is based on an analysis of variance and multiple linear regression. From my reading I concluded that it is a factorial analysis of variance, but reading and consulting more of your work has left me in doubt as to whether it is factorial or multifactorial. Let me explain:
As the dependent variable I have a very important soil characteristic, the modulus of elasticity, which was obtained through testing, specifically the triaxial soil test. As independent variables I have 11 geomechanical soil characteristics, for example friction angle, cohesion, shear stress, etc., which were likewise obtained through various tests. What I intend to do in my research is to use ANOVA to identify statistically significant independent variables, then use the ones that were not discarded in a multiple linear regression, and in this way obtain a correlation equation that lets me estimate the dependent variable (modulus of elasticity) from the statistically significant independent variables.
I have already carried out the procedure with the help of RStudio and obtained good results, but now I am trying to do the same in Excel to understand better how the analysis of variance was performed, how the significant variables were obtained, and how the others were discarded.
If you could take some time to answer my message, I would be eternally grateful, as this is very important for my thesis defense.
Alternatively, if you could recommend references where I can learn about the types of ANOVA, that would also help me a lot.
Best regards, Bryan Logacho Tapia
Quito, Ecuador
I don’t have enough information to comment on what sort of ANOVA you should use, but the Real Statistics website explains a number of ANOVA models that could be useful and how to carry them out using Excel. See https://www.real-statistics.com/anova/
Charles
Dear Charles,
I want to know whether Principal Component Analysis (PCA) is based on the correlation matrix or on the covariance matrix.
Thanks.
See Principal Component Analysis
Factor Analysis
Charles