Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me at the addresses below or by leaving a comment here or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
I have just discovered your wonderful site. You have my eternal thanks and appreciation.
Thank you Leah,
Charles
Thank you! I love your support! 🙂
Dear sir,
I have a doubt about the conclusions of the one-sample Wilcoxon test. After applying the test to my samples at a significance level of 5%, I obtain a p-value of 0.45 and a corresponding z score of 0.123. If I'm not wrong, on the basis of the obtained p-value there is very little chance of rejecting the null hypothesis (kindly correct me if I'm wrong at any point). What else can I conclude from these results?
With regards
Nirbhow,
If p = .45, then you should not reject the null hypothesis.
Charles
Dear sir,
thanks a lot for responding.
I want to inquire further: on the basis of the p-value and z score, what conclusions can I draw about the nature of the sample data I used for the test, or about the nature of the technique I used to obtain the sample data?
With regards
That p-value = 0 means that it is highly unlikely that the real value for kappa is zero.
Charles
Is obtaining a p-value of 0.45 a normal case or not? I'm asking this question because the magnitude of the p-value is high.
And what kind of information does the z score give about the sample data?
Nirbhow,
In what context are you referring to a z score?
Charles
Respected sir,
I have applied the one-sample Wilcoxon test to my samples and obtained a p-value of 0.45 and a corresponding z score of 0.123. What qualitative information does this z score provide about my sample data?
The z score is used to calculate the p-value and effect size.
Charles
Charles
The following table compares the Abdi & Molin critical values (CVs) with those produced by my own routine (see ANNEX). Abdi & Molin used 1E5 samples per size; the samples/size column shows my own sample counts where they change.

       MY routine                  Abdi & Molin (1E5 samples/size)
n      .01     .05   samples/size   .01     .05
50    .1450   .1245      4E6       .1457   .1246
100   .1036   .0889                .1026   .0888
150   .0850   .0729                .0840   .0727
200   .0738   .0633                .0729   .0630
250   .0662   .0567      1E6       .0652   .0564
300   .0605   .0519      4E5       .0596   .0515
400   .0525   .0450                .0516   .0447
500   .0471   .0404                .0462   .0400
Remember that it was proposed
f_n = (0.83 + n)/√n − 0.01
CV(.05) = 0.895/f_n, CV(.01) = 1.035/f_n
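As an illustration (my own paraphrase of the fitted expressions, not Luis's Monte Carlo routine), this can be coded as a small VBA user-defined function; e.g. =LillieforsCV(100, 0.05) returns about .0889, in line with the table above.

' Sketch of the Abdi & Molin fitted approximation to the Lilliefors critical values.
Function LillieforsCV(n As Long, alpha As Double) As Variant
    Dim fn As Double
    fn = (0.83 + n) / Sqr(n) - 0.01          ' f_n = (0.83 + n)/sqrt(n) - 0.01
    Select Case alpha
        Case 0.05: LillieforsCV = 0.895 / fn
        Case 0.01: LillieforsCV = 1.035 / fn
        Case Else: LillieforsCV = CVErr(xlErrValue)   ' only .05 and .01 are fitted here
    End Select
End Function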
[1] Hervé Abdi, Paul Molin, Lilliefors/Van Soest's test of normality
The other confidence levels (20%, …) are completely useless; nobody uses them, and there is no guarantee in NHST.
Conclusion
Owing to an error in the expression that transforms z into a p-value, I wrongly assumed that the Abdi & Molin fitted expressions (though never the values they evaluated by Monte Carlo) were strongly underestimated. After revising my routine I could verify that the error was mine, as can be seen in the table above, where my CVs were recalculated.
I hope my apologies will be accepted.
Luis
Sure, it would be fine . . .
Charles
What do you propose people should do in this instance?
I think that more people are needed to surpass the deadlock.
Do you think so, Charles?
luis
Charles
That's the point . . .
How can I be judge and jury in my own cause?
That is precisely the reason I invited you to recalculate the critical values, so that you can be sure who is right, Abdi or me.
Never mind; remember "Nemo iudex in causa sua".
Science always depends on cross-validation; without it, it crashes . . .
Luis
This is a private e-mail
Charles
How can I reconcile the Abdi & Molin
D(1000, .95) = 0.0283 with my evaluation using 400,000 samples per size:
n = 250, 0.0722
500, 0.0727
750, 0.0737
1000, 0.0742
One of us is dead wrong, Abdi or me, for sure.
Do you agree?
Luis
Luis,
I agree, but it would be good to understand which values are the right ones to use.
Charles
Hi, Charles
Monte Carlo Method (MCM) and me
Personal computers (PCs) have been available to everyone for more than four decades, and from early on I have been interested in their usefulness in statistical matters. Because it lets us check procedures that depend on chance and helps us internalize randomness, MCM is of indisputable value in both practical and educational settings. Here is an example.
We know from theory that the extreme values, a matter of importance in several fields (climate change, for example), satisfy, for the Uniform(0,1) distribution:
P(min≤x)=1-(1-x)^n
P(max>x)=1-x^n
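(A bare-bones VBA sketch of this kind of check, a simplified illustration with fewer samples, not Luis's actual routine:)

' Monte Carlo check of P(min <= a) for the minimum of n Uniform(0,1) draws.
' With n = 20 and a = 0.05 the theoretical value is 1 - 0.95^20 = 0.6415.
Sub CheckUniformMin()
    Const nSamples As Long = 100000      ' fewer than the 1,000,000 used below
    Const n As Long = 20                 ' sample size
    Const a As Double = 0.05             ' threshold for the minimum
    Dim hits As Long, i As Long, j As Long, mn As Double, u As Double
    Randomize
    For i = 1 To nSamples
        mn = 1#
        For j = 1 To n
            u = Rnd                      ' one Uniform(0,1) draw
            If u < mn Then mn = u
        Next j
        If mn <= a Then hits = hits + 1
    Next i
    Debug.Print "observed:", hits / nSamples
    Debug.Print "theory:  ", 1 - (1 - a) ^ n
End Sub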
The results follow for sample sizes 20, 30, 50, with 1 million samples per size:
n = 20 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.005    0.995        0.0953           0.0952       .0954
0.050    0.950        0.6416           0.6412       .6415
0.100    0.900        0.8788           0.8786       .8784
0.200    0.800        0.9884           0.9883       .9885
0.300    0.700        0.9992           0.9992       .9992

n = 30 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.001    0.999        0.0296           0.0296       .0296
0.005    0.995        0.1397           0.1397       .1396
0.050    0.950        0.7854           0.7854       .7854
0.100    0.900        0.9574           0.9578       .9576
0.200    0.800        0.9987           0.9988       .9988

n = 50 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.001    0.999        0.0488           0.0488       .0488
0.005    0.995        0.2217           0.2217       .2217
0.050    0.950        0.9231           0.9229       .9231
0.100    0.900        0.9948           0.9950       .9948
0.200    0.800        1.0000           1.0000       1.0000
Therefore, we reach a two-fold conclusion:
The theory is correct, as everybody knows,
And my RNG is of quite acceptable quality, which was of course the true intent.
Luis
Hi Charles
I obtained the quantiles of the Geary test (goodness of fit) for normal data.
Is it possible to display the following table, in order to get feedback from the readers to confirm that the values are acceptable for practical purposes?
1E7 simulated samples for each sample size.
Thank you.
n      0.80   0.85   0.90   0.95   0.98   0.99
20     .847   .855   .864   .878   .892   .901
25     .841   .848   .857   .869   .883   .891
30     .837   .843   .852   .863   .876   .884
35     .834   .840   .847   .858   .870   .878
40     .831   .837   .844   .854   .866   .873
45     .829   .834   .841   .851   .862   .869
50     .827   .832   .839   .848   .859   .865
55     .826   .831   .837   .846   .856   .862
60     .824   .829   .835   .844   .853   .860
65     .823   .828   .834   .842   .851   .857
70     .822   .827   .832   .840   .849   .855
75     .821   .826   .831   .839   .848   .853
80     .820   .825   .830   .838   .846   .852
90     .819   .823   .828   .835   .843   .849
100    .818   .822   .826   .833   .841   .846
120    .816   .819   .824   .830   .837   .842
140    .814   .818   .822   .828   .834   .839
160    .813   .816   .820   .826   .832   .836
180    .812   .815   .819   .824   .830   .834
200    .812   .814   .818   .823   .828   .832
Luis,
Ok, it is posted. Let’s see what feedback you get.
Charles
Hi Charles,
First of all, I would like to thank you for this wonderful website and code. I started learning econometrics and your help is invaluable.
I have several case studies in the form of panel data. I was wondering if we can apply your logistic regression function to panel data?
Thanks for your reply.
Arnaud
Arnaud,
I am very pleased that you value the website and software.
I have not yet delved into panel data on the website and software. I hope to get to that in the future. The logistic regression capabilities described on the website and implemented in the software are only the beginnings of the approaches used with panel data.
Stay tuned, more to come.
Charles
Good morning Prof,
I am Jude. I want to do my thesis on a generalized approach to the analysis of variance. Please sir, can you direct me on how to commence? Thanks
Jude,
That is quite a broad request. For starters, please see the following webpage.
Analysis of Variance
Charles
Hello,
Would you please help me figure out the appropriate test. I have results from satisfaction surveys for 2 different companies. 40 surveys per company. I want to find out if the satisfaction is the same or different. Would I run a two sample t test with equal variances? Thanks for any guidance you can give me.
Lu,
You can run a two sample t test. If the variances of the two samples are relatively similar you can use the equal variances version of the test; otherwise you should use the unequal variances version of the test. You should also make sure that the assumptions for the t test are met; otherwise you might use a nonparametric test such as Mann-Whitney. See t Test for details.
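For example, with Excel's built-in worksheet function this is =T.TEST(A1:A40, B1:B40, 2, 3) for the unequal-variances version (the ranges here are placeholders for the two companies' scores). The same comparison in VBA might look like this sketch:

' Two-sample t test p-values (2 tails) for two sets of 40 satisfaction scores.
' Type 3 = unequal variances, type 2 = equal variances; ranges are placeholders.
Sub CompareSatisfaction()
    Dim pUnequal As Double, pEqual As Double
    pUnequal = WorksheetFunction.T_Test(Range("A1:A40"), Range("B1:B40"), 2, 3)
    pEqual = WorksheetFunction.T_Test(Range("A1:A40"), Range("B1:B40"), 2, 2)
    Debug.Print "p-value (unequal variances):", pUnequal
    Debug.Print "p-value (equal variances):  ", pEqual
End Sub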
Charles
Dr. Zaiontz,
Thank you. Your website is wonderful, as well as the Excel add-in. I believe I have correctly run a logistic regression on my relatively small data set (n ~140). One input, one output, pretest ~ course success (pass/fail).
My question is: can you recommend how I can find someone to double-check my interpretation of the output? I did read your website and posted a question, and if I interpreted your data set correctly, then I think my interpretation is correct.
Are there statisticians who will check my work affordably?
Amy,
I don't have anyone to recommend. Perhaps someone reading your comment will have a suggestion.
In the meantime, I will respond to your other comment shortly.
Charles
Thanks. I await your response. I think I got it, but just want to make sure.
I would like to use the program, but I have the 64 bit version. Is there a way to use the program anyway?
Mareike,
I have heard mixed things about whether the software works with the 64 bit version of Excel. I would try to use it and see whether it works or not. Please let me know what you find.
Charles
It opens only the file realstats.xlam, but there is no installation of the software. So I think it won't work…
My fault, it works! And it's a really great tool. Thank you!
Dear Dr. Zaiontz,
I just wanted to say thank you for sharing so much of your knowledge on your website. I am a math/stat major, and I have repeatedly returned to your website to get fuller, more intuitive explanations of the concepts in my stat textbooks, which usually throw formulas at you without much explanation of their derivation.
A thousand thanks, Dr. Zaiontz!
Best regards,
James
James,
Thanks for your comment. I am very pleased that I was able to help you.
Charles
Dear Charles,
I am carrying out research and want to analyse the effect of some meteorological parameters (temperature, relative humidity, evaporation, wind speed, …) on river flow.
Can you kindly assist me and suggest the best statistical method for such an analysis?
Thank you
You need to provide more detail about the type of analysis that you want to do.
Charles
Hi again Charles,
Speaking of eVECTORS: the output of this array fn is a range of values, the first one being the eigenvalue and the rest being the coordinates of the (unit) eigenvector. Each time I use this function, I notice that the eigenvectors are listed in descending order with respect to the value of their corresponding eigenvalues. Is that a built-in feature of eVECTORS, such that I can trust that the output always will be organised that way (ie the highest eigenvalue first)? If so, that would simplify my calculations considerably.
Hi Dan,
The third parameter of the eVECTORS function describes the order in which the eigenvalues are listed.
If order is TRUE or omitted then the eigenvalues are listed in order from highest in absolute value to smallest. If order is FALSE then they are listed in order from highest to lowest.
Charles
Thanks Charles. That is great, since I won’t have to sort or rank the eigenvalues/vectors in a separate formula.
Hi Charles;
I’m getting an error message in trying to send you an email through your published addresses. I’m in need of guidance. I’m hoping you can help. I have a sample of passengers boarding a vehicle and how far they traveled. The samples were collected every 6th day for a year. I need to be able to prove (disprove) that these samples meet a minimum of 95% confidence; and minimum precision level of +/- 10%. How would I go about proving this.? There are so many tests, I get confused with them. Thank you for your help!
Catherine,
I am sorry, but I am not sure what you mean by "samples meet a minimum of 95% confidence" or have a "minimum precision level of +/- 10%".
Perhaps you mean that the population mean is the calculated sample mean with 95% confidence (?)
Charles
Can you reply to my email address and I can give you more detail?
Yes, I can try.
Charles
Going to send you an email from my personal account since our company email is migrating to the new microsoft 360 and experiencing many issues.
Bringing it to light is a step towards dismantling it, and it is our duty to do so…
Douglas H. Johnson
The Insignificance of Statistical Significance Testing (1999)
http://digitalcommons.unl.edu/usgsnpwTC/225
1. Are Null Hypotheses Really True?
“Most null hypotheses tested, however, state that some parameter equals zero, or some set of parameters are all equal. These hypotheses, called point null hypotheses, are almost invariably known to be false before any data are collected (Berkson 1938, Savage 1957, Johnson 1995)” [citation ended]
My comment
The parameter could very well be equal to zero, and yet almost every time the observed value computed from data is different from zero: it is randomness acting. The author is wrong; in fact, we are completely unable, by observation/experiment, to state the true value of any parameter whatsoever. Using calculated confidence intervals for increasing sample sizes, we obtain progressively narrower intervals until a sufficiently low imprecision for practical purposes is reached, in the case of data homogeneity and a strictly exact test. On the contrary, with approximate test statistics it could very well happen that this final interval does not contain the true parameter value.
Note: I do not care at all whether the null is true or untrue. My intent is to choose judiciously between it and the alternative. From the test value I can discard H0 with sufficient confidence, or alternatively fail to do so when the p-value is larger than alpha, the value I chose for the probability of wrongly rejecting a truly correct parameter value.
Luis A. Afonso
Hi Charles,
Thank you for these excel-lent (pun half-intended) stats tools. They work fine for me under MacOS 10 Yosemite apart from one major snag: execution of the functions takes a very long time! Like half a minute or so. I seem to remember having similar problems with my own VBA macros in Excel until I put these code lines in at the start of every function:
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
and then ending with reversing them:
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Is there any way I can make the execution work faster without putting these lines into all the macros of RealStats? I guess there is something in the prefs of my Excel that brings this problem about, and that others don't experience it. I would be very grateful for your comments. (Btw, the RealStats VBA code is password-protected – do you distribute the password?)
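(For reference, the full pattern looks something like this generic sketch, with the slow work left as a placeholder:)

' Suspend events and automatic recalculation around a slow macro, then restore.
Sub RunWithCalcSuspended()
    On Error GoTo CleanUp                          ' make sure settings are restored
    Application.EnableEvents = False               ' stop event handlers from firing
    Application.Calculation = xlCalculationManual  ' stop automatic recalculation

    ' ... the slow work goes here ...

CleanUp:
    Application.Calculation = xlCalculationAutomatic
    Application.EnableEvents = True
End Sub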
Dan B,
You won’t be able to make these changes yourself. I don’t distribute the password.
My experience in the past was that the execution was relatively fast. Which function was this slow?
Charles
Hi Charles,
Thanks for the prompt reply. Since I posted my question, I have experimented with Excel preferences, and it works like this: if “Calculation” prefs are set to “Automatic”, then execution of the RealStats function “EVECTORS()” takes like 30 s. However, if I change the Excel Calculation prefs to “Manual”, I have to prompt the execution of all calculations using “Calc Now” or “Calc Sheet”, but the actual calculation is quite fast. I don’t know why this happens specifically to me (and presumably not others, since I imagine that most people have their prefs set to automatic calculation).
Dan,
Strange indeed. Since eVECTORS uses an iterative technique it could take some time to calculate for a large matrix, but in my experience to date, it is still pretty fast.
Charles
I don’t think it is specific to evectors, or even to RealStats. As I said, my own VBA macros behaved the same way, but there I could remedy it by putting in the code lines I mentioned earlier. So it’s probably not really a RealStats issue. I rather hoped you would know the solution. I guess I will have to search for the answer somewhere else or make do with manual calc.
Good morning Charles,
I am hoping that you might be able to help me out. I have a Mac and successfully managed to download the add-in and get it to run. However, when I try to run my analysis, irrespective of whether it is a t-test or a repeated-measures ANOVA, I keep getting the same error message: "compile error in hidden module: Analysis". I have checked all of the fields and believe I have entered and selected everything correctly, but cannot seem to get rid of this error. Any assistance would be very much appreciated!
Mel,
Are you able to use any of the functions? E.g. what do you see when you enter the formula VER() in any cell?
Charles
Hi Charles,
I get the same message too, and I'm using a Mac as well. I'm using Excel 2011 and version 5.3.2 of the software.
Any assistance pls? Thanks!
Janine,
Which message are you referring to?
Charles
Informative article
I do not care, at all, if the null hypothesis is true or not: the NHST measures the extent to which H0 is unlikely. If p < alpha we reject it; if not, we fail to reject. Never say that you accept H0; in effect, we are completely unable to state such a thing.
Gliner, Leech, and Morgan [1] give, as the major problem with NHST:
"Kirk (1996) [2] went on to explain that this was a trivial exercise because the null hypothesis is always false, and rejecting it is merely a matter of having enough power."
Our comment
The null hypothesis concerns a parameter, H0: p = p0 against Ha: p = p1, where p0 and p1 are real numbers. That the exact value p0 is practically impossible to attain made me laugh out loud: when the real matter is to reject or fail to reject the null, I do not care at all about this imprecision, because with sufficient power (sample size) it is always possible, given alpha, to decide between the two hypotheses, thus making a rational choice. In fact the confidence interval (CI), once found, is decisive: if the test statistic falls inside it, fail to reject H0; outside, reject it. We know that a Type I error can occur (H0 is true but we reject it), or a Type II error (we fail to reject when H0 is untrue).
Luis
[1] Jeffrey A. Gliner, Nancy L. Leech, George A. Morgan, Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? The Journal of Experimental Education, 2002, 71(1), 83-92.
[2] Kirk, R. E., Practical significance: A concept whose time has come. Educational and Psychological Measurement, 1996, 56, 746-759.
If there are very many variables, say 35, how do I decide which variables are important and which to include in the model?
Rahul,
Are you speaking about regression?
Charles
yes
Charles
I am fully acquainted, you know, with a set of people at war with NHST. However, items like the one I point out border on the ridiculous.
Such as [1]
"Because the proposed null hypothesis H0: p = p0 is practically impossible to attain, the null hypothesis is always false." "If there is no expectation regarding the possible truth of the null hypothesis, its falsification by data is redundant nonsense." "If the probability of a point hypothesis is indeterminate, the empirical discovery that such a hypothesis is false is no discovery at all, and thus adds nothing to what is already known."
No doubt a pearl of great price . . .
The "author" confuses "point estimation" with the practical impossibility of having certainty in the real, probabilistic world.
No matter . . . I guess that the fathers of NHST would be terrified to find such argumentation.
I am just starting not to worry about it . . .
[1] Joachim Krueger, Null Hypothesis Significance Testing: On the Survival of a Flawed Method. American Psychologist, January 2001, Vol. 56, No. 1, 16-26.
Luis
Thank you for being so helpful to our study, Charles.
When I used QCRIT(x,y,z) in the tools, a compile error on lookup occurred.
On Excel 2010 version 14.0.7166.5000 (an old one of my PCs) there was no problem, but on version 14.0.7166.5000 (the latest of my PCs, the one used for my study) it caused a compatibility problem.
How can I fix the error on latest Excel?
Or, if you can, could you construct the compatible latest edition?
Thank you for reading.
Shiggy,
The two Excel version numbers that you provided are identical.
Which release of Real Statistics are you using?
What are the values of x, y and z?
Charles
Thank you for your reply, Charles
Sorry, the Excel version numbers are 14.0.7172.5000 and 14.0.7166.5000.
I use the Real Statistics release downloaded in June 2016 with Excel version 14.0.7172.5000, and with Excel version 14.0.7166.5000 I use an older one downloaded in November 2014.
The values are x: 6, y: 12, z: 0.05, for example. I get the error with the June 2016 Real Statistics release and Excel version 14.0.7172.5000.
Shiggy
Shiggy,
I don't know why you are having this problem. I used the formula =QCRIT(6,12,0.05) in the latest release (Rel 4.9 of July 19) and get the answer 4.75. What do you see when you use the formula =VER()?
Charles
Charles
I tried the old version of the add-in (3.2.2), which I used for a past study, with Excel version 14.0.7172.5000.
In this case, I get the correct value 4.75 from the formula =QCRIT(6,12,0.05).
However, the latest version (4.9) seems to fail to load in Excel.
So, for now, I will use version 3.2.2.
Because this may be a problem specific to the software configuration of my PC, I will ask an engineer at our university about it.
Thank you so much, Charles.
Shiggy
Charles
Would you be so kind as to send a comment, please?
(This is only because some psychologists, such as J. Cohen and similar fauna, insist on banishing NHST from the literature, which, IMO, is too radical . . .)
Thank you, so much
Luis
A reasonable meaning of H0: p=p0
a) In a real, probabilistic world, any parameter, when estimated from experimental/observational data, is by its very nature subject to error, even if the estimator is unbiased;
b) Remember that, even so, the aim is to get sufficient evidence to falsify the null hypothesis;
c) Read algebraically as an equation, H0: p = p0 leads directly to the absurdity that there is no room for a difference even in the last decimal place, and consequently the null hypothesis has no chance of being true.
What is really intended is to find out whether the result is significant, i.e., whether p is sufficiently far from p0 that, given the observed data, we can state that the opposite complementary hypothesis, Ha: p ≠ p0, is true. In short, I think that H0: p = p0 must be understood as indicating an approximate equality, not statistically discernible given the data. The same holds for two parameter values, H0: p1 = p2.
Luis,
A lot of people have criticized the null hypothesis approach to statistical analysis, and with good reason. Other approaches have been proposed and are used (e.g. Bayesian approach), but the null hypothesis approach is still the one that is most commonly used and so I have adopted this approach for the website.
Charles
Hello,
I am analyzing wind speed data and want to fit a Weibull distribution to it. I have hourly averaged wind speed data for one year, and I have drawn a graph of speed vs. percentage of time. How can I find the alpha and beta values of the Weibull distribution function? Thank you,
Chaminda,
If you know that the data follows a Weibull distribution, you can use Solver to find the alpha and beta values that minimize the mean squared error (MSE). This approach is illustrated in a number of places on the website: Using Solver for Logistic Regression, Using Solver for Exponential Regression and Using Solver for Time Series Analysis.
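For instance, a Solver objective along these lines could be minimized by changing the cells that hold alpha and beta (a sketch; the range arguments, and fitting the density to relative frequencies, are assumptions about your setup):

' Sum of squared errors between observed relative frequencies and the Weibull pdf.
Function WeibullSSE(speeds As Range, freqs As Range, _
                    alpha As Double, beta As Double) As Double
    Dim i As Long, fitted As Double, total As Double
    For i = 1 To speeds.Cells.Count
        ' FALSE requests the density rather than the cumulative distribution
        fitted = WorksheetFunction.Weibull_Dist(speeds.Cells(i).Value, alpha, beta, False)
        total = total + (freqs.Cells(i).Value - fitted) ^ 2
    Next i
    WeibullSSE = total
End Function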
Charles
Can we do kNN in the Real Stats software?
What is knn?
Charles
k-nearest neighbors, for missing values
Rahul,
The Real Statistics software doesn’t yet support the knn algorithm. It does support the k-means clustering algorithm and Jenks Natural Breaks.
Charles
So how do I do missing value imputation?
Rahul,
Various missing data approaches are described on the webpage
Handling Missing Data
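As one very simple illustration (mean imputation, just one of the approaches described there; note that it understates the variance):

' Replace blank cells in an assumed data range with the mean of the non-blank cells.
Sub MeanImpute()
    Dim rng As Range, c As Range, m As Double
    Set rng = Range("A1:A100")             ' placeholder data range
    m = WorksheetFunction.Average(rng)     ' Average ignores blank cells
    For Each c In rng
        If IsEmpty(c.Value) Then c.Value = m
    Next c
End Sub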
Charles
Hi, Charles.
Thanks again for this great tool.
Recently, when I open a new spreadsheet (and also older files I've created using this add-in), the resource isn't available because the macros have been disabled.
Is this something I did, or an update I'm missing? I never used to have this problem. Do you have a fix? Like I said, I have older files that make use of the resource in a template that I hope to automate.
Thanks so much for your help.
Michael
Michael,
I am quite pleased that you like the tool.
I don’t know any reason why the macros would be disabled (assuming that you didn’t disable them).
If you have updated the Real Statistics software or moved the location of the software, then perhaps you need to tell an older file where to locate the add-in. You can do this exactly as described for the Real Statistics examples workbooks on the webpage:
https://real-statistics.com/free-download/examples-installation/
Charles
I liked the site for the clarity, rigor and explanation of the process.
This is an excellent site for learning.
I wish there were examples showing potential pitfalls of using an “apparently obvious” method and/or hints in each chapter indicating, “What to Watch For”.
Anand,
That is a good idea. I will consider adding this some time in the future.
Charles
Dear Charles,
I’d like to convey many thanks, once again, and many times more, if need be, for all your kind help.
Unfortunately, following brief use of Real Stats in Win 8.1 (French), Excel 2007, the Data section would no longer open, and would instead CRASH Excel….
I can no longer install the add-in… (be it 2003 or 2007 or some other one), in EITHER of the two locations mentioned in other parts of this blog.
Ἔῤῥωσθε! == keep well, in good health (take care, etc…)
Georgios
Mr. Zaiontz,
I would appreciate it if you can find time to answer my question. I'm analyzing a big number of point sets (I would call them attributes). Each of these attributes is in XYZ format. I'm doing this to measure the evolution of the data, so I want to analyse it through the flow in which the data are changed. So I'm generating these attributes (XYZ, 90 attributes each time) at several stages as the data changes.
For each attribute I have statistics summarized by the following parameters: minimum, maximum, median, mean, standard deviation, skewness, kurtosis. I would like to create a pdf in Excel based on these parameters as input. Although I have the data points themselves, I don't want to load them into Excel (as it would be too much data); I want to use these extracted statistics to plot PDFs. Do you think this is possible? Could you advise a workflow for this?
Ilya,
Sorry, but I don’t know how you would be able to create a pdf based on the minimum, maximum, median, mean, standard deviation, skewness and kurtosis. This would give you the first four moments of the distribution, but not the moments after that. I would imagine you could make an estimate of a possible pdf, but I’m not sure how you would do that. Perhaps some other reader has an idea for how to do this.
Charles
Ok, thanks for the reply. So I would have to use the data points I have, rather than statistics derived from them. Maybe kernel density estimation would be a solution in my case, but again I would have to use my points as input data.
This could be done using Pearson distributions. See https://en.wikipedia.org/wiki/Pearson_distribution for more examples, and see the R package PearsonDS (https://cran.r-project.org/web/packages/PearsonDS/) for an example of how this has been implemented in R. It should in theory be possible for this to be re-implemented within the Real Statistics package.
Hi Charles, loving your website by the way.
I'm working on some time series data and using the Time Series Testing tool; my question regards the output from the ADF test. The answer is most probably written somewhere, but I can't find it…. Under stationary (yes/no), aic, bic, there is a lags box. My data at diff 1 (with trend) comes back as stationary with lags 6. Does this mean the data is stationary after 6 lags? I.e., do I need to difference the original data 6 times before I can use it for other tests and comparisons?
Thanks in advance, I think I’m having a dumb day!
Lisa,
Please look at the following webpage and see whether this helps.
Augmented Dickey-Fuller Test.
Charles
Charles – great product that I use more and more – thanks so much for making and maintaining such a great product! I recently upgraded from Excel 2013 to Excel 2016, and now am having trouble making ‘Add-Ins’ appear on the ribbon. (Real Statistics is installed as an add-in and I can access it via Ctrl-m)
It’s not a big deal (since I can always use Ctrl-m), but I was wondering if this is just an issue with Excel 2016 or if I am not doing something right.
Thanks again,
Alvin
Alvin,
Sometimes the problem is the order in which the various addins need to be installed. I think the Analysis ToolPak needs to be installed first and then Solver and then Real Statistics. This may indeed be irrelevant to the problem you are having, but sometimes Excel is sensitive to these sorts of things.
Charles
I was looking for a non-parametric test equivalent to two-way ANOVA and thanks to your website I found how to conduct the Scheirer-Ray-Hare test.
So first, thanks a lot!!!
Second, I couldn't find much information about this test, but I did find a comment saying it was not very robust… but what would be the alternatives, GLMs? I bet there are many cases out there where data with two factors do not follow normality, so I'm quite surprised not to find a clear pathway.
And third, if this test is perfectly valid, could I use any post-hoc test to check where the differences within factors lie (e.g. my two factors are location and month, and the interaction is significant, so I want to see in which months that happens)? Any suggestions?
Thanks a lot in advance!
Just an additional question I had doubts about… If we have missing data, I see that it doesn't work leaving the cell blank or with NA; it only works if I introduce "0". Is it treating that as a value?
Thanks again
Ara,
I don’t recommend using 0 for missing data. I suggest that you look at Handling Missing Data.
Charles
Ara,
It is true that the Scheirer-Ray-Hare test is not very robust. You can use the various post-hoc tests, provided their assumptions are met.
Charles
Hi, I just downloaded the add-in today and tried to use the one-sample t-test (as described on your website). However, every time I enter the data range and the correct settings, I get this response when I click OK: "A run time error has occurred. The analysis tool will be aborted. Unable to set the HorizontalAlignment property of the Range class".
What does this mean and how do I fix it so I can use the one sample t-test function?
Thanks!
Jane,
If you send me an Excel file with your data I will try to figure out what is going wrong. You can get my email address from the Contact Us webpage.
Charles
Dear Sir,
Currently I'm working on a school project on forecasting the employee turnover of an organisation. To do this, I'm using time series data of the employee turnover over the past 7 years. It is annual data from 2009 to 2015. To make a good prediction it is also important to add some explanatory variables.
I want to use multiple regression with ARMA errors, but I'm not sure how to use these errors to make a prediction. Do I need to make a prediction for each of the explanatory variables on its own first, and then add these predicted values to the model to predict employee turnover?
Or will the model do all the predictions of the dependent and independent variables and give me a predicted value of the dependent variable (turnover)?
Or do you recommend using another model to do the forecasting with all the explanatory variables?
I am really having a hard time putting all the variables into a model that gives me a good prediction. Also, the internal data has only a few observations, whereas the external data has more history. How can I combine these two data sets in a multiple regression?
Is it statistically sound to use only 7 observations for forecasting?
Can you please help me out, sir?
Thanks very much in advance
SB,
At present I only support time series for one variable. See Time Series Analysis.
I will eventually add multivariate time series capabilities.
Regarding your last question, you could make a forecast based on limited data; of course, the confidence interval of the prediction will be larger.
Charles
hi doc,
I must say I am inspired and helped by the write-up on statistics. I am unfamiliar with statistics and am doing research which involves statistics. Yes, I have used simple Excel; now I am finding it difficult to analyze the p-value and do a scientific analysis. I have sent you my p-value results by email. Thanks in anticipation.
Blaise,
I have responded to your email.
Charles
Hi Charles,
Appreciate your contributions to the statistics community.
I am trying to regress roughly 150 dependent variables (150 mutual funds' time series returns) against roughly 8 independent variables (the time series of various factors, like the return from simply buying cheap stocks, or stocks that have gone up recently, etc.). Thoughts on ways to do this without drinking heavily?
Drinking heavily seems like a pretty good option to me.
Charles
I don't have the ability to interpolate in my Excel (Mac version) and have the following values: my W is 0.984, n = 30, and my p-value is between 0.9 (.983) and 0.95 (.985). I don't have the interpolation feature in Excel, and the equation you had on that page left me a little confused as to what went where to calculate it correctly. Some help would be much appreciated 🙂
Thanks!
Sarah,
I don’t know which page you are referring to. Shapiro-Wilk? In any case, please look at the following webpage
https://real-statistics.com/statistics-tables/interpolation/
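With your numbers (W = .984 lying between the table entries .983 at p = .90 and .985 at p = .95), linear interpolation gives
p ≈ 0.90 + (0.984 − 0.983)/(0.985 − 0.983) × (0.95 − 0.90) = 0.90 + 0.5 × 0.05 = 0.925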
Charles
I saw this page, and that was my question regarding the equation. Where you have .522, I wasn't sure which of my variables went where in order to carry out the calculation.
Thanks
Sarah, sorry but I don’t know which webpage you are referring to when you say .522.
Charles
The equation that's on the interpolation page you posted the link for.
Sarah,
Thanks. Is everything clear now?
Charles
Hi,
I am doing research about GPAs at my school. In my data sheet I have 50 samples, and under each label I have fewer than 30 values. I tried to use "t-Test: Two-Sample Assuming Unequal Variances" because I don't know the exact variances, but I have also read that I cannot use the t-test if I have more than 30 samples. Since I don't know the population variances I cannot use the z-test either. Which test should I use in this case?
Thank you in advance.
Nur,
Are you saying that you have 50 samples and each sample has fewer than 30 elements?
The following statement that you made “I also know that I cannot use t-test if I have more than 30 samples” is not true. You can use the t-test for samples with more than 30 elements as well as for samples with fewer than 30 elements. You also don’t need to know the variances to use this test. I suggest that you look at the following webpage, especially the subpages about hypothesis testing
t Distribution
Charles
Hi, I want to say thank you for the add-in functions you have provided!
However, today when I used the function NCHISQ_DIST(9629.82651550155, 0.06365952, 9414.72191791204, TRUE), it returned 0, which seems incorrect. I input the same numbers in Matlab to verify it, and Matlab returns 0.8659. Since I don't have the password for the add-in, could you please help me out?
Grant,
Yes, there are limitations to the use of the NCHISQ_DIST function that I have provided. In Excel, CHISQ.DIST generates an error value when df < 1. Since NCHISQ_DIST uses the Excel function CHISQ.DIST, it is not surprising that it does not return the correct value when df < 1. There are also limitations when x or lambda are large. I will try to explain these limitations on the website. Thanks for identifying this problem.
Charles
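(For context: a standard way to compute the noncentral chi-square cdf, and presumably what NCHISQ_DIST does internally, though that is an assumption, is as a Poisson-weighted mixture of central chi-square cdfs:
F(x; df, λ) = Σ_{j=0..∞} [e^(−λ/2) (λ/2)^j / j!] · F_χ²(x; df + 2j)
Any restriction on CHISQ.DIST, such as df < 1, then carries over to every term of the series, and very large x or λ can cause the weights to underflow.)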
Hello Charles!
I'm trying to compare two population proportions to see if there is a significant difference between them. I'm reading everywhere that I have to use the z score to do so, but I can't seem to find any good information on the quickest way to do this in Excel. My goal is to end up with a p-value telling me whether the difference between the two population proportions is significant.
Could you help me with this? Is there a page on your site on which this is explained?
Thank you so much! Your website is very helpful!
Isabel
Hello Isabel,
Yes, please look at the following webpage.
Proportion Distribution
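For reference, the standard large-sample statistic for comparing two proportions p̂1 = x1/n1 and p̂2 = x2/n2 is
z = (p̂1 − p̂2) / √(p̄(1 − p̄)(1/n1 + 1/n2)), where p̄ = (x1 + x2)/(n1 + n2)
and the two-tailed p-value can then be computed in Excel as =2*(1-NORM.S.DIST(ABS(z),TRUE)).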
Charles
Dear sir,
Let's suppose that someone uses a questionnaire with questions like "How much do you think that X can influence your life?" and the participants have to answer by choosing a number from 1 to 10. Could the median be a meaningful measure in a case like this? Or should the researcher just stick to the mode?
Thank you very much for your time.
Maria,
This all depends on the details of what you are trying to analyze. Usually the mean is used, sometimes the median, and only rarely is the mode used. But which you should use really depends on your specific research. It also depends on the test that you use. E.g. if you decide you want to analyze the data with the t test (assuming that this makes sense for your analysis and that the assumptions for the t test are met), then you probably want to focus on the mean. If it turns out that the assumptions for the t test are not met, then it would be common to use the median (and some nonparametric test) to do the analysis.
Charles
Let's say that the researcher wants to explore people's ideas about how environmental problems influence their own lives. So, he has a questionnaire with e.g. 10 questions. Each question has the same form ("How much do you think that X can influence your life?") and the same response options. These options range from 1 to 10, where 1 means "no influence" and 10 means "great influence". So, 1-10 are not "real" numbers; they are just the participants' way of expressing what they think.
In this case, when the researcher knows that e.g. 5 participants have ticked option "1" and 10 have ticked "2", etc., and does not plan to run any test at all, is there any real point in estimating the median? I mean, if he just wants to describe this kind of data, is the median really applicable and meaningful?
Thank you very much once more.
Good day Dr
First of all I would like to express my appreciation for the invaluable Real Statistics package, which has already helped a lot of people.
However, the subject of the matter is that I am carrying out research on "The viability of crowdfunding to meet the funding needs of SMEs", and my main focus is on the supply side of funds, the investors. I am using a questionnaire to gather data on the investors, and the online questionnaire is at the following link: https://docs.google.com/forms/d/1pibVVwzBqyrL4SQ9mcmNzCiArwzXp0VKWcy8J_vJe2Y/viewform
After gathering the data and testing the viability, I decided to consider whether the potential investors would choose to invest or not (the propensity to invest). This constitutes my dependent variable. The independent variables include investment capacity (which is a function of income and investment pledge), expected return surplus, and payback period; there are other variables that I have considered, such as internet connectivity, social media presence, and access to a payment system, as variables relevant to the consideration of crowdfunding, etc.
I thought the logistic model was going to be the most appropriate model to use, considering the binary nature of most of my data, and I chose it.
However, I have been following the video that you shared on YouTube on the logistic regression model. After coding my data, I ran the analysis but got errors in Excel about the data format.
Could you please help me address the error and tell me where I should look? I WOULD BE VERY, VERY HAPPY if you could assist with the overall project by offering comments, insights and direction as well. I believe this project is of great importance to the SMEs in my home country and the nation as a whole, and I would like it to be as informative as possible.
Accompanying this mail is the Excel sheet with the responses and the data that I am trying to analyse.
I am in much anticipation of your response
Desmond,
I have received your email and will send you my comments shortly.
Charles
Ok Doc, I will be waiting for your response.
I have now sent you a response to your email.
Charles
Hi,
Do you know if it's possible to do a three-parameter exponential decay fit in Excel (y = y0 + a*exp(-b*x)), fitting for y0, a and b? I know I can do this in Origin or SigmaPlot, but it would make things easier if I could also integrate this into my Excel sheet for the calculations I need.
Thanks!
Lis,
See the following webpage
Exponential Regression
Charles
Hi, Dr. Zaiontz!
Have you given any thought to doing more with the Fast Fourier Transform (FFT) already provided in Excel?
They really have not explained what to do with it after it generates the complex form. I have managed to compute the resulting magnitude (amplitude) and frequency from the complex form given, but need help on how to interpret what I have… presumably the frequency spectrum. Ok, so I locate the dominant frequency after I graph the magnitude and frequency…. now what? How do I go about combining the various frequencies to get a better fit, and how do I come up with the parameters for the various cos and sin terms? Is there a shortcut without having to construct them individually for each frequency?
Puzzled?
Sorry Rod, but I haven’t tried to work with the Fast Fourier Transform capabilities in Excel.
Charles
I recently consulted a statistician who informed me that I need to use Generalized Estimating Equations (GEE) on the data that I've been gathering.
Does this excel add-in allow for GEE calculations?
Sorry Mike, but the Real Statistics website and software doesn’t yet support GEE. This will be supported shortly.
Charles
Thanks for the free software; it is appreciated, from someone who enjoys statistics and occasionally needs to do analysis.
Prof,
I have a dependent variable and four independent variables.
To do regression analysis using Excel 2013, should I select all FOUR independent variables together under Input X Range, or should I regress the dependent variable against each independent variable separately, thus four times in all?
Thanks
Tom
Tom,
It all depends on what you want to accomplish, but I suggest that you perform the regression with all four independent variables and see which ones are not significant. These are candidates for elimination from the model.
Charles
I want to determine whether two sets of data (n = 12, for both) are statistically significantly different at P <=0.001 (two-tailed test). Should I run Mann-Whitney U test on the raw data, or should I first do a normality check and convert the data to a normal distribution? (say, by log, square-root, or exponential conversion).
Craig,
If the data is normally distributed then generally you should use the t test instead of Mann-Whitney. You should first check to see whether the data in each sample is normally distributed. If the departure from normality is not too severe then you can use the t test. Even relatively symmetric data should be ok. If the data is very skewed then you can use the Mann-Whitney test provided the distributions of the two samples are relatively similar (even if skewed).
Charles
Very helpful — thanks! Follow-up question: If the data ARE skewed (i.e., not normally distributed) and I use Mann-Whitney, would I use it on the raw data, or on the data that are transformed by whichever transformation brings their distribution closest to normality?
Hello Charles,
I am having trouble finding a relatively recent academic reference to support the assertion that ‘MANOVA is not very sensitive to violations of multivariate normality provided there aren’t any (or at least many) outliers’. I have a large sample (n=955) with no outliers, but the assumption of multivariate normality is violated. I can proceed with the analyses if I can cite evidence that the MANOVA will remain robust. Could you recommend a citation?
Kind regards,
Sharon
Sharon,
This is not to say that you couldn't have problems if the data is really skewed, but generally non-normality is not a problem. Some references are:
http://homepages.inf.ed.ac.uk/bwebb/statistics/MANOVA2.pdf
http://documents.software.dell.com/Statistics/Textbook/ANOVA-MANOVA#deviation
http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt
Charles
Sharon,
Here is one such reference.
http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt
See slide 11.
Charles