Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me at the addresses below or by leaving a comment here or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
I have just discovered your wonderful site. You have my eternal thanks and appreciation.
Thank you Leah,
Charles
Thank you! I love your support! 🙂
Dear sir,
I have a doubt about the conclusions of the one-sample Wilcoxon test. After applying the test to my samples at a significance level of 5%, I obtain a p-value of 0.45 and a corresponding z score of 0.123. If I'm not wrong, on the basis of the obtained p-value there is very little chance of rejecting the null hypothesis (kindly correct me if I'm wrong at any point). What else can I conclude from these results?
With regards
Nirbhow,
If p = .45, then you should not reject the null hypothesis.
Charles
Dear sir,
thanks a lot for responding.
I want to inquire further: on the basis of the p-value and z score, what conclusions can I draw about the nature of the sample data I used for the test, or about the nature of the technique I used to obtain the sample data?
With regards
That p-value = 0 means that it is highly unlikely that the real value for kappa is zero.
Charles
Is obtaining a p-value of 0.45 a normal case or not? I'm asking this question because the magnitude of the p-value is high.
And what kind of information does the z score give about the sample data?
Nirbhow,
In what context are you referring to a z score?
Charles
Respected sir,
I have applied the one-sample Wilcoxon test to my samples and obtained a p-value of 0.45 and a corresponding z score of 0.123. What qualitative information does this z score provide about my sample data?
The z score is used to calculate the p-value and effect size.
Charles
Charles
The following table compares the Abdi & Molin critical values (CVs) with those produced by my own routine (see ANNEX). Abdi & Molin used 1E5 samples per size; the samples/size column shows my own sample counts where they change.

       MY routine                  Abdi & Molin (1E5 samples/size)
n      .01     .05   samples/size   .01     .05
50    .1450   .1245      4E6       .1457   .1246
100   .1036   .0889                .1026   .0888
150   .0850   .0729                .0840   .0727
200   .0738   .0633                .0729   .0630
250   .0662   .0567      1E6       .0652   .0564
300   .0605   .0519      4E5       .0596   .0515
400   .0525   .0450                .0516   .0447
500   .0471   .0404                .0462   .0400
Remember that it was proposed
f_n = (0.83 + n)/√n − 0.01
CV(.05) = 0.895/f_n, CV(.01) = 1.035/f_n
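As an illustration (my own paraphrase of the fitted expressions, not Luis's Monte Carlo routine), this can be coded as a small VBA user-defined function; e.g. =LillieforsCV(100, 0.05) returns about .0889, in line with the table above.

' Sketch of the Abdi & Molin fitted approximation to the Lilliefors critical values.
Function LillieforsCV(n As Long, alpha As Double) As Variant
    Dim fn As Double
    fn = (0.83 + n) / Sqr(n) - 0.01          ' f_n = (0.83 + n)/sqrt(n) - 0.01
    Select Case alpha
        Case 0.05: LillieforsCV = 0.895 / fn
        Case 0.01: LillieforsCV = 1.035 / fn
        Case Else: LillieforsCV = CVErr(xlErrValue)   ' only .05 and .01 are fitted here
    End Select
End Function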
[1] Hervé Abdi, Paul Molin, Lilliefors/Van Soest's test of normality
The other confidence levels (20%, …) are completely useless; nobody uses them, and there is no guarantee in NHST.
Conclusion
Owing to an error in the expression that transforms z into a p-value, I wrongly assumed that the Abdi & Molin fitted expressions (though never the values they evaluated by Monte Carlo) were strongly underestimated. After revising my routine I could verify that the error was mine, as can be seen in the table above, where my CVs were recalculated.
I hope my apologies will be accepted.
Luis
Sure, it would be fine . . .
Charles
What do you propose people should do in this instance?
I think that more people are needed to surpass the deadlock.
Do you think so, Charles?
luis
Charles
That's the point . . .
How can I be judge and jury in my own cause?
That is precisely the reason I invited you to recalculate the critical values, so that you can be sure who is right, Abdi or me.
Never mind; remember "Nemo iudex in causa sua".
Science always depends on cross-validation; without it, it crashes . . .
Luis
This is a private e-mail
Charles
How can I reconcile the Abdi & Molin
D(1000, .95) = 0.0283 with my evaluation using 400,000 samples per size:
n = 250, 0.0722
500, 0.0727
750, 0.0737
1000, 0.0742
One of us is dead wrong, Abdi or me, for sure.
Do you agree?
Luis
Luis,
I agree, but it would be good to understand which values are the right ones to use.
Charles
Hi, Charles
Monte Carlo Method (MCM) and me
Personal computers (PCs) have been available to everyone for more than four decades, and from early on I have been interested in their usefulness in statistical matters. Because it lets us check procedures that depend on chance and helps us internalize randomness, MCM is of indisputable value in both practical and educational settings. Here is an example.
We know from theory that the extreme values, a matter of importance in several fields (climate change, for example), satisfy, for the Uniform(0,1) distribution:
P(min≤x)=1-(1-x)^n
P(max>x)=1-x^n
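(A bare-bones VBA sketch of this kind of check, a simplified illustration with fewer samples, not Luis's actual routine:)

' Monte Carlo check of P(min <= a) for the minimum of n Uniform(0,1) draws.
' With n = 20 and a = 0.05 the theoretical value is 1 - 0.95^20 = 0.6415.
Sub CheckUniformMin()
    Const nSamples As Long = 100000      ' fewer than the 1,000,000 used below
    Const n As Long = 20                 ' sample size
    Const a As Double = 0.05             ' threshold for the minimum
    Dim hits As Long, i As Long, j As Long, mn As Double, u As Double
    Randomize
    For i = 1 To nSamples
        mn = 1#
        For j = 1 To n
            u = Rnd                      ' one Uniform(0,1) draw
            If u < mn Then mn = u
        Next j
        If mn <= a Then hits = hits + 1
    Next i
    Debug.Print "observed:", hits / nSamples
    Debug.Print "theory:  ", 1 - (1 - a) ^ n
End Sub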
The results follow for sample sizes 20, 30, 50, with 1 million samples per size:
n = 20 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.005    0.995        0.0953           0.0952       .0954
0.050    0.950        0.6416           0.6412       .6415
0.100    0.900        0.8788           0.8786       .8784
0.200    0.800        0.9884           0.9883       .9885
0.300    0.700        0.9992           0.9992       .9992

n = 30 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.001    0.999        0.0296           0.0296       .0296
0.005    0.995        0.1397           0.1397       .1396
0.050    0.950        0.7854           0.7854       .7854
0.100    0.900        0.9574           0.9578       .9576
0.200    0.800        0.9987           0.9988       .9988

n = 50 (1,000,000 samples)
a        b        obs P(min ≤ a)   obs P(max > b)   theory
0.001    0.999        0.0488           0.0488       .0488
0.005    0.995        0.2217           0.2217       .2217
0.050    0.950        0.9231           0.9229       .9231
0.100    0.900        0.9948           0.9950       .9948
0.200    0.800        1.0000           1.0000       1.0000
Therefore, we reach a two-fold conclusion:
The theory is correct, as everybody knows,
And my RNG is of quite acceptable quality, which was of course the true intent.
Luis
Hi Charles
I obtained the quantiles of the Geary test (goodness of fit) for normal data.
Is it possible to display the following table, in order to get feedback from the readers to confirm that the values are acceptable for practical purposes?
1E7 simulated samples for each sample size.
Thank you.
n      0.80   0.85   0.90   0.95   0.98   0.99
20     .847   .855   .864   .878   .892   .901
25     .841   .848   .857   .869   .883   .891
30     .837   .843   .852   .863   .876   .884
35     .834   .840   .847   .858   .870   .878
40     .831   .837   .844   .854   .866   .873
45     .829   .834   .841   .851   .862   .869
50     .827   .832   .839   .848   .859   .865
55     .826   .831   .837   .846   .856   .862
60     .824   .829   .835   .844   .853   .860
65     .823   .828   .834   .842   .851   .857
70     .822   .827   .832   .840   .849   .855
75     .821   .826   .831   .839   .848   .853
80     .820   .825   .830   .838   .846   .852
90     .819   .823   .828   .835   .843   .849
100    .818   .822   .826   .833   .841   .846
120    .816   .819   .824   .830   .837   .842
140    .814   .818   .822   .828   .834   .839
160    .813   .816   .820   .826   .832   .836
180    .812   .815   .819   .824   .830   .834
200    .812   .814   .818   .823   .828   .832
Luis,
Ok, it is posted. Let’s see what feedback you get.
Charles
Hi Charles,
First of all, I would like to thank you for this wonderful website and code. I started learning econometrics and your help is invaluable.
I have several case studies in the form of panel data. I was wondering if we can apply your logistic regression function to panel data?
Thanks for your reply.
Arnaud
Arnaud,
I am very pleased that you value the website and software.
I have not yet delved into panel data on the website and software. I hope to get to that in the future. The logistic regression capabilities described on the website and implemented in the software are only the beginnings of the approaches used with panel data.
Stay tuned, more to come.
Charles
Good morning Prof,
I am Jude. I want to do my thesis on a generalized approach to the analysis of variance. Please sir, can you direct me on how to commence? Thanks
Jude,
That is quite a broad request. For starters, please see the following webpage.
Analysis of Variance
Charles
Hello,
Would you please help me figure out the appropriate test. I have results from satisfaction surveys for 2 different companies. 40 surveys per company. I want to find out if the satisfaction is the same or different. Would I run a two sample t test with equal variances? Thanks for any guidance you can give me.
Lu,
You can run a two sample t test. If the variances of the two samples are relatively similar you can use the equal variances version of the test; otherwise you should use the unequal variances version of the test. You should also make sure that the assumptions for the t test are met; otherwise you might use a nonparametric test such as Mann-Whitney. See t Test for details.
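For example, with Excel's built-in worksheet function this is =T.TEST(A1:A40, B1:B40, 2, 3) for the unequal-variances version (the ranges here are placeholders for the two companies' scores). The same comparison in VBA might look like this sketch:

' Two-sample t test p-values (2 tails) for two sets of 40 satisfaction scores.
' Type 3 = unequal variances, type 2 = equal variances; ranges are placeholders.
Sub CompareSatisfaction()
    Dim pUnequal As Double, pEqual As Double
    pUnequal = WorksheetFunction.T_Test(Range("A1:A40"), Range("B1:B40"), 2, 3)
    pEqual = WorksheetFunction.T_Test(Range("A1:A40"), Range("B1:B40"), 2, 2)
    Debug.Print "p-value (unequal variances):", pUnequal
    Debug.Print "p-value (equal variances):  ", pEqual
End Sub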
Charles
Dr. Zaiontz,
Thank you. Your website is wonderful, as well as the Excel add-in. I believe I have correctly run a logistic regression on my relatively small data set (n ~140). One input, one output, pretest ~ course success (pass/fail).
My question is: can you recommend how I can find someone to double-check my interpretation of the output? I did read your website and posted a question, and if I interpreted your data set correctly, then I think my interpretation is correct.
Are there statisticians who will check my work affordably?
Amy,
I don't have anyone to recommend. Perhaps someone reading your comment will have a suggestion.
In the meantime, I will respond to your other comment shortly.
Charles
Thanks. I await your response. I think I got it, but just want to make sure.
I would like to use the program, but I have the 64 bit version. Is there a way to use the program anyway?
Mareike,
I have heard mixed things about whether the software works with the 64 bit version of Excel. I would try to use it and see whether it works or not. Please let me know what you find.
Charles
It opens only the file realstats.xlam, but there is no installation of the software. So I think it won't work…
My fault, it works! And it's a really great tool. Thank you!
Dear Dr. Zaiontz,
I just wanted to say thank you for sharing so much of your knowledge on your website. I am a math/stat major, and I have repeatedly returned to your website to get fuller, more intuitive explanations of the concepts in my stat textbooks, which usually throw formulas at you without much explanation of their derivation.
A thousand thanks, Dr. Zaiontz!
Best regards,
James
James,
Thanks for your comment. I am very pleased that I was able to help you.
Charles
Dear Charles,
I am carrying out research and want to analyse the effect of some meteorological parameters (temperature, relative humidity, evaporation, wind speed, …) on river flow.
Can you kindly assist me and suggest the best statistical method for such an analysis?
Thank you
You need to provide more detail about the type of analysis that you want to do.
Charles
Hi again Charles,
Speaking of eVECTORS: the output of this array fn is a range of values, the first one being the eigenvalue and the rest being the coordinates of the (unit) eigenvector. Each time I use this function, I notice that the eigenvectors are listed in descending order with respect to the value of their corresponding eigenvalues. Is that a built-in feature of eVECTORS, such that I can trust that the output always will be organised that way (ie the highest eigenvalue first)? If so, that would simplify my calculations considerably.
Hi Dan,
The third parameter of the eVECTORS function describes the order in which the eigenvalues are listed.
If order is TRUE or omitted then the eigenvalues are listed in order from highest in absolute value to smallest. If order is FALSE then they are listed in order from highest to lowest.
Charles
Thanks Charles. That is great, since I won’t have to sort or rank the eigenvalues/vectors in a separate formula.
Hi Charles;
I’m getting an error message in trying to send you an email through your published addresses. I’m in need of guidance. I’m hoping you can help. I have a sample of passengers boarding a vehicle and how far they traveled. The samples were collected every 6th day for a year. I need to be able to prove (disprove) that these samples meet a minimum of 95% confidence; and minimum precision level of +/- 10%. How would I go about proving this.? There are so many tests, I get confused with them. Thank you for your help!
Catherine,
I am sorry, but I am not sure what you mean by "samples meet a minimum of 95% confidence" or have a "minimum precision level of +/- 10%".
Perhaps you mean that the population mean is the calculated sample mean with 95% confidence (?)
Charles
Can you reply to my email address and I can give you more detail?
Yes, I can try.
Charles
Going to send you an email from my personal account since our company email is migrating to the new microsoft 360 and experiencing many issues.
Bringing it to light is a step towards dismantling it, and it is our duty to do so…
Douglas H. Johnson
The Insignificance of Statistical Significance Testing (1999)
http://digitalcommons.unl.edu/usgsnpwTC/225
1. Are Null Hypotheses Really True?
“Most null hypotheses tested, however, state that some parameter equals zero, or some set of parameters are all equal. These hypotheses, called point null hypotheses, are almost invariably known to be false before any data are collected (Berkson 1938, Savage 1957, Johnson 1995)” [citation ended]
My comment
The parameter could very well be equal to zero, and yet almost every time the observed value computed from data is different from zero: it is randomness acting. The author is wrong; in fact, we are completely unable, by observation/experiment, to state the true value of any parameter whatsoever. Using calculated confidence intervals for increasing sample sizes, we obtain progressively narrower intervals until a sufficiently low imprecision for practical purposes is reached, in the case of data homogeneity and a strictly exact test. On the contrary, with approximate test statistics it could very well happen that this final interval does not contain the true parameter value.
Note: I do not care at all whether the null is true or untrue. My intent is to choose judiciously between it and the alternative. From the test value I can discard H0 with sufficient confidence, or alternatively fail to do so when the p-value is larger than alpha, the value I chose for the probability of wrongly rejecting a truly correct parameter value.
Luis A. Afonso
Hi Charles,
Thank you for these excel-lent (pun half-intended) stats tools. They work fine for me under MacOS 10 Yosemite apart from one major snag: execution of the functions takes a very long time! Like half a minute or so. I seem to remember having similar problems with my own VBA macros in Excel until I put these code lines in at the start of every function:
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
and then ending with reversing them:
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Is there any way I can make the execution work faster without putting these lines into all the macros of RealStats? I guess there is something in the prefs of my Excel that brings this problem about, and that others don't experience it. I would be very grateful for your comments. (Btw, the RealStats VBA code is password-protected – do you distribute the password?)
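(For reference, the full pattern looks something like this generic sketch, with the slow work left as a placeholder:)

' Suspend events and automatic recalculation around a slow macro, then restore.
Sub RunWithCalcSuspended()
    On Error GoTo CleanUp                          ' make sure settings are restored
    Application.EnableEvents = False               ' stop event handlers from firing
    Application.Calculation = xlCalculationManual  ' stop automatic recalculation

    ' ... the slow work goes here ...

CleanUp:
    Application.Calculation = xlCalculationAutomatic
    Application.EnableEvents = True
End Sub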
Dan B,
You won’t be able to make these changes yourself. I don’t distribute the password.
My experience in the past was that the execution was relatively fast. Which function was this slow?
Charles
Hi Charles,
Thanks for the prompt reply. Since I posted my question, I have experimented with Excel preferences, and it works like this: if “Calculation” prefs are set to “Automatic”, then execution of the RealStats function “EVECTORS()” takes like 30 s. However, if I change the Excel Calculation prefs to “Manual”, I have to prompt the execution of all calculations using “Calc Now” or “Calc Sheet”, but the actual calculation is quite fast. I don’t know why this happens specifically to me (and presumably not others, since I imagine that most people have their prefs set to automatic calculation).
Dan,
Strange indeed. Since eVECTORS uses an iterative technique it could take some time to calculate for a large matrix, but in my experience to date, it is still pretty fast.
Charles
I don’t think it is specific to evectors, or even to RealStats. As I said, my own VBA macros behaved the same way, but there I could remedy it by putting in the code lines I mentioned earlier. So it’s probably not really a RealStats issue. I rather hoped you would know the solution. I guess I will have to search for the answer somewhere else or make do with manual calc.
Good morning Charles,
I am hoping that you might be able to help me out. I have a Mac and successfully managed to download the add-in and get it to run. However, when I try to run my analysis, irrespective of whether it is a t-test or a repeated-measures ANOVA, I keep getting the same error message: "compile error in hidden module: Analysis". I have checked all of the fields and believe I have entered and selected everything correctly, but cannot seem to get rid of this error. Any assistance would be very much appreciated!
Mel,
Are you able to use any of the functions? E.g. what do you see when you enter the formula VER() in any cell?
Charles
Hi Charles,
I get the same message too, and I'm using a Mac as well. I'm using Excel 2011 and version 5.3.2 of the software.
Any assistance pls? Thanks!
Janine,
Which message are you referring to?
Charles
Informative article
I do not care, at all, if the null hypothesis is true or not: the NHST measures the extent to which H0 is unlikely. If p < alpha we reject it; if not, we fail to reject. Never say that you accept H0; in effect, we are completely unable to state such a thing.
Gliner, Leech, and Morgan [1] give, as the major problem with NHST:
"Kirk (1996) [2] went on to explain that this was a trivial exercise because the null hypothesis is always false, and rejecting it is merely a matter of having enough power."
Our comment
The null hypothesis concerns a parameter, H0: p = p0 against Ha: p = p1, where p0 and p1 are real numbers. That the exact value p0 is practically impossible to attain made me laugh out loud: when the real matter is to reject or fail to reject the null, I do not care at all about this imprecision, because with sufficient power (sample size) it is always possible, given alpha, to decide between the two hypotheses, thus making a rational choice. In fact the confidence interval (CI), once found, is decisive: if the test statistic falls inside it, fail to reject H0; outside, reject it. We know that a Type I error can occur (H0 is true but we reject it), or a Type II error (we fail to reject when H0 is untrue).
Luis
[1] Jeffrey A. Gliner, Nancy L. Leech, George A. Morgan, Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? The Journal of Experimental Education, 2002, 71(1), 83-92.
[2] Kirk, R. E., Practical significance: A concept whose time has come. Educational and Psychological Measurement, 1996, 56, 746-759.
If there are very many variables, say 35, how do I decide which variables are important and which to include in the model?
Rahul,
Are you speaking about regression?
Charles
yes
Charles
I am fully acquainted, you know, with a set of people at war with NHST. However, items like the one I point out border on the ridiculous.
Such as [1]
"Because the proposed null hypothesis H0: p = p0 is practically impossible to attain, the null hypothesis is always false." "If there is no expectation regarding the possible truth of the null hypothesis, its falsification by data is redundant nonsense." "If the probability of a point hypothesis is indeterminate, the empirical discovery that such a hypothesis is false is no discovery at all, and thus adds nothing to what is already known."
No doubt a pearl of great price . . .
The "author" confuses "point estimation" with the practical impossibility of having certainty in the real, probabilistic world.
No matter . . . I guess that the fathers of NHST would be terrified to find such argumentation.
I am just starting not to worry about it . . .
[1] Joachim Krueger, Null Hypothesis Significance Testing: On the Survival of a Flawed Method. American Psychologist, January 2001, Vol. 56, No. 1, 16-26.
Luis
Thank you for being so helpful to our study, Charles.
When I used QCRIT(x,y,z) in the tools, a compile error on lookup occurred.
On Excel 2010 version 14.0.7166.5000 (an old one of my PCs) there was no problem, but on version 14.0.7166.5000 (the latest of my PCs, the one used for my study) it caused a compatibility problem.
How can I fix the error on latest Excel?
Or, if you can, could you construct the compatible latest edition?
Thank you for reading.
Shiggy,
The two Excel version numbers that you provided are identical.
Which release of Real Statistics are you using?
What are the values of x, y and z?
Charles
Thank you for your reply, Charles
Sorry, the Excel version numbers are 14.0.7172.5000 and 14.0.7166.5000.
I use the Real Statistics release downloaded in June 2016 with Excel version 14.0.7172.5000, and with Excel version 14.0.7166.5000 I use an older one downloaded in November 2014.
The values are x: 6, y: 12, z: 0.05, for example. I get the error with the June 2016 Real Statistics release and Excel version 14.0.7172.5000.
Shiggy
Shiggy,
I don't know why you are having this problem. I used the formula =QCRIT(6,12,0.05) in the latest release (Rel 4.9 of July 19) and get the answer 4.75. What do you see when you use the formula =VER()?
Charles
Charles
I tried the old version of the add-in (3.2.2), which I used for a past study, with Excel version 14.0.7172.5000.
In this case, I get the correct value 4.75 from the formula =QCRIT(6,12,0.05).
However, the latest version (4.9) seems to fail to load in Excel.
So, for now, I will use version 3.2.2.
Because this may be a problem specific to the software configuration of my PC, I will ask an engineer at our university about it.
Thank you so much, Charles.
Shiggy
Charles
Would you be so kind as to send a comment, please?
(This is only because some psychologists, such as J. Cohen and similar fauna, insist on banishing NHST from the literature, which, IMO, is too radical . . .)
Thank you, so much
Luis
A reasonable meaning of H0: p=p0
a) In a real, probabilistic world, any parameter, when estimated from experimental/observational data, is by its very nature subject to error, even if the estimator is unbiased;
b) Remember that, even so, the aim is to get sufficient evidence to falsify the null hypothesis;
c) Read algebraically as an equation, H0: p = p0 leads directly to the absurdity that there is no room for a difference even in the last decimal place, and consequently the null hypothesis has no chance of being true.
What is really intended is to find out whether the result is significant, i.e., whether p is sufficiently far from p0 that, given the observed data, we can state that the opposite complementary hypothesis, Ha: p ≠ p0, is true. In short, I think that H0: p = p0 must be understood as indicating an approximate equality, not statistically discernible given the data. The same holds for two parameter values, H0: p1 = p2.
Luis,
A lot of people have criticized the null hypothesis approach to statistical analysis, and with good reason. Other approaches have been proposed and are used (e.g. Bayesian approach), but the null hypothesis approach is still the one that is most commonly used and so I have adopted this approach for the website.
Charles
Hello,
I am analyzing wind speed data and want to fit a Weibull distribution to it. I have hourly averaged wind speed data for one year, and I have drawn a graph of speed vs. percentage of time. How can I find the alpha and beta values of the Weibull distribution function? Thank you,
Chaminda,
If you know that the data follows a Weibull distribution, you can use Solver to find the alpha and beta values that minimize the mean squared error (MSE). This approach is illustrated in a number of places on the website: Using Solver for Logistic Regression, Using Solver for Exponential Regression and Using Solver for Time Series Analysis.
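For instance, a Solver objective along these lines could be minimized by changing the cells that hold alpha and beta (a sketch; the range arguments, and fitting the density to relative frequencies, are assumptions about your setup):

' Sum of squared errors between observed relative frequencies and the Weibull pdf.
Function WeibullSSE(speeds As Range, freqs As Range, _
                    alpha As Double, beta As Double) As Double
    Dim i As Long, fitted As Double, total As Double
    For i = 1 To speeds.Cells.Count
        ' FALSE requests the density rather than the cumulative distribution
        fitted = WorksheetFunction.Weibull_Dist(speeds.Cells(i).Value, alpha, beta, False)
        total = total + (freqs.Cells(i).Value - fitted) ^ 2
    Next i
    WeibullSSE = total
End Function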
Charles
Can we do kNN in the Real Stats software?
What is knn?
Charles
k-nearest neighbors, for missing values
Rahul,
The Real Statistics software doesn’t yet support the knn algorithm. It does support the k-means clustering algorithm and Jenks Natural Breaks.
Charles
So how do I do missing value imputation?
Rahul,
Various missing data approaches are described on the webpage
Handling Missing Data
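As one very simple illustration (mean imputation, just one of the approaches described there; note that it understates the variance):

' Replace blank cells in an assumed data range with the mean of the non-blank cells.
Sub MeanImpute()
    Dim rng As Range, c As Range, m As Double
    Set rng = Range("A1:A100")             ' placeholder data range
    m = WorksheetFunction.Average(rng)     ' Average ignores blank cells
    For Each c In rng
        If IsEmpty(c.Value) Then c.Value = m
    Next c
End Sub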
Charles
Hi, Charles.
Thanks again for this great tool.
Recently, when I open a new spreadsheet (and also older files I've created using this add-in), the resource isn't available because the macros have been disabled.
Is this something I did, or an update I'm missing? I never used to have this problem. Do you have a fix? Like I said, I have older files that make use of the resource in a template that I hope to automate.
Thanks so much for your help.
Michael
Michael,
I am quite pleased that you like the tool.
I don’t know any reason why the macros would be disabled (assuming that you didn’t disable them).
If you have updated the Real Statistics software or moved the location of the software, then perhaps you need to tell an older file where to locate the add-in. You can do this exactly as described for the Real Statistics examples workbooks on the webpage:
https://real-statistics.com/free-download/examples-installation/
Charles
I liked the site for the clarity, rigor and explanation of the process.
This is an excellent site for learning.
I wish there were examples showing potential pitfalls of using an “apparently obvious” method and/or hints in each chapter indicating, “What to Watch For”.
Anand,
That is a good idea. I will consider adding this some time in the future.
Charles
Dear Charles,
I’d like to convey many thanks, once again, and many times more, if need be, for all your kind help.
Unfortunately, following brief use of Real Stats in Win 8.1 (French), Excel 2007, the Data section would no longer open, and would instead CRASH Excel….
I can no longer install the add-in… (be it 2003 or 2007 or some other one), in EITHER of the two locations mentioned in other parts of this blog.
Ἔῤῥωσθε! == keep well, in good health (take care, etc…)
Georgios
Mr. Zaiontz,
I would appreciate it if you can find time to answer my question. I'm analyzing a big number of point sets (I would call them attributes). Each of these attributes is in XYZ format. I'm doing this to measure the evolution of the data, so I want to analyse it through the flow in which the data are changed. So I'm generating these attributes (XYZ, 90 attributes each time) at several stages as the data changes.
For each attribute I have statistics summarized by the following parameters: minimum, maximum, median, mean, standard deviation, skewness, kurtosis. I would like to create a pdf in Excel based on these parameters as input. Although I have the data points themselves, I don't want to load them into Excel (as it would be too much data); I want to use these extracted statistics to plot PDFs. Do you think this is possible? Could you advise a workflow for this?
Ilya,
Sorry, but I don’t know how you would be able to create a pdf based on the minimum, maximum, median, mean, standard deviation, skewness and kurtosis. This would give you the first four moments of the distribution, but not the moments after that. I would imagine you could make an estimate of a possible pdf, but I’m not sure how you would do that. Perhaps some other reader has an idea for how to do this.
Charles
Ok, thanks for the reply. So I would have to use the data points I have, rather than statistics derived from them. Maybe kernel density estimation would be a solution in my case, but again I would have to use my points as input data.
This could be done using Pearson distributions. See https://en.wikipedia.org/wiki/Pearson_distribution for more examples, and see the R package PearsonDS (https://cran.r-project.org/web/packages/PearsonDS/) for an example of how this has been implemented in R. It should in theory be possible for this to be re-implemented within the Real Statistics package.
Hi Charles, loving your website by the way.
I'm working on some time series data and using the Time Series Testing tool; my question regards the output from the ADF test. The answer is most probably written somewhere, but I can't find it…. Under stationary (yes/no), aic, bic, there is a lags box. My data at diff 1 (with trend) comes back as stationary with lags 6. Does this mean the data is stationary after 6 lags? I.e., do I need to difference the original data 6 times before I can use it for other tests and comparisons?
Thanks in advance, I think I’m having a dumb day!
Lisa,
Please look at the following webpage and see whether this helps.
Augmented Dickey-Fuller Test.
Charles
Charles – great product that I use more and more – thanks so much for making and maintaining such a great product! I recently upgraded from Excel 2013 to Excel 2016, and now am having trouble making ‘Add-Ins’ appear on the ribbon. (Real Statistics is installed as an add-in and I can access it via Ctrl-m)
It’s not a big deal (since I can always use Ctrl-m), but I was wondering if this is just an issue with Excel 2016 or if I am not doing something right.
Thanks again,
Alvin
Alvin,
Sometimes the problem is the order in which the various addins need to be installed. I think the Analysis ToolPak needs to be installed first and then Solver and then Real Statistics. This may indeed be irrelevant to the problem you are having, but sometimes Excel is sensitive to these sorts of things.
Charles
I was looking for a non-parametric test equivalent to two-way ANOVA and thanks to your website I found how to conduct the Scheirer-Ray-Hare test.
So first, thanks a lot!!!
Second, I couldn't find much information about this test, but I did find a comment saying it was not very robust… but what would be the alternatives, GLMs? I bet there are many cases out there where data with two factors do not follow normality, so I'm quite surprised not to find a clear pathway.
And third, if this test is perfectly valid, could I use any post-hoc test to check where the differences within factors lie (e.g. my two factors are location and month, and the interaction is significant, so I want to see in which months that happens)? Any suggestions?
Thanks a lot in advance!
Just an additional question I had doubts about… If we have missing data, I see that it doesn't work leaving the cell blank or with NA; it only works if I introduce "0". Is it treating that as a value?
Thanks again
Ara,
I don’t recommend using 0 for missing data. I suggest that you look at Handling Missing Data.
Charles
Ara,
It is true that the Scheirer-Ray-Hare test is not very robust. You can use the various post-hoc tests, provided their assumptions are met.
Charles
Hi, I just downloaded the add-in today and tried to use the one-sample t-test (as described on your website). However, every time I enter the data range and the correct settings, I get this response when I click OK: "A run time error has occurred. The analysis tool will be aborted. Unable to set the HorizontalAlignment property of the Range class".
What does this mean and how do I fix it so I can use the one sample t-test function?
Thanks!
Jane,
If you send me an Excel file with your data I will try to figure out what is going wrong. You can get my email address from the Contact Us webpage.
Charles
Dear Sir,
Currently I'm working on a school project on forecasting the employee turnover of an organisation. To do this, I'm using time series data of the employee turnover over the past 7 years. It is annual data from 2009 to 2015. To make a good prediction it is also important to add some explanatory variables.
I want to use multiple regression with ARMA errors, but I'm not sure how to use these errors to make a prediction. Do I need to make a prediction for each of the explanatory variables on its own first, and then add these predicted values to the model to predict employee turnover?
Or will the model do all the predictions of the dependent and independent variables and give me a predicted value of the dependent variable (turnover)?
Or do you recommend using another model to do the forecasting with all the explanatory variables?
I am really having a hard time putting all the variables into a model that gives me a good prediction. Also, the internal data has only a few observations, whereas the external data has more history. How can I combine these two data sets in a multiple regression?
Is it statistically sound to use only 7 observations for forecasting?
Can you please help me out, sir?
Thanks very much in advance
SB,
At present I only support time series for one variable. See Time Series Analysis.
I will eventually add multivariate time series capabilities.
Regarding your last question, you could make a forecast based on limited data; of course, the confidence interval of the prediction will be larger.
Charles
hi doc,
I must say I am inspired and helped by the write-up on statistics. I am unfamiliar with statistics and am doing research which involves statistics. Yes, I have used simple Excel; now I am finding it difficult to analyze the p-value and do a scientific analysis. I have sent you my p-value results by email. Thanks in anticipation.
Blaise,
I have responded to your email.
Charles
Hi Charles,
Appreciate your contributions to the statistics community.
I am trying to regress roughly 150 dependent variables (150 mutual funds' time series returns) against roughly 8 independent variables (the time series of various factors, like the return from simply buying cheap stocks, or stocks that have gone up recently, etc.). Thoughts on ways to do this without drinking heavily?
Drinking heavily seems like a pretty good option to me.
Charles
I don't have the ability to interpolate in my Excel (Mac version) and have the following values: my W is 0.984, n = 30, and my p-value is between 0.9 (.983) and 0.95 (.985). I don't have the interpolation feature in Excel, and the equation you had on that page left me a little confused as to what went where to calculate it correctly. Some help would be much appreciated 🙂
Thanks!
Sarah,
I don’t know which page you are referring to. Shapiro-Wilk? In any case, please look at the following webpage
https://real-statistics.com/statistics-tables/interpolation/
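With your numbers (W = .984 lying between the table entries .983 at p = .90 and .985 at p = .95), linear interpolation gives
p ≈ 0.90 + (0.984 − 0.983)/(0.985 − 0.983) × (0.95 − 0.90) = 0.90 + 0.5 × 0.05 = 0.925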
Charles
I saw this page, and that was my question regarding the equation. Where you have .522, I wasn't sure which of my variables went where in order to carry out the calculation.
Thanks
Sarah, sorry but I don’t know which webpage you are referring to when you say .522.
Charles
The equation that's on the interpolation page you posted the link for.
Sarah,
Thanks. Is everything clear now?
Charles
Hi,
I am doing research about GPAs at my school. In my data sheet I have 50 samples, and under each label I have fewer than 30 values. I tried to use "t-Test: Two-Sample Assuming Unequal Variances" because I don't know the exact variances, but I have also read that I cannot use the t-test if I have more than 30 samples. Since I don't know the population variances I cannot use the z-test either. Which test should I use in this case?
Thank you in advance.
Nur,
Are you saying that you have 50 samples and each sample has fewer than 30 elements?
The following statement that you made “I also know that I cannot use t-test if I have more than 30 samples” is not true. You can use the t-test for samples with more than 30 elements as well as for samples with fewer than 30 elements. You also don’t need to know the variances to use this test. I suggest that you look at the following webpage, especially the subpages about hypothesis testing
t Distribution
Charles
Hi, I want to say thank you for the add-in functions you have provided!
However, today when I used the function NCHISQ_DIST(9629.82651550155, 0.06365952, 9414.72191791204, TRUE), it returned 0, which seems incorrect. I input the same numbers in Matlab to verify it, and Matlab returns 0.8659. Since I don't have the password for the add-in, could you please help me out?
Grant,
Yes, there are limitations to the use of the NCHISQ_DIST function that I have provided. In Excel, CHISQ.DIST generates an error value when df < 1. Since NCHISQ_DIST uses the Excel function CHISQ.DIST, it is not surprising that it does not return the correct value when df < 1. There are also limitations when x or lambda are large. I will try to explain these limitations on the website. Thanks for identifying this problem.
Charles
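(For context: a standard way to compute the noncentral chi-square cdf, and presumably what NCHISQ_DIST does internally, though that is an assumption, is as a Poisson-weighted mixture of central chi-square cdfs:
F(x; df, λ) = Σ_{j=0..∞} [e^(−λ/2) (λ/2)^j / j!] · F_χ²(x; df + 2j)
Any restriction on CHISQ.DIST, such as df < 1, then carries over to every term of the series, and very large x or λ can cause the weights to underflow.)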
Hello Charles!
I'm trying to compare two population proportions to see if there is a significant difference between them. I'm reading everywhere that I have to use the z score to do so, but I can't seem to find any good information on the quickest way to do this in Excel. My goal is to end up with a p-value telling me whether the difference between the two population proportions is significant.
Could you help me with this? Is there a page on your site on which this is explained?
Thank you so much! Your website is very helpful!
Isabel
Hello Isabel,
Yes, please look at the following webpage.
Proportion Distribution
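For reference, the standard large-sample statistic for comparing two proportions p̂1 = x1/n1 and p̂2 = x2/n2 is
z = (p̂1 − p̂2) / √(p̄(1 − p̄)(1/n1 + 1/n2)), where p̄ = (x1 + x2)/(n1 + n2)
and the two-tailed p-value can then be computed in Excel as =2*(1-NORM.S.DIST(ABS(z),TRUE)).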
Charles
Dear sir,
Let's suppose that someone uses a questionnaire with questions like "How much do you think that X can influence your life?" and the participants have to answer by choosing a number from 1 to 10. Could the median be a meaningful measure in a case like this? Or should the researcher just stick to the mode?
Thank you very much for your time.
Maria,
This all depends on the details of what you are trying to analyze. Usually the mean is used, sometimes the median, and only rarely is the mode used. But which you should use really depends on your specific research. It also depends on the test that you use. E.g. if you decide you want to analyze the data with the t test (assuming that this makes sense for your analysis and that the assumptions for the t test are met), then you probably want to focus on the mean. If it turns out that the assumptions for the t test are not met, then it would be common to use the median (and some nonparametric test) to do the analysis.
Charles
Let's say that the researcher wants to explore people's ideas about how environmental problems influence their own lives. So, he has a questionnaire with e.g. 10 questions. Each question has the same form ("How much do you think that X can influence your life?") and the same response options. These options range from 1 to 10, where 1 means "no influence" and 10 means "great influence". So, 1-10 are not "real" numbers; they are just the participants' way of expressing what they think.
In this case, when the researcher knows that e.g. 5 participants have ticked option "1" and 10 have ticked "2", etc., and does not plan to run any test at all, is there any real point in estimating the median? I mean, if he just wants to describe this kind of data, is the median really applicable and meaningful?
Thank you very much once more.
Good day Dr
First of all I would like to express my appreciation for the invaluable Real Statistics package, which has already helped a lot of people.
However, the subject of the matter is that I am carrying out research on "The viability of crowdfunding to meet the funding needs of SMEs", and my main focus is on the supply side of funds, the investors. I am using a questionnaire to gather data on the investors, and the online questionnaire is at the following link: https://docs.google.com/forms/d/1pibVVwzBqyrL4SQ9mcmNzCiArwzXp0VKWcy8J_vJe2Y/viewform
After gathering the data and testing the viability, I decided to consider whether the potential investors would choose to invest or not (the propensity to invest). This constitutes my dependent variable. The independent variables include investment capacity (which is a function of income and investment pledge), expected return surplus, and payback period; there are other variables that I have considered, such as internet connectivity, social media presence, and access to a payment system, as variables relevant to the consideration of crowdfunding, etc.
I thought the logistic model was going to be the most appropriate model to use, considering the binary nature of most of my data, and I chose it.
However, I have been following the video that you shared on YouTube on the logistic regression model. After coding my data, I ran the analysis but got errors in Excel about the data format.
Could you please help me address the error and tell me where I should look? I WOULD BE VERY, VERY HAPPY if you could assist with the overall project by offering comments, insights and direction as well. I believe this project is of great importance to the SMEs in my home country and the nation as a whole, and I would like it to be as informative as possible.
Accompanying this mail is the Excel sheet with the responses and the data that I am trying to analyse.
I am in much anticipation of your response
Desmond,
I have received your email and will send you my comments shortly.
Charles
Ok Doc, I will be waiting for your response.
I have now sent you a response to your email.
Charles
Hi,
Do you know if it's possible to do a three-parameter exponential decay fit in Excel (y = y0 + a*exp(-b*x)), fitting for y0, a and b? I know I can do this in Origin or SigmaPlot, but it would make things easier if I could also integrate this into my Excel sheet for the calculations I need.
Thanks!
Lis,
See the following webpage
Exponential Regression
Charles
Hi, Dr. Zaiontz!
Have you given any thought to doing more with the Fast Fourier Transform (FFT) already provided in Excel?
They really have not explained what to do with it after it generates the complex form. I have managed to compute the resulting magnitude (amplitude) and frequency from the complex form given, but need help on how to interpret what I have… presumably the frequency spectrum. Ok, so I locate the dominant frequency after I graph the magnitude and frequency…. now what? How do I go about combining the various frequencies to get a better fit, and how do I come up with the parameters for the various cos and sin terms? Is there a shortcut without having to construct them individually for each frequency?
Puzzled?
Sorry Rod, but I haven’t tried to work with the Fast Fourier Transform capabilities in Excel.
Charles
I recently consulted a statistician who informed me that I need to use Generalized Estimating Equations (GEE) on the data that I've been gathering.
Does this excel add-in allow for GEE calculations?
Sorry Mike, but the Real Statistics website and software doesn’t yet support GEE. This will be supported shortly.
Charles
Thanks for the free software; it is appreciated, from someone who enjoys statistics and occasionally needs to do analysis.
Prof,
I have a dependent variable and four independent variables.
To do regression analysis using Excel 2013, should I select all FOUR independent variables together under Input X Range, or should I regress the dependent variable against each independent variable separately, thus four times in all?
Thanks
Tom
Tom,
It all depends on what you want to accomplish, but I suggest that you perform the regression with all four independent variables and see which ones are not significant. These are candidates for elimination from the model.
Charles
I want to determine whether two sets of data (n = 12, for both) are statistically significantly different at P <=0.001 (two-tailed test). Should I run Mann-Whitney U test on the raw data, or should I first do a normality check and convert the data to a normal distribution? (say, by log, square-root, or exponential conversion).
Craig,
If the data is normally distributed then generally you should use the t test instead of Mann-Whitney. You should first check to see whether the data in each sample is normally distributed. If the departure from normality is not too severe then you can use the t test. Even relatively symmetric data should be ok. If the data is very skewed then you can use the Mann-Whitney test provided the distributions of the two samples are relatively similar (even if skewed).
Charles
Very helpful — thanks! Follow-up question: If the data ARE skewed (i.e., not normally distributed) and I use Mann-Whitney, would I use it on the raw data, or on the data that are transformed by whichever transformation brings their distribution closest to normality?
Hello Charles,
I am having trouble finding a relatively recent academic reference to support the assertion that ‘MANOVA is not very sensitive to violations of multivariate normality provided there aren’t any (or at least many) outliers’. I have a large sample (n=955) with no outliers, but the assumption of multivariate normality is violated. I can proceed with the analyses if I can cite evidence that the MANOVA will remain robust. Could you recommend a citation?
Kind regards,
Sharon
Sharon,
This is not to say that you couldn't have problems if the data is really skewed, but generally non-normality is not a problem. Some references are:
http://homepages.inf.ed.ac.uk/bwebb/statistics/MANOVA2.pdf
http://documents.software.dell.com/Statistics/Textbook/ANOVA-MANOVA#deviation
http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt
Charles
Sharon,
Here is one such reference.
http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt
See slide 11.
Charles