Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).
Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.
Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.
He grew up in New York City and has lived in Indiana, Florida, Oregon, and finally Boston, before moving to Europe 36 years ago where he has lived in London, England and in northern Italy.
He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, required that he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.
Hi Dr. Charles Zaiontz,
This is Cronbach's alpha reliability. You can see this table. Is there any reference to support this table? If so, could you please send the reference? Or how can I decide whether this alpha indicates high reliability or not? References are very important for me. Thank you.
Cronbach's Alpha Reliability
0.00 ≤ α < 0.40  Scale not reliable
0.40 ≤ α < 0.60  Scale low reliability
0.60 ≤ α < 0.80  Quite reliable
0.80 ≤ α < 1.00  High reliability
Regards,
Ebru
Ebru,
There isn’t universal agreement about these ranges. If you look at the Wikipedia entry for Cronbach’s alpha you will see different ranges. I am sure that some books will have some version of these ranges, while others won’t include any ranges.
Charles
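For readers who want to compute alpha itself, here is a minimal worksheet sketch of the standard formula α = k/(k−1) · (1 − sum of item variances / variance of totals). The layout is purely illustrative: assume k = 10 items in columns B through K, one respondent per row, in B2:K101.
=VAR.S(B2:B101)   entered in B103 and filled across to K103 (the per-item variances)
=SUM(B2:K2)   entered in M2 and filled down to M101 (each respondent's total score)
=(10/9)*(1-SUM(B103:K103)/VAR.S(M2:M101))   Cronbach's alpha
The resulting value can then be compared against whichever set of ranges you adopt.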
Dear Charles,
Hope you are well
I have only one urgent inquiry regarding normal distribution test.
Typically, p < 0.05 of the Shapiro-Wilk test indicates data are not normally distributed.
My data are normally distributed at the .001 level, but not at the .05 level.
So, I have chosen .001 since this is a more relaxed criterion to assess normality compared to .05.
Is this accepted? What do you think?
Is there any reference to support this?
Regards,
Abdul
Abdul,
Most tests are pretty robust to departures from normality and so using .001 is probably good enough. The risk is that in your particular case, the data is truly not coming from a normally distributed population and any test that you are using is really invalid. This risk is probably pretty low, but it exists. I suggest that you take a graphic look at your data (histogram, QQ plot, etc.) to see whether it looks normally distributed.
Charles
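As a minimal worksheet sketch of the QQ plot suggestion, assume the sample is sorted in ascending order in A2:A101 (n = 100) with the ranks 1 to 100 in B2:B101; the layout is illustrative only.
=NORM.S.INV((B2-0.5)/100)   entered in C2 and filled down (the theoretical normal quantiles)
Charting column A against column C as an XY scatter should give roughly a straight line if the data are approximately normal.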
Hello,
nothing to see but thank you for the website.
kind regards
Hello,
Is there any GLS implementation available in Microsoft excel
Thanks
Rahul,
No, but there is GLS support in the Real Statistics software.
Charles
Dear,
Can you help me in writing findings on my ARIMA model in population analysis? I am stuck here. I can mail you this analysis.
P.S. I am not a mathematician.
Farah,
You can mail me at the email address found at Contact Us.
Charles
Hey Charles!
When I use the ADFTEST function in Excel, it only returns a single cell labeled “tau stat”. I am unable to get an 8×2 output. How should I do it?
Anam,
ADFTEST is an Excel array function. To get the full output you can’t simply press Enter. See the following webpage for how to handle array functions.
Array Formulas and Functions
Charles
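In brief, and assuming the series is in A1:A100 (the range here is illustrative; the 8×2 output size is taken from the question above): first highlight an 8-row by 2-column output range, type the formula, and then press Ctrl-Shift-Enter instead of Enter.
=ADFTEST(A1:A100)
Excel then fills the entire highlighted range with the function's output rather than just the top-left cell; see the webpage above for the function's full argument list.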
Hi Charles
Which experimental design do I use for an experiment on 9 different soil components for two different plants? The height of the plant and color (green/yellow) will be measured.
Vusani
Vusani,
What hypothesis (or hypotheses) do you want to test?
Charles
I want to test the effect of the soil components on the two different vegetables.
The colour of the leaves indicates the healthiness of the plant.
Vusani,
Since you have two dependent variables (colour and height) this looks to be a fit for MANOVA. The “devil is in the details”, and so the exact test depends on factors that you haven’t specified.
Charles
Hi Charles,
Firstly, thanks for your site and app, it’s so appreciated.
I have the Excel 2016 desktop application. I would like to add Real Statistics to my Ribbon in an existing tab with other macros I already have there. When I try to add it, I can't find the macro in the Macros list to apply it.
Could you please let me know if this is possible?
Nick,
I imagine that it is possible, but I haven’t figured out how to do this. Perhaps someone else knows how to do this.
Charles
Hi Charles,
Thanks for prompt reply.
If anyone has a solution, it's appreciated.
Nick.
Dear Charles.
Is it possible to extract the PCA components' time series when using the Factor Analysis option of your Excel add-in?
Best regards.
Greg,
I don't know what you mean by “PCA components' time series”.
Charles
Sir can you please explain the method of finding shape and scale parameters for wind data using Weibull distribution ?
Sam,
See https://real-statistics.com/distribution-fitting/
Charles
Charles,
I am a layman, so please forgive the language. If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.
I know, for a fact, that there are regular, timed events in the market. In fact there are many of them. There are also timed news events that manipulate the market.
Wouldn't this then “distort” the series away from a Normal Distribution to some other form? I have seen a white paper that describes the market as a Sinusoidal Distribution.
I guess the question is, can a Normal Distribution have a regular pattern in it? (Obviously this is an interesting subject to Traders)
Somewhat confused
David Shields
David,
What is the source of “If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.”?
If data completely follows a normal distribution then it would follow a regular pattern (described by the bell curve), but clearly real data would have some randomness to it.
Charles
I have 4 treatments with 3 replicates each, across a two-level time factor. Can this be analyzed with ANOVA with replication in Excel 2007?
I believe that you are saying that you have (1) one treatment factor with 4 levels, (2) one repeated-measures time factor with two levels, and (3) 3 replications. You won't be able to address this with any of Excel 2007's data analysis tools. You can use the Real Statistics Mixed Repeated Measures ANOVA data analysis tool to perform this analysis in Excel 2007.
Charles
The site is extremely helpful, thank you for your effort, sir!
Hi Charles,
Thank you so much for the great Excel sheet for Kendall's W. I would need your technical help with Kendall's W, which I will use for the data analysis for my PhD thesis. However, I have a problem with it.
Let me brief you on the nature of my data as follows:
1. There are 27 judges to rank the 15 objects.
2. I used an ordinal scale when I conducted the survey. Thus, I coded 1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree and 5 = strongly disagree.
3. The research question was about the raters' level of agreement that Cambodia's legal framework articulates community-managed forestry.
4. I tried to compute Kendall's W on this issue to find which of the 15 policies the raters most agree articulate the important role and attributes of community-managed forestry, so that sustainable forestry management can exist in Cambodia. I would like to request your email so that I can ask you about the technical issues. This is my email: nhemsareth@gmail.com
Best regards,
Sareth
Sareth,
See Contact Us for my email address.
Charles
Hi Charles,
I cannot thank you enough for the material that you have put on this website. Recently, I had a statistics assignment for my master's degree and I learned a lot by going through your lessons.
Thank you very much for providing such a fine work.
best regards,
Khalid
Khalid,
Glad to see that the website was helpful.
Charles
I need some help with using R to plot Kaplan-Meier curves with dates using my fisheries data. Thank you
Khyria,
I am not that familiar with R, and so can’t help you with that.
This website focuses on using Excel, especially using the Real Statistics package, to perform statistical analysis.
Charles
Great website. I’ve learned how to use Jenks Natural Breaks Optimization in order to cluster products by cost.
One question though: now I want to cluster products by cost and manufacturer. I'm struggling to find a solution for this. Any suggestions?
Thanks!
Jack,
How do you plan to assign a value to manufacturer? These can’t simply be categorical values since you need to measure distances between the values.
Charles
Sir
I have questions,
I have read a journal article about overlapping clustering; it uses the k-means algorithm. It only uses a maximum distance to identify overlaps among the clusters generated by k-means.
My question is: is there a fixed maximum distance allowed by k-means in assigning the data objects to a cluster? Or is the maximum distance the largest distance of the objects (measured from each object to its centroid) that were assigned to the cluster?
Thank you
If I remember correctly, the algorithm seeks to minimize the distance between any member in a cluster and its centroid, and to maximize the distance between the centroid and points not in its cluster. I don't believe there is a predefined maximum distance.
Charles
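To illustrate the distance being minimized: assuming a data point in B2:D2 and a cluster centroid in B10:D10 (a hypothetical layout), the Euclidean distance between them is
=SQRT(SUMXMY2(B2:D2,B10:D10))
k-means assigns each point to whichever centroid yields the smallest such distance, with no preset cutoff.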
Sir,
1. I have applied the Box-Cox transformations as suggested by you, but it didn't work. I am using a 5-point Likert scale in my research and the majority of responses lie between 3 and 5. I used λ = 1, as no clue was given in your literature. I was able to calculate the values of X and Y, but the values of Z and r didn't come out when I applied the formula for Z as =NORM.S.INV((H$4-0.5)/H$203). My data run from cell H4 to H203.
2. What should the optimum sample size be for non-parametric tests? Is a 5-point Likert scale less efficient than a Likert scale of, say, 10 or 11 points for checking the normality of the data?
Best regards.
Gulab Kaliramna
Gulab,
1. I don’t have any further advice, except to try other values for lambda.
2. 10 point Likert has better chances than 5 point Likert of satisfying normality. I haven’t come across many ways of calculating the minimum sample size for nonparametric tests. I know that when normality is satisfied then power of the Mann-Whitney test is about 94% of the power of the t test, everything else being equal. This means that the sample size required is only slightly higher than for the t test (when normality is satisfied).
Charles
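For reference, the standard Box-Cox transformation is y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0. A minimal worksheet sketch, assuming the data run from H4 down (as in the question above) and a trial value of λ is placed in cell F1:
=IF($F$1=0,LN(H4),(H4^$F$1-1)/$F$1)
Fill this down alongside the data and re-run the normality check on the transformed column for each trial λ.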
Dear Dr. Charles Zaiontz,
1. I have collected data from 200 respondents on a 5-point Likert scale using the Occupational Role Stress (ORS) scale by Udai Pareek, but my data is not normal. I have used square root, inverse, and double transformation methods to normalize my data. Is there any other method to normalize the data?
2. I also want to know: is it compulsory to use only parametric tests to score well in research work?
Best regards,
Gulab Kaliramna
Gulab,
1. See Box Cox transformations at
Box Cox Transformations
2. If the test assumptions are not met for a parametric test, then it is perfectly ok to use a nonparametric test. The main downside is that the power of such a test will be lower, which may require a larger sample size.
Charles
Hello Dr. Zaiontz.
I'll be quick: your website is gold. Many thanks.
Best Regards
Pablo Caballero
Thank you very much Pablo.
Charles
Dear Dr. Charles Zaiontz
This is just another thanks message. But I can not fail to publicly thank you for your work. I am sure that your life reflects the good that you have done to everyone with your work on this site. Have you visited Portugal yet? Please do so. We are people of good will, and we would like to welcome you.
Jorge,
Thank you very much for your kind words.
I have visited Portugal before, but this was many years ago. I enjoyed my visit and the people very much.
Charles
Hello Charles, I am following your website to get this valuable knowledge. Is the book you mentioned in the post “Statistics Using Excel Succinctly”, or do you have some other book as well? I am inclined toward your book because of the Excel use, as we can do what we are reading. Do you also have the sample Excel files for your topics?
I have downloaded “Statistics Using Excel Succinctly” and am printing it.
Hello,
No, “Statistics Using Excel Succinctly” is not the book mentioned in the post (although that book was also written by me). The book I am referring to in the post will be much more detailed and should be coming out early in 2018.
Charles
We will be waiting… I am not sure whether we will be able to get it / buy it in India soon. Please plan for a PDF format as well. Best wishes for the book.
There will be a pdf format that you can buy in India.
I have already finished writing the book and was hoping to publish it this year, but I have struggled to find the time to finish proofreading it and making any necessary revisions. In any case, it should be available in the first part of 2018.
Charles
Dear Dr. Zaiontz, do you also offer private consulting? I might need some explanation of how to run some tests… ICC and kappa and/or Fleiss' kappa.
Please let me know.
P.S. I cannot find any video on the internet that explains how to calculate kappa or Fleiss' kappa with multiple observers. I am a bit desperate. I am trying to use Excel and fervently hope I do not have to learn SPSS.
Isabella,
I do offer private consulting. Please send me an email if you would like to pursue this.
Charles
Hi Charles!
First of all, thank you for the amazing website!
I want to test whether a noise signal is white or not, that is, I want to verify that the correlation between samples is null. I have the signal but don’t know what test to perform. What do you recommend?
Tiago Silva
Hi Tiago,
See the following webpage regarding how to test the correlation:
One sample hypothesis testing for correlation
Charles
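The test on that page is based on the fact that, when the population correlation is zero, t = r√(n−2)/√(1−r²) follows a t distribution with n−2 degrees of freedom. A minimal worksheet sketch, assuming the two series are in A2:A101 and B2:B101 (n = 100, an illustrative layout; for a lag-1 white-noise check, column B can simply be the series shifted by one position):
=CORREL(A2:A101,B2:B101)   the sample correlation r, say in D1
=D1*SQRT(98)/SQRT(1-D1^2)   the t statistic, say in D2
=T.DIST.2T(ABS(D2),98)   the two-tailed p-value
A large p-value is consistent with zero correlation at that lag.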
Prof. Charles
Kindly I need
Bayes with t-test, ANOVA
Ahmed,
I don’t know what you mean by Bayes with t-test, ANOVA.
Charles
Prof. Charles,
Please check your email.
I'm badly stuck on my homework.
Hi,
I just want to thank you for this outstanding website and the information you have put time into. I am grateful for the work you have put into this website, which helped me a lot with my assignment, and I am sure it has helped thousands of others around the world as well.
David from Australia
Thanks for your kind remarks, David. I am very gratified when I see that people are getting value from the website and software. I hope to continue to expand the topics covered and to improve the learning experience.
Charles
The binomial distribution (BD) was created for determining P(k; n, p), the probability of k successes in a number n of independent trials with a constant probability p of success. Given this, the C(n, k) term of the BD does not have practical importance. For example, if you do four trials, it is indifferent whether you obtain the one success in the first, second, third or fourth. And even the products of probabilities are questionable.
The statistical models project (SMp) proposes the following expression: P(k; n, p) = (k·p + (n−k)·(1−p))/n,
which represents a weighted average of p and (1−p), i.e. respectively the probability of success or failure in each trial.
Is the SMp P(k;n,p) expression more appropriate than the current Binomial one?
Terman,
I have always thought that one of the advantages of the binomial distribution is that it is indifferent to which trials had the successes and which had the failures.
Can you give me an example where the order of the successes and failures would be important?
I don't fully understand the meaning of the SMp expression for P(k; n, p). Is this really supposed to be the probability of success on each trial as you have stated? I thought this was p.
Charles
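For comparison, the standard binomial probability P(k; n, p) = C(n, k)·p^k·(1−p)^(n−k) is available directly in Excel. For example, the probability of exactly 1 success in 4 trials with p = 0.25 is
=BINOM.DIST(1,4,0.25,FALSE)
which returns 0.421875, whereas the SMp expression above gives (1·0.25 + 3·0.75)/4 = 0.625 for the same inputs, so the two expressions clearly describe different quantities.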
Respected Dr. Charles Zaiontz,
I am glad and thankful to see that you solve statistical problems so nicely.
Thanks a lot for a great job.
Hi everyone
Can you please explain whether, if one completes a paired t test and then wants to add an additional score, this can be done? If it can, would a Bonferroni adjustment need to be applied? Since this is a single person, under the same conditions, with 2 scores (one pre and one post), I can't see a problem with adding it to the previous data, but would there be a problem, and would one score really make enough of a difference to require a Bonferroni adjustment?
Rose, sure you can run another paired t test. But with only two pairs, don’t expect much from the test. Charles
Dear Charles:
I just have some questions I am facing while working on my research. My paper concerns the effectiveness of an early warning system, so I am planning to measure the dependent variable using four indicators to construct an effectiveness index using PCA. The dependent variable is discrete and naturally ordered: effective, more effective, less effective or ineffective. The four variables used to measure early warning effectiveness are:
1. Household income,
2. Household asset
3. Frequency of information
4. Accuracy of information
So how can I create an effectiveness index with a cut-off?
Thanks.
Tadele,
Please see the following webpage:
Factor Analysis
Charles
Hi Dr Charles,
Your website is amazing for stats beginners like me. I am currently a bit confused as to which test I should use. I have 2 sets of experiments, each with a CV: one 12.5% and the other 8.2%. I would like to know whether the 4% reduction in CV I achieved is statistically significant or not. Should I use Fisher's F test or this test (https://real-statistics.com/students-t-distribution/coefficient-of-variation-testing/)?
Thank you so much!
Glenda
Glenda,
I am glad that you like the Real Statistics website.
You haven’t provided enough information for me to be able to answer your question. It sounds like a chi-square test of independence (or Fisher exact test), but I can’t say for sure.
Charles
Hi Charles,
Ah, so sorry. So, I ran 2 sets of experiments, both with the same sample: one with a sampling size of 31 and CV 12.5%, and the other with a sampling size of 23 and CV 8.2%. Please let me know if you need more information.
Thank you.
Glenda
Hi Charles
I'm trying to do a two-way ANOVA but Levene's homogeneity test gives p < 0.05 (violated).
How do I account for this? Is there a way I can lower my significance level to 1% to adjust for the Type I error with multiple comparisons?
Eze,
If the p-value from Levene's test is near .05, you can probably still use ANOVA, especially if you have a balanced model (all groups have the same size). Otherwise, I suggest that you explore using Welch's ANOVA. See
Welch’s ANOVA
Charles
Thanks Charles
Very kind of you.
I've just noticed that my significance value is 0.005.
I have a balanced model.
So how would I do a two-way Welch's ANOVA in SPSS? Will this make my overall significance rate measurable at < 0.001?
Thanks so much for your help
Eze,
Welch's ANOVA supports only one-way ANOVA situations.
Charles
Hi Charles
Please would you be able to email me so we can discuss via phone? I believe you can see my email address.
It's the stats part of my dissertation which is really confusing me. I am also happy to pay for your help as well.
regards
Eze,
I suggest that you send me an Excel file with your data and the tests that you have already run. You can find my email address at
Contact Us.
Charles
Hi Dr. Zaiontz,
Thank you for developing a great Excel add-in for statistics.
I have been using it for 1 year, but after reformatting my disk and reinstalling Real Statistics I get the following error in Excel 2013:
“compilation error in hidden module”
Regards
Michel
Michel,
I don’t know what is causing this. Are you using some other add-in as well as Real Statistics?
Charles
Dear Dr. Zaiontz,
Thank you for this fabulous website.
May I ask you a question about something I found on:
https://real-statistics.com/time-series-analysis/stochastic-processes/partial-autocorrelation-function/
I could not figure out how the ACVF are being calculated.
Are there Excel formulae to arrive at the figures 155314.1, 121422.1, 89240.19 without using Real Statistics functions?
Thank you!
Matthias,
This can be calculated via the Real Statistics ACVF function, which is described at:
https://real-statistics.com/time-series-analysis/stochastic-processes/autocorrelation-function/
This is not a standard Excel function, but once you download the free Real Statistics software you can use it like any other Excel function.
Charles
Dear Dr. Zaiontz,
Thank you for your reply !!
That is exactly the point. I wish not to use the ACVF function.
With the following provided by you:
“Note that ACF(R1, k) is equivalent to
=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/DEVSQ(R1)”
I was able to calculate and fully grasp the concept of ACF.
I hope to be able to do the same for PACF.
Best regards,
Matthias
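Following the pattern of the ACF formula quoted above, a plausible worksheet analog for ACVF(R1, k) replaces the DEVSQ(R1) divisor with COUNT(R1), since ACF(k) = ACVF(k)/ACVF(0) and DEVSQ(R1) = n·ACVF(0):
=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/COUNT(R1)
This is a sketch rather than a guaranteed match: the figures on the PACF page may differ by the divisor convention (n versus n−k) used there.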
Dear Dr Charles,
Our college uses four versions of each exam in order to limit cheating between students, but the problem is that when dividing the students into four groups, the reliability and difficulty of the whole exam will be affected. My question: is there any method to rejoin the four versions into a single one?
Assad,
I see the difficulty that you are trying to address, but I don’t understand your question. Are you trying to find a reliability index for all four exams together?
Charles
Hello,
Is it possible to add something to the Real Statistics package?
I have been performing several Tukey HSD/Kramer tests and would like to make this less tedious. It would be helpful if the program performed all the comparisons (i.e. 1 and -1) and listed them at the bottom, including the groups that were compared, while also highlighting the significant ones. It can get annoying to perform 10 comparisons when I have 5 groups.
Thank you.
Deval,
Thank you for your suggestion. I will consider making this change.
Charles
Hey Dr Zaiontz, first and foremost thanks for your help on my last post. I've come across another issue I would like some help with, if you can spare me some of your time.
I'm doing a paper on the effect of the circadian cycle and season on acute myocardial infarction and other acute coronary syndromes.
Let's say that out of 60 patients with acute coronary syndromes, we had 23 in summer and 31 in the time frame between 0h and 6am. How should I test the statistical significance and, if possible, the relative risk in this scenario? How do I account for exposure-time differences, for instance between those who had an episode in summer and those who had it in the other seasons, considering the latter had 3 times the amount of exposure time?
Once again, thanks in advance
Vitor,
I only partially understand the scenario you are describing (e.g. (1) 31 + 23 doesn’t add up to 60 and (2) why are you comparing times like summer with times like 0h to 6am?)
In any case, for most tests you will need to make sure that the dependent variable values are comparable, and so three times the exposure needs to be taken into account. If whatever you are measuring is uniform in time, then you might be able to simply divide these values by exposure time.
In any case, these are just ideas. I would need to understand your scenario better before giving any definitive advice.
Charles
Yes, so sorry about it. Let's see, the first scenario is as follows: 54 patients with an acute episode, 17 in winter, 15 in summer and 11 in each of the two other seasons. How do I measure the RR, absolute risk, and 95% CI for winter? Should I pair it with each season or relate it to the average of the other seasons?
Vitor,
Thanks for the additional information, but I still don’t understand what you are trying to do.
Charles
Hi Charles
Can you explain how to convert data for a log transformation in the following cases:
1. negative data
2. Proportions data
3. percentages data
1. Let a be the smallest value (i.e. the most negative value). Then use the transformation log(x−a+1), since x−a+1 > 0 for any x.
2. You should be able to use log(x).
3. Same as #2.
Charles
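Worksheet versions of the three cases, as a minimal sketch assuming the raw data are in A2:A101 (a hypothetical range):
=LOG10(A2-MIN(A$2:A$101)+1)   case 1: shifts the most negative value up to 1 before taking the log
=LOG10(A2)   cases 2 and 3: proportions and percentages, provided all values are positive
For percentages, converting to proportions first (dividing by 100) only shifts every transformed value by the constant log10(100) = 2, so either form works.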
Dear folks:
I am a university professor. I have a passion for statistics with an experimental design orientation. I am writing a book on experimental design applied to environmental engineering. However, in the section on time series analysis I have some applications of temporal autocorrelation using the Durbin-Watson tables. I would like to include the D-W tables in my book for publication purposes, but I need your kind permission to do so. Could you help me with this issue?
Thanks
Hector A. Quevedo (Ph.D.)
P.S. I am a graduate of the University of Oklahoma. I am an American citizen living in El Paso, Texas. I am working across the border.
Hector,
I don't have any problem with you using the Durbin-Watson table on my website, but note that I copied its values from tables that I found in a number of other places on the web.
Charles
Hello, I’d like to ask a beginner’s question about multiple regression – I’d be incredibly grateful for your time. I’ve only recently learned the basics of linear regression and I still have the following nagging doubt.
I’d like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is ‘profit/loss’, which simply represents a sales figure for individual retail items. This is the variable I would like to forecast; (there are certain quantifiable conditions for each attempted sale of an item and these are my independent variables). My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a FIXED negative value of -100; (an item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items). A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):
23
55
201
-100
13
-100
321
124
57
-100
33
It’s my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I’d like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I’m wondering whether the fixed negative values might somehow pose a problem. I’m not trying to replicate the fixed -100 value for the losses, only trying to get to true averages so that I may accurately predict the profitability of an item’s listing for sale (and avoid unprofitable listings). Hope this all makes sense. Thank you very much.
I don’t see any problems with this fixed negative amount as long as it truly represents the Y value.
The real problem you have is that the data may not fit a linear regression model. You should graph the data and see if it is truly linear. You should also graph the residuals and make sure that they are randomly distributed.
Charles
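One way to obtain the residuals for that check in Excel, as a sketch assuming Y is in A2:A101 and the predictor columns are in B2:E101 (an illustrative layout):
=A2-TREND(A$2:A$101,B$2:E$101,B2:E2)
Fill this down and plot the residuals against the fitted values; a random, patternless scatter supports the linear model, while a curved or funnel-shaped pattern argues against it.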
Charles, thank you very much for your reply. I’ll be sure to check that the data meets the various requirements of a linear model.
Regarding your last point, the logic is that the residuals would be randomly distributed because the relationships between the variables remain constant, regardless of the profitability of a given listing. I will of course check the graphs, but it would help to know that I have the theory straight.
May I ask then: can I take it that an AVERAGE for Y in my case can be viewed in the same way as that for any appropriate independent variable, and that duplicate/fixed values (in themselves) do not pose a problem in a linear regression analysis?
To clarify, let's say there were no fixed loss amounts for Y in another case, and that they were free to fall anywhere on the same continuous scale (as you usually find in any textbook example). Let's also say that the average for Y in both cases is equal. Is it then safe to say that there is no apparent cause for concern with the data I have (assuming that it is appropriate for a linear regression analysis in every other way)?
Sorry if my limitations here are making things unclear or unnecessarily complicated! Thank you. Ben
I don’t see any particular problems with duplicate data provided the assumptions for regression are met and the negative value can be compared with the other values and is not a conventional value (like coding all missing data as -99).
Charles
Got it, thank you so much Charles. Congrats on the excellent resource pack and website here – well done! I hope you are well repaid for sharing your expertise. Ben
Great. Glad I could help.
Charles
Dear Dr.Charles
Sir, I want you to give me some guidance about how to write a research paper. Please, sir, I do not have any good teacher in the province of Balochistan in Pakistan.
There are many online guides to how to write a research paper, including the following:
www3.nd.edu/~pkamat/pdf/researchpaper.pdf
https://www.liebertpub.com/media/pdf/English-Research-Article-Writing-Guide.pdf
You can find more by googling “how to write a research paper”
Charles
Dear Dr. Zaiontz,
Please guide me in interpreting the observed tau and critical values of the Augmented Dickey-Fuller test for stock market prediction.
The variable considered is “open”.
Thanks
Sneh Saini
See the following webpages
Dickey-Fuller
Augmented Dickey-Fuller
Charles
Hi, Dr. Charles Zaiontz,
Thanks for your excellent site and relentless effort.
I am working on a research project using a 4-point Likert scale. I decided to use chi-square to test the null hypothesis. What kind of test can I use for reliability? Is chi-square enough?
In addition I have 100 respondents.
Thanks in anticipation
Hello Iro,
Chi-square is not usually considered to be a test for reliability.
Which test to use depends on what you mean by reliability. Please see the following webpage:
Reliability.
Charles
Thanks so much Dr Charles.
Dear Dr. Zaiontz,
I want to conduct experiments which will be done by human subjects. The outcome is explained by 10 user predictors and 6 task predictors. I want to calculate the sample size of users and how many tasks each user should do to achieve a specific power.
Thanks in advance
Mushtaq
Mushtaq,
If you are planning to use multiple linear regression to do this, then I suggest that you look at the following webpage
Sample Size for Regression
Charles
Thank you Dr. Zaiontz for replying.
Does this give the sample sizes of users and tasks separately? Because the output variable depends on two sets of predictors: one from the users and the other from the tasks.
Thanks,
Mushtaq
Sorry, but I don’t understand your question.
Charles
Dear Dr. Charles,
I mean, the sample size represents the overall number of observations that I have to get to achieve the requirements. But this number is a combination of the number of tasks and the number of users. For example, if the sample size is 200, then I could have 20 users by 10 tasks, 10 users by 20 tasks, or 40 users by 5 tasks, and so on. This is the case if I want each user to do the same tasks done by the other users. But if each user does a different task, I think the sample size = number of users = number of tasks.
Thanks,
Mushtaq
Hello Charles!
I am a student at Uppsala University in Sweden, and I have unfortunately not done any statistics during my four years there, which I regret now that I'm doing my master's thesis (earth science).
I have been trying to understand Time series analysis and PCA but it seems extremely complicated. I was just wondering if you could help me answer a basic question about it?
I have data of pore pressures at three different depths (in the ground), measured two times per day for about 5 years. The problem is that sometimes the measuring device stopped working so there are a lot of missing data, sometimes days, sometimes months. So my question is if it is even possible to make a time series analysis or a PCA with this kind of data?
Kind regards, Hanna Fritzson
Hanna,
It is often possible to handle missing data, but care needs to be taken. See the following webpage:
Handling Missing Data
Charles
Hi Dr. Raju,
Thank you for making this helpful Excel stats tool available to the public free of charge!
The Max-Diff analysis is becoming a popular analytical method for consumers’ preference (see “https://datagame.io/maxdiff-excel-modeling-template/”). Would you consider adding the Max-Diff analysis module to the current Real Statistics Resource Pack (release 4.9, as of 9/17/2016)? Thanks in advance for your consideration (or advice if such an analytical tool is available/accessible free of charge elsewhere).
Thanks,
Max
Hi Max,
I am about to issue the next release of the Real Statistics Resource Pack, but I will consider Max-Diff analysis for a future release.
Charles
Hi Charles,
Thank you so much for sharing the use of Excel instead of SPSS! This is exactly what I was looking for!!!! I am very glad to have found your website.
I am actually in the middle of completing a dissertation and struggling with analyzing the results I got from online survey software, which is very good and easy. However, I wanted to confirm whether it is OK to use only Excel for the analysis to complete the dissertation, and also whether I should prove the reliability of each result I got from the survey.
Please let me know if you need more information to answer my questions.
I am very happy to have a conversation with you via email if you are free.
Thank you so much in advance!
Of course, my objective is to provide analysis capabilities in Excel that are accurate and just as good as those provided by tools such as SPSS. You should be aware that there is a bias against using Excel for statistical analysis. Some of this is based on errors in earlier versions of Excel and the lack of many of the most commonly used tools. Microsoft has corrected these errors in the last few releases of Excel, and I have worked hard to add capabilities that are accessible from Excel but missing from standard Excel.
Charles
Hi Sir Charles!
Thank you so much for the information about the kappa statistic. It helped me a lot in my research.
I'm actually conducting research on the effectiveness of a local diagnostic kit compared to the commercial one, using 60 samples. The commercial one is my gold standard, and my adviser told me to use kappa. However, I'm still in the process of absorbing the information/formula. Do you have any simpler formula for my problem?
I'm sorry for my demand, but thank you in advance, Charles! God bless!
Jehvee,
If your adviser wants you to use kappa, then you need to use the formula for kappa. This formula is not very complicated. See Cohen’s Kappa for details.
Charles
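As a minimal worked sketch of that formula, κ = (p_o − p_e)/(1 − p_e), assume the 2×2 table of agreement counts between the local kit and the commercial gold standard is in B2:C3, with rows for the kit's results and columns for the gold standard's (the layout is illustrative):
=SUM(B2:C3)   the total n, say in F1
=(B2+C3)/F1   the observed agreement p_o, in F2
=((B2+C2)*(B2+B3)+(B3+C3)*(C2+C3))/F1^2   the expected agreement p_e, in F3
=(F2-F3)/(1-F3)   Cohen's kappa
The diagonal cells B2 and C3 hold the cases where the two methods agree.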
Hey Charles, I've come across another problem and was hoping you may be able to help me. I have a group of patients with an expected bleeding risk of 5.48% in 3 months, and a thrombotic risk of around 10% over the same amount of time; those calculations came from big studies with over 3000 patients. Anyway, how can I compare those to see whether this difference justifies the use of an anticoagulant agent or not? Would an odds or hazard ratio be useful in this scenario, to compare the chances of 2 different events over the same sample? Thanks in advance
Vitor,
It sounds like survival analysis (hazard ratio) might be the way to go. See the following webpage
Survival Analysis
Of course, you need to determine what sort of ratio is acceptable.
Charles
Dear Charles,
First, I would like to congratulate you on your website.
I wonder if you have examples of the sequential version of the Mann-Kendall statistical test, which detects change points in time series.
Dear Leonardo,
Thanks for your kind remarks about the website.
Unfortunately, I don’t yet support the Mann-Kendall test.
Charles
On another matter, I have an Excel spreadsheet of 2600 university donor prospects with 20 potential predictor variables and am trying to predict those that have a higher likelihood of giving a gift of $10K+. I have experimented with the logistic regression tool and found it very difficult to use and interpret. I'm wondering if a simpler and equally effective solution would be to simply group the prospect list into groups representing all possible configurations of the predictor variables (permutations?), compute the average number of $10K gifts given by each group historically, then rank the groups from highest to lowest. Those groups with the highest number of $10K+ gifts would receive priority in future fundraising. Does this make sense?
Tom,
It would seem that there would be a very high number of permutations of the 20 predictor variables. Even if each predictor variable was binary you would have 2^20 possibilities, which is a number larger than one million. With only 2600 donors most of the combinations would not have any representation. I can’t say that a logistic regression model would work any better.
Charles