Author

Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).

Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.

Dr. Zaiontz has held a number of executive management and sales management positions, including President of Genuity Europe, where he was responsible for the European operations of one of the largest global Internet providers, a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.

He grew up in New York City and lived in Indiana, Florida, Oregon, and finally Boston before moving to Europe 36 years ago; since then he has lived in London, England, and in northern Italy.

He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack used on this website.

489 thoughts on “Author”

  1. Hi Dr. Charles Zaiontz,
    This is about the reliability of Cronbach's alpha. You can see this table. Is there any reference to support this table? If yes, could you please send the reference? Or how can I decide whether this alpha indicates high reliability or not? References are very important for me. Thank you.
    Cronbach's alpha reliability
    0.00 ≤ α < 0.40 Scale not reliable
    0.40 ≤ α < 0.60 Scale has low reliability
    0.60 ≤ α < 0.80 Quite reliable
    0.80 ≤ α < 1.00 High reliability
    Regards,
    Ebru

    • Ebru,
      There isn’t universal agreement about these ranges. If you look at the Wikipedia entry for Cronbach’s alpha you will see different ranges. I am sure that some books will have some version of these ranges, while others won’t include any ranges.
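      For what it is worth, alpha itself is easy to compute from an items-by-respondents table, so you can check your value against whichever cutoff table you adopt. Here is a minimal sketch (written in Python rather than Excel, purely for illustration; the data shown are made up):

          import numpy as np

          def cronbach_alpha(scores):
              """scores: 2-D array with rows = respondents and columns = items."""
              scores = np.asarray(scores, dtype=float)
              k = scores.shape[1]                         # number of items
              item_vars = scores.var(axis=0, ddof=1)      # variance of each item
              total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
              return k / (k - 1) * (1 - item_vars.sum() / total_var)

          # example: 5 respondents answering 4 items on a 1-5 scale
          data = [[3, 4, 3, 4], [5, 5, 4, 5], [2, 3, 2, 2], [4, 4, 5, 4], [3, 2, 3, 3]]
          print(cronbach_alpha(data))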
      Charles

  2. Dear Charles,

    Hope you are well

    I have just one urgent inquiry regarding normality testing.

    Typically, p < 0.05 in the Shapiro-Wilk test indicates that the data are not normally distributed.

    My data pass the normality test at the .001 level, but not at the .05 level.
    So I have chosen .001, since this is a more relaxed criterion for assessing normality than .05.

    Is this accepted? What do you think?
    Is there any reference to support this?

    Regards,
    Abdul

    • Abdul,
      Most tests are pretty robust to departures from normality and so using .001 is probably good enough. The risk is that in your particular case, the data is truly not coming from a normally distributed population and any test that you are using is really invalid. This risk is probably pretty low, but it exists. I suggest that you take a graphic look at your data (histogram, QQ plot, etc.) to see whether it looks normally distributed.
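      If it helps, here is a rough sketch of that graphical check alongside the Shapiro-Wilk test (Python/scipy rather than Excel, for illustration only; replace the generated sample with your own data):

          import numpy as np
          from scipy import stats
          import matplotlib.pyplot as plt

          rng = np.random.default_rng(0)
          data = rng.normal(loc=50, scale=10, size=40)   # stand-in for your sample

          w, p = stats.shapiro(data)                     # Shapiro-Wilk test
          print(f"W = {w:.4f}, p = {p:.4f}")             # compare p with .05 or .001

          fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
          ax1.hist(data, bins=10)                        # histogram
          ax1.set_title("Histogram")
          stats.probplot(data, dist="norm", plot=ax2)    # QQ plot against a normal distribution
          ax2.set_title("QQ plot")
          plt.tight_layout()
          plt.show()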
      Charles

  3. Hey Charles!
    When I use the ADFTEST function in Excel, it only returns a single cell labeled "tau stat". I am unable to get the 8×2 output. How should I do this?

  4. Hi Charles
    Which experimental design do I use for an experiment on 9 different soil components for two different plants? The height of the plant and its color (green/yellow) will be measured.
    Vusani

  5. Hi Charles,

    Firstly, thanks for your site and app, it’s so appreciated.

    I have the Excel 2016 desktop application. I would like to add Real Statistics to my Ribbon in an existing tab with other macros I already have in it. When I try to add it, I can't find the macro in the Macros list to apply it.

    Could you please let me know if this is possible?

  6. Dear Charles.
    Is it possible to extract the PCA components' time series when using the Factor Analysis option of your Excel add-in?
    Best regards.

    Sir, can you please explain the method for finding the shape and scale parameters for wind data using the Weibull distribution?

  8. Charles,

    I am a layman, so please forgive the language. If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.

    I know, for a fact, that there are regular, timed events in the market. In fact there are many of them. There are also timed news events that manipulate the market.

    Wouldn’t this then “distort” the series away from a Normal Distribution to some other form? I have seen a white paper that describes the market as a Sinusoidal Distribution.

    I guess the question is, can a Normal Distribution have a regular pattern in it? (Obviously this is an interesting subject to Traders)

    Somewhat confused
    David Shields

    • David,
      What is the source of “If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.”?
      If data completely follows a normal distribution then it would follow a regular pattern (described by the bell curve), but clearly real data would have some randomness to it.
      Charles

    • I believe that you are saying that you have (1) one Treatment factor with 4 levels, (2) one repeated measures Time factor with two levels and (3) 3 replications. You won’t be able to address this with any of Excel 2007’s data analysis tools. You can use Real Statistics’ Mixed Repeated Measures ANOVA data analysis tool to perform this analysis in Excel 2007.
      Charles

  9. Hi Charles,
    Thank you so much for the great Excel worksheet for Kendall's W. I need your technical help with Kendall's W, which I will use for the data analysis in my PhD thesis. However, I have a problem with it.
    Let me brief you on the nature of my data as follows:
    1. There are 27 judges ranking 15 objects.
    2. I used an ordinal scale when I conducted the survey. Thus, I coded 1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree and 5 = strongly disagree.
    3. The research question concerned the raters' level of agreement that Cambodia's legal framework articulates community-managed forestry.
    4. I tried to compute Kendall's W on this issue to find which of the 15 policies are most agreeable to the raters, i.e. which policies they believe highly and greatly articulate the important role and attributes of community-managed forestry, so that sustainable forestry management will exist in Cambodia. I would like to request your email address so that I can ask you about technical issues. This is my email: nhemsareth@gmail.com
    Best regards,
    Sareth

  10. Hi Charles,

    I cannot thank you enough for the material that you have put on this website. Recently, I had a statistics assignment for my master's degree and I learned a lot by going through your lessons.

    Thank you very much for providing such a fine work.

    best regards,
    Khalid

  11. Great website. I’ve learned how to use Jenks Natural Breaks Optimization in order to cluster products by cost.

    One question though, now I want to cluster products by cost and manufacturer. I’m struggling with a solution for this. Any suggestions?

    Thanks!

  12. Sir
    I have questions,
    I have read a journal article about overlapping clustering that uses the k-means algorithm. It only uses a maximum distance to identify overlapping clusters generated by k-means.
    My question is: is there a fixed maximum distance allowed by k-means when assigning data objects to a cluster? Or is the maximum distance simply the largest distance of the objects assigned to the cluster (measured from each object to its centroid)?
    Thank you

    • If I remember correctly, the algorithm seeks to minimize the distance between each member of a cluster and its centroid, and to maximize the distance between the centroid and points not in its cluster. I don’t believe there is a predefined maximum distance.
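      To make the assignment step concrete, here is a bare-bones sketch (Python, illustrative only): each point simply goes to its nearest centroid, and any "maximum distance" can only be measured after the fact.

          import numpy as np

          def assign_to_clusters(points, centroids):
              """Assign each point to its nearest centroid (the standard k-means step)."""
              points = np.asarray(points, dtype=float)
              centroids = np.asarray(centroids, dtype=float)
              # distance from every point to every centroid
              dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
              labels = dists.argmin(axis=1)        # nearest centroid, however far away it is
              max_dist = dists.min(axis=1).max()   # largest point-to-own-centroid distance
              return labels, max_dist

          points = [[1, 1], [1.5, 2], [8, 8], [9, 7]]
          centroids = [[1, 1.5], [8.5, 7.5]]
          print(assign_to_clusters(points, centroids))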
      Charles

  13. Sir,

    1. I have applied the Box-Cox transformations as suggested by you, but it didn't work. I am using a 5-point Likert scale in my research and the majority of responses lie between 3 and 5. I used the value λ = 1, as no guidance was given in your literature. I was able to calculate the values of X and Y, but the values of Z and r did not come out when I applied the formula for Z as =NORM.S.INV((H$4-0.5)/H$203). My data runs from cell H4 to H203.
    2. What should the optimum sample size be for non-parametric tests? Is a 5-point Likert scale less efficient than a Likert scale of, say, 10 or 11 points for checking the normality of the data?

    Best regards.
    Gulab Kaliramna

    • Gulab,
      1. I don’t have any further advice, except to try other values for lambda.
      2. A 10-point Likert scale has a better chance than a 5-point Likert scale of satisfying normality. I haven't come across many ways of calculating the minimum sample size for nonparametric tests. I do know that when normality is satisfied, the power of the Mann-Whitney test is about 94% of the power of the t test, everything else being equal. This means that the sample size required is only slightly higher than for the t test (when normality is satisfied).
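      On point 1: rather than fixing lambda at 1 (which leaves the data essentially unchanged), you can let the maximum-likelihood value of lambda be found for you. A rough sketch (Python/scipy rather than Excel, illustrative only; it assumes all values are positive, which holds for 1-5 Likert scores):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)
          likert = rng.integers(3, 6, size=200).astype(float)   # stand-in for your 5-point Likert responses

          transformed, lam = stats.boxcox(likert)               # lambda chosen by maximum likelihood
          print(f"lambda = {lam:.3f}")
          print(stats.shapiro(transformed))                     # re-check normality after the transformation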
      Charles

  14. Dear Dr. Charles Zaiontz,

    1. I have collected data from 200 respondents on a 5-point Likert scale using the Occupational Role Stress (ORS) scale by Udai Pareek, but my data is not normal. I have used the square root, inverse and double transformation methods to normalize my data. Is there any other method to normalize the data?
    2. I also want to know whether it is compulsory to use only parametric tests to score well in research work.

    Best regards,
    Gulab Kaliramna

    • Gulab,
      1. See Box Cox transformations at
      Box Cox Transformations
      2. If the test assumptions are not met for a parametric test, then it is perfectly ok to use a nonparametric test. The main downside is that the power of such a test will be lower, which may require a larger sample size.
      Charles

  15. Dear Dr. Charles Zaiontz

    This is just another thanks message. But I can not fail to publicly thank you for your work. I am sure that your life reflects the good that you have done to everyone with your work on this site. Have you visited Portugal yet? Please do so. We are people of good will, and we would like to welcome you.

    • Jorge,
      Thank you very much for your kind words.
      I have visited Portugal before, but this was many years ago. I enjoyed my visit and the people very much.
      Charles

  16. Hello Charles, I am following your website to gain this valuable knowledge. Is the book you mentioned in the post “Statistics using Excel @ Succinctly”, or do you have some other book as well? I am inclined towards your book because of the use of Excel, so we can do what we are reading. Do you also have the sample Excel files for your topics?

    I have downloaded “Statistics using Excel @ Succinctly” and am printing it.

    • Hello,
      No. “Statistics using Excel @ Succinctly” is not the book mentioned in the post (although that book was also written by me). The book I am referring to in the post will be much more detailed and should be coming out early in 2018.
      Charles

        We will be waiting… I am not sure whether we will be able to get it / buy it in India soon. Please plan for a PDF format as well. Best wishes for the book.

        • There will be a pdf format that you can buy in India.
          I have already finished writing the book and was hoping to publish the book this year, but I have struggled to find the time to finish proof-reading it and making any necessary revisions. In any case, it should be available in the first part of 2018.
          Charles

  17. Dear Dr. Zaiontz, do you also offer private consulting? I might need some explanation on how to run some tests…ICC and kappa and/or Fleiss.
    Please let me know.

    PS. I cannot find any video on the internet that explains how to calculate kappa or Fleiss' kappa with multiple observers. I am a bit desperate. I am trying to use Excel and fervently hope I do not have to learn SPSS.

  18. Hi Charles!

    First of all, thank you for the amazing website!

    I want to test whether a noise signal is white or not, that is, I want to verify that the correlation between samples is null. I have the signal but don’t know what test to perform. What do you recommend?

    Tiago Silva

  19. Hi,
    I just want to thank you for this outstanding website and the information you have put time into. I am grateful for the work you have put into this website, which helped me a lot with my assignment, and I am sure it has helped thousands of others around the world as well.

    David from Australia

    • Thanks for your kind remarks, David. I am very gratified when I see that people are getting value from the website and software. I hope to continue to expand the topics covered and to improve the learning experience.
      Charles

  20. The binomial distribution (BD) was created for determining P(k; n, p), the probability of k successes in n independent trials with a constant probability p of success. Given this, the C(n, k) term of the BD does not have practical importance. For example, if you run four trials, it makes no difference whether the one success is obtained in the first, second, third or fourth trial. And even the products of probabilities are questionable.

    The statistical models project (SMp) proposes the following expression: P(k; n, p) = (k*p + (n-k)*(1-p))/n

    which represents a weighted average of p and (1-p), i.e. the probabilities of success and failure, respectively, in each trial.

    Is the SMp P(k;n,p) expression more appropriate than the current Binomial one?

    • Terman,
      I have always thought that one of the advantages of the binomial distribution is that it is indifferent to which trials had the successes and which had the failures.
      Can you give me an example where the order of the successes and failures would be important?
      I don’t fully understand the meaning of the SMp expression for P(k; n, p). Is this really supposed to be the probability of success on each trial as you have stated? I thought this was p.
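      A quick arithmetic comparison may help (illustrative only):

          from math import comb

          n, p = 4, 0.5
          for k in range(n + 1):
              binom = comb(n, k) * p**k * (1 - p)**(n - k)   # standard binomial probability
              smp = (k * p + (n - k) * (1 - p)) / n          # the SMp expression quoted above
              print(k, round(binom, 4), round(smp, 4))
          # the binomial column sums to 1 over k = 0..n, while the SMp column does not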
      Charles

  21. Hi everyone

    Can you please explain whether, after completing a paired t test, one can then add an additional score? If it can be done, would a Bonferroni adjustment need to be made? Since it is a single person, under the same conditions, with 2 scores (one pre and one post), I can't see a problem with adding it to the previous data, but would there be a problem, and would one score really make enough of a difference to require a Bonferroni adjustment?

  22. Dear Charles:
    I have some questions I am facing while working on my research. My paper concerns the effectiveness of an early warning system, so I am planning to measure the dependent variable using four indicators and to construct an effectiveness index using PCA. The dependent variable is discrete and naturally ordered, with levels such as effective, more effective, less effective or ineffective. The four variables used to measure early warning effectiveness are:
    1. Household income,
    2. Household asset
    3. Frequency of information
    4. Accuracy of information
    So how can I create an effectiveness index with a cut-off?

    Thanks.

  23. Hi Dr Charles,

    Your website is amazing for stats beginners like me. I am currently a bit confused as to which test I should use. I have 2 sets of experiments, each with a CV, one 12.5% and the other 8.2%. I would like to know whether the roughly 4% reduction in CV I achieved is statistically significant. Should I use Fisher's F test or this test (https://real-statistics.com/students-t-distribution/coefficient-of-variation-testing/)?

    Thank you so much!

    Glenda

    • Glenda,
      I am glad that you like the Real Statistics website,
      You haven’t provided enough information for me to be able to answer your question. It sounds like a chi-square test of independence (or Fisher exact test), but I can’t say for sure.
      Charles

      • Hi Charles,

        Ah, so sorry. So, I ran 2 sets of experiments, both with the same sample, one with a sample size of 31 and CV of 12.5%, and the other with a sample size of 23 and CV of 8.2%. Please let me know if you need more information.

        Thank you.

        Glenda

  24. Hi Charles
    I’m trying to do a two-way ANOVA but Levene’s homogeneity test gives p < 0.05 (violated).
    How do I account for this? Is there a way I can lower my significance level to 1% to adjust for the type 1 error with multiple comparisons?

    • Eze,
      If Levene’s test is near .05, you can probably still use ANOVA, especially if you have a balanced model (all groups have the same size). Otherwise, I suggest that you explore using Welch’s ANOVA. See
      Welch’s ANOVA
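      For reference, Welch's one-way ANOVA needs only each group's size, mean and variance. A minimal sketch (Python, illustrative only, with made-up data; this is not the Real Statistics implementation):

          import numpy as np
          from scipy import stats

          def welch_anova(groups):
              """groups: list of 1-D samples. Returns Welch's F, df1, df2 and the p-value."""
              groups = [np.asarray(g, dtype=float) for g in groups]
              k = len(groups)
              n = np.array([len(g) for g in groups])
              m = np.array([g.mean() for g in groups])
              v = np.array([g.var(ddof=1) for g in groups])
              w = n / v                                    # weights
              mw = (w * m).sum() / w.sum()                 # weighted grand mean
              a = (w * (m - mw) ** 2).sum() / (k - 1)
              tmp = ((1 - w / w.sum()) ** 2 / (n - 1)).sum()
              b = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
              f = a / b
              df1, df2 = k - 1, 1 / (3 * tmp / (k ** 2 - 1))
              return f, df1, df2, stats.f.sf(f, df1, df2)

          print(welch_anova([[23, 25, 28, 30], [31, 33, 36, 38], [40, 42, 45, 47]]))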
      Charles

      • Thanks Charles
        Very kind of you.
        I’ve just noticed that my significance value is 0.005.
        I have a balanced model,
        so how would I do a two-way Welch’s ANOVA in SPSS? Will this make my overall significance rate measurable at < 0.001?
        Thanks so much for your help

          • Hi Charles
            Please would you be able to email me so we can discuss via phone? I believe you can see my email address.
            It's the stats part of my dissertation which is really confusing me – I am also happy to pay for your lecture as well.
            Regards

  25. Hi Dr. Zaiontz,

    Thank you for developing a great Excel add-in for statistics.
    I have been using it for 1 year, but after reformatting my disk and reinstalling Real Statistics I get the following error in Excel 2013:
    “compilation error in hidden module”
    Regards

    Michel

      • Dear Dr. Zaiontz,

        Thank you for your reply !!

        That is exactly the point. I wish not to use the ACVF function.

        With the following provided by you:
        “Note that ACF(R1, k) is equivalent to
        =SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/DEVSQ(R1)”

        I was able to calculate and fully grasp the concept of ACF.

        I hope to be able to do the same for PACF.

        Best regards,
        Matthias

  26. Dear Dr Charles
    Our college uses four versions of each exam in order to limit cheating between students, but the problem is that when dividing the students into four groups, the reliability and difficulty of the whole exam will be affected. My question: is there any method to rejoin the four versions into a single one?

    • Assad,
      I see the difficulty that you are trying to address, but I don’t understand your question. Are you trying to find a reliability index for all four exams together?
      Charles

  27. Hello,
    Is it possible to add something to the real-statistics package?
    I have been performing several Tukey HSD/Kramer tests and would like to make the process less tedious. It would be helpful if the program performed all the comparisons (i.e. 1 and -1) and listed them at the bottom, including the groups that were compared, and also highlighted the significant ones. It can get annoying to perform 10 comparisons when I have 5 groups.

    Thank you.

  28. Hey Dr Zaiontz, first and foremost thanks for your help on my last post. I've come across another issue I would like some help with, if you can spare me some of your time.

    I'm doing a paper on the effect of the circadian cycle and the season on acute myocardial infarction and other acute coronary syndromes.

    Let's say that, out of 60 patients with acute coronary syndromes, we had 23 in summer, and 31 in the time frame between 0h and 6am. How should I test the statistical significance and, if possible, the relative risk in this scenario? How do I account for exposure time differences, for instance between those who had an episode in summer and those who had it in the other seasons, considering the latter had 3 times the amount of exposure time?

    Once again, thx in advance

    • Vitor,
      I only partially understand the scenario you are describing (e.g. (1) 31 + 23 doesn’t add up to 60 and (2) why are you comparing times like summer with times like 0h to 6am?)
      In any case, for most tests you will need to make sure that the dependent variable values are comparable, and so three times the exposure needs to be taken into account. If whatever you are measuring is uniform in time, then you might be able to simply divide these values by exposure time.
      In any case, these are just ideas. I would need to understand your scenario better before giving any definitive advice.
      Charles

        • Yes, so sorry about it. Let's see, the first scenario is as follows: 54 patients with an acute episode, 17 in winter, 15 in summer and 11 in each of the other two seasons. How do I measure the RR, ABS and 95% CI for winter? Should I pair it with each season or relate it to the average of the other seasons?

  29. Hi Charles

    Can you explain how to convert data for a log transformation in the following cases:
    1. Negative data
    2. Proportion data
    3. Percentage data

    • 1. Let a be the smallest value (i.e. the most negative value). Then use the transformation log(x-a+1), since x-a+1 > 0 for any x.
      2. You should be able to use log(x).
      3. Same as #2.
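      As a concrete illustration of the three cases (Python rather than Excel, purely a sketch with made-up numbers):

          import numpy as np

          x = np.array([-3.2, -1.0, 0.5, 2.4, 7.9])    # 1. data containing negative values
          a = x.min()
          log_neg = np.log(x - a + 1)                  #    log(x - a + 1) works since x - a + 1 >= 1

          props = np.array([0.02, 0.15, 0.40, 0.85])   # 2. proportions (all strictly positive)
          log_props = np.log(props)

          pcts = np.array([2, 15, 40, 85])             # 3. percentages, handled the same way
          log_pcts = np.log(pcts)

          print(log_neg, log_props, log_pcts, sep="\n")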
      Charles

  30. Dear folks:

    I am a university professor with a passion for statistics, oriented toward experimental design. I am writing a book on experimental design applied to environmental engineering. In the section on time series analysis I have some applications of temporal autocorrelation using the Durbin-Watson tables. I would like to include the D-W tables in my book for publication purposes, but I need your kind permission to do so. Could you help me with this issue?
    Thanks

    Hector A. Quevedo (Ph.D.)

    P.S. I am a graduate of the University of Oklahoma. I am an American citizen living in El Paso, Texas, and I work across the border.

    • Hector,
      I don’t have any problem with you using the Durbin-Watson table on my website, but note that I copied the values from tables that I found in a number of other places on the web.
      Charles

  31. Hello, I’d like to ask a beginner’s question about multiple regression – I’d be incredibly grateful for your time. I’ve only recently learned the basics of linear regression and I still have the following nagging doubt.

    I’d like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is ‘profit/loss’, which simply represents a sales figure for individual retail items. This is the variable I would like to forecast; (there are certain quantifiable conditions for each attempted sale of an item and these are my independent variables). My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a FIXED negative value of -100; (an item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items). A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):

    23
    55
    201
    -100
    13
    -100
    321
    124
    57
    -100
    33

    It’s my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I’d like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I’m wondering whether the fixed negative values might somehow pose a problem. I’m not trying to replicate the fixed -100 value for the losses, only trying to get to true averages so that I may accurately predict the profitability of an item’s listing for sale (and avoid unprofitable listings). Hope this all makes sense. Thank you very much.

    • I don’t see any problems with this fixed negative amount as long as it truly represents the Y value.
      The real problem you have is that the data may not fit a linear regression model. You should graph the data and see if it is truly linear. You should also graph the residuals and make sure that they are randomly distributed.
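      A bare-bones version of those two checks (Python rather than Excel, illustrative only; here "x" stands in for whichever single predictor you want to inspect, and y uses the sample values listed above):

          import numpy as np
          import matplotlib.pyplot as plt

          x = np.arange(1, 12, dtype=float)            # hypothetical predictor values
          y = np.array([23, 55, 201, -100, 13, -100, 321, 124, 57, -100, 33], dtype=float)

          slope, intercept = np.polyfit(x, y, 1)       # simple linear fit
          fitted = intercept + slope * x
          residuals = y - fitted

          fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
          ax1.scatter(x, y)
          ax1.plot(x, fitted)                          # does the relationship look linear?
          ax1.set_title("Data and fitted line")
          ax2.scatter(fitted, residuals)
          ax2.axhline(0)
          ax2.set_title("Residuals vs fitted")         # residuals should look random around zero
          plt.tight_layout()
          plt.show()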
      Charles

      • Charles, thank you very much for your reply. I’ll be sure to check that the data meets the various requirements of a linear model.

        Regarding your last point, the logic is that the residuals would be randomly distributed because the relationships between the variables remain constant, regardless of the profitability of a given listing. I will of course check the graphs, but it would help to know that I have the theory straight.

        May I ask then, can I take it that, if the AVERAGE for Y in my case can be viewed in the same way as that for any appropriate independent variable, then duplicate/fixed values (in themselves) do not pose a problem in a linear regression analysis?

        To clarify, let’s say there were no fixed loss amounts for Y in another case, that they were free to fall anywhere on the same continuous scale (as you usually find in any textbook example). Let’s also say that the average for Y in both cases is equal. Is it then safe to say that there is no apparent cause for concern with the data I have (assuming that it is appropriate for a linear regression analysis in every other way)?

        Sorry if my limitations here are making things unclear or unnecessarily complicated! Thank you. Ben

        • I don’t see any particular problems with duplicate data provided the assumptions for regression are met and the negative value can be compared with the other values and is not a conventional value (like coding all missing data as -99).
          Charles

  32. Dear Dr.Charles
    Sir, I want you to give me some guidance about how to write a research paper. Please, sir, I do not have any good teacher in the province of Balochistan in Pakistan.

  33. Dear Dr. Zaiontz,

    Please provide guidance on interpreting the observed and critical tau values of the Augmented Dickey-Fuller test for stock market prediction.

    The variable considered is Open (the opening price).

    Thanks
    Sneh Saini

  34. Hi, Dr. Charles Zaiontz,
    Thanks for your excellent site and relentless effort.
    I am working on research using a 4-point Likert scale. I decided to use chi-square to test the null hypothesis. What kind of test can I use for reliability? Is chi-square enough?

  35. Dear Dr. Zaiontz,

    I want to conduct experiments which will be done by human subjects. The outcome is explained by 10 user predictors and 6 task predictors. I want to calculate the sample size of users and how many tasks each user should do to achieve a specific power.

    Thanks in advance
    Mushtaq

      • Thank you Dr. Zaiontz for replying.
        Does this give the sample size of users and of tasks separately? Because the output variable depends on two sets of predictors: one from users and the other from tasks.
        Thanks,
        Mushtaq

          • Dear Dr. Charles,

            I mean, the sample size represents the overall number of observations that I have to get to achieve the requirements. But this number is a combination of the number of tasks and the number of users. For example, if the sample size is 200, then I could have 20 users by 10 tasks, 10 tasks by 20 users, or 40 users by 5 tasks, and so on. This is the case if I want each user to do the same tasks as the other users. But if each user does a different task, I think the sample size = number of users = number of tasks.
            Thanks,
            Mushtaq

  36. Hello Charles!
    I am a student at Uppsala University in Sweden, and I have unfortunately not done any statistics during my four years there, which I regret now when I’m doing my master thesis (earth science).
    I have been trying to understand Time series analysis and PCA but it seems extremely complicated. I was just wondering if you could help me answer a basic question about it?
    I have data of pore pressures at three different depths (in the ground), measured two times per day for about 5 years. The problem is that sometimes the measuring device stopped working so there are a lot of missing data, sometimes days, sometimes months. So my question is if it is even possible to make a time series analysis or a PCA with this kind of data?
    Kind regards, Hanna Fritzson

  37. Hi Dr. Raju,

    Thank you for making this helpful Excel-Stats tool available for the public free of charge!

    The Max-Diff analysis is becoming a popular analytical method for consumers’ preference (see “https://datagame.io/maxdiff-excel-modeling-template/”). Would you consider adding the Max-Diff analysis module to the current Real Statistics Resource Pack (release 4.9, as of 9/17/2016)? Thanks in advance for your consideration (or advice if such an analytical tool is available/accessible free of charge elsewhere).

    Thanks,
    Max

    • Hi Max,
      I am about to issue the next release of the Real Statistics Resource Pack, but I will consider Max-Diff analysis for a future release.
      Charles

  38. Hi Charles,

    Thank you so much for sharing the use of excel instead of SPSS! This is exactly what I was looking for!!!! I am very glad to find your website.

    I am actually in the middle of completing a dissertation and struggling with analyzing the results I got from an online survey tool, which is very good and easy to use. However, I wanted to confirm whether it is OK to use only Excel for the analysis in my dissertation, and also whether I should establish the reliability of each result I got from the survey.

    Please let me know if you need more information to answer my questions.
    I am very happy to have a conversation with you via email if you are free.

    Thank you so much in advance!

    • Of course. My objective is to provide analysis capabilities in Excel that are accurate and just as good as those provided by tools such as SPSS. You should be aware that there is a bias against using Excel for statistical analysis. Some of this is based on errors in earlier versions of Excel and on the lack of many of the most commonly used tools. Microsoft has corrected these errors in the last few releases of Excel, and I have worked hard to add capabilities that are accessible from Excel but are missing from standard Excel.
      Charles

  39. Hi Sir Charles!
    Thank you so much for the information about Kappa stat. It helped me a lot in my Research.
    I’m actually conducting research on the effectiveness of a (local) diagnostic kit compared to the commercial one, using 60 samples. The commercial one is my gold standard. And my adviser told me to use kappa. However, I’m still in the process of absorbing the information/formula. Do you have a simpler formula for my problem?
    I’m sorry for my demand but thank you in advance Charles! God bless!

  40. Hey Charles, I've come across another problem and was hoping you may be able to help me. I have a group of patients with an expected bleeding risk of 5.48% in 3 months, and a thrombotic risk of around 10% over the same amount of time; those calculations came from big studies with over 3000 patients. Anyway, how can I compare these to see whether or not this difference justifies the use of an anticoagulant agent? Would an odds ratio or hazard ratio be useful in this scenario, to compare the chances of 2 different events over the same sample? Thanks in advance

  41. Dear Charles,

    First, I would like to congratulate you for your website.
    I wonder if you have examples of the sequential version of the statistical test of Mann-Kendall which detects change points in time series.

    • Dear Leonardo,
      Thanks for your kind remarks about the website.
      Unfortunately, I don’t yet support the Mann-Kendall test.
      Charles

  42. On another matter, I have an Excel spreadsheet of 2600 university donor prospects with 20 potential predictor variables, and I am trying to predict those that have a higher likelihood of giving a gift of $10K+. I have experimented with the logistic regression tool and found it very difficult to use and interpret. I'm wondering if a simpler and equally effective solution would be to simply divide the prospect list into groups representing all possible configurations of the predictor variables (permutations?), compute the average number of $10K gifts given by each group historically, then rank the groups from highest to lowest. Those groups with the highest number of $10K+ gifts would receive priority in future fundraising. Does this make sense?

    • Tom,
      It would seem that there would be a very high number of permutations of the 20 predictor variables. Even if each predictor variable was binary you would have 2^20 possibilities, which is a number larger than one million. With only 2600 donors most of the combinations would not have any representation. I can’t say that a logistic regression model would work any better.
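      Just to spell out the arithmetic:

          combos = 2 ** 20        # every predictor binary, the most optimistic case
          donors = 2600
          print(combos)           # 1048576 -- over a million possible configurations
          print(donors / combos)  # roughly 0.0025 donors per configuration on average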
      Charles

