Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).
Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.
Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.
He grew up in New York City and has lived in Indiana, Florida, Oregon, and finally Boston, before moving to Europe 36 years ago where he has lived in London, England and in northern Italy.
He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, required that he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.
Hi Dr. Charles Zaiontz,
This is Cronbach's alpha reliability. You can see this table. Is there any reference to support this table? If so, could you please send the reference? Or how can I decide whether this alpha indicates high reliability or not? References are very important for me. Thank you.
Cronbach's Alpha Reliability
0.00 ≤ α < 0.40  Scale not reliable
0.40 ≤ α < 0.60  Scale low reliability
0.60 ≤ α < 0.80  Quite reliable
0.80 ≤ α < 1.00  High reliability
Regards,
Ebru
Ebru,
There isn’t universal agreement about these ranges. If you look at the Wikipedia entry for Cronbach’s alpha you will see different ranges. I am sure that some books will have some version of these ranges, while others won’t include any ranges.
Charles
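For readers who want to compute alpha itself, here is a minimal worksheet sketch of the standard formula α = k/(k−1) · (1 − sum of item variances / variance of totals). The layout is purely illustrative: assume k = 10 items in columns B through K, one respondent per row, in B2:K101.
=VAR.S(B2:B101)   entered in B103 and filled across to K103 (the per-item variances)
=SUM(B2:K2)   entered in M2 and filled down to M101 (each respondent's total score)
=(10/9)*(1-SUM(B103:K103)/VAR.S(M2:M101))   Cronbach's alpha
The resulting value can then be compared against whichever set of ranges you adopt.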
Dear Charles,
Hope you are well
I have only one urgent inquiry regarding normal distribution test.
Typically, p < 0.05 of the Shapiro-Wilk test indicates data are not normally distributed.
My data are normally distributed at the .001 level, but not at the .05 level.
So, I have chosen .001 since this is a more relaxed criterion to assess normality compared to .05.
Is this accepted? What do you think?
Is there any reference to support this?
Regards,
Abdul
Abdul,
Most tests are pretty robust to departures from normality and so using .001 is probably good enough. The risk is that in your particular case, the data is truly not coming from a normally distributed population and any test that you are using is really invalid. This risk is probably pretty low, but it exists. I suggest that you take a graphic look at your data (histogram, QQ plot, etc.) to see whether it looks normally distributed.
Charles
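As a minimal worksheet sketch of the QQ plot suggestion, assume the sample is sorted in ascending order in A2:A101 (n = 100) with the ranks 1 to 100 in B2:B101; the layout is illustrative only.
=NORM.S.INV((B2-0.5)/100)   entered in C2 and filled down (the theoretical normal quantiles)
Charting column A against column C as an XY scatter should give roughly a straight line if the data are approximately normal.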
Hello,
nothing to see but thank you for the website.
kind regards
Hello,
Is there any GLS implementation available in Microsoft excel
Thanks
Rahul,
No, but there is GLS support in the Real Statistics software.
Charles
Dear,
Can you help me in writing findings on my ARIMA model in population analysis? I am stuck here. I can mail you this analysis.
P.S. I am not a mathematician.
Farah,
You can mail me at the email address found at Contact Us.
Charles
Hey Charles!
When I use the ADFTEST function in Excel, it only returns a single cell labeled “tau stat”. I am unable to get an 8×2 output. How should I do it?
Anam,
ADFTEST is an Excel array function. To get the full output you can’t simply press Enter. See the following webpage for how to handle array functions.
Array Formulas and Functions
Charles
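In brief, and assuming the series is in A1:A100 (the range here is illustrative; the 8×2 output size is taken from the question above): first highlight an 8-row by 2-column output range, type the formula, and then press Ctrl-Shift-Enter instead of Enter.
=ADFTEST(A1:A100)
Excel then fills the entire highlighted range with the function's output rather than just the top-left cell; see the webpage above for the function's full argument list.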
Hi Charles
Which experimental design do I use for an experiment on 9 different soil components for two different plants? The height of the plant and color (green/yellow) will be measured.
Vusani
Vusani,
What hypothesis (or hypotheses) do you want to test?
Charles
I want to test the effect of the soil components on the two different vegetables.
The colour of the leaves indicates the healthiness of the plant.
Vusani,
Since you have two dependent variables (colour and height) this looks to be a fit for MANOVA. The “devil is in the details”, and so the exact test depends on factors that you haven’t specified.
Charles
Hi Charles,
Firstly, thanks for your site and app, it’s so appreciated.
I have the Excel 2016 desktop application. I would like to add Real Statistics to my Ribbon in an existing tab with other macros I already have there. When I try to add it, I can't find the macro in the Macros list to apply it.
Could you please let me know if this is possible?
Nick,
I imagine that it is possible, but I haven’t figured out how to do this. Perhaps someone else knows how to do this.
Charles
Hi Charles,
Thanks for prompt reply.
If anyone has a solution, it's appreciated.
Nick.
Dear Charles.
Is it possible to extract the PCA components' time series when using the Factor Analysis option of your Excel add-in?
Best regards.
Greg,
I don't know what you mean by “PCA components' time series”.
Charles
Sir can you please explain the method of finding shape and scale parameters for wind data using Weibull distribution ?
Sam,
See https://real-statistics.com/distribution-fitting/
Charles
Charles,
I am a layman, so please forgive the language. If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.
I know, for a fact, that there are regular, timed events in the market. In fact there are many of them. There are also timed news events that manipulate the market.
Wouldn't this then “distort” the series away from a Normal Distribution to some other form? I have seen a white paper that describes the market as a Sinusoidal Distribution.
I guess the question is, can a Normal Distribution have a regular pattern in it? (Obviously this is an interesting subject to Traders)
Somewhat confused
David Shields
David,
What is the source of “If the timeseries is sourced from Foreign Exchange money market data, it is said to be a Normal Distribution and is subject to the Random Walk idea.”?
If data completely follows a normal distribution then it would follow a regular pattern (described by the bell curve), but clearly real data would have some randomness to it.
Charles
I have 4 treatments with 3 replicates each, across a two-level time factor. Can this be analyzed with ANOVA with replication in Excel 2007?
I believe that you are saying that you have (1) one treatment factor with 4 levels, (2) one repeated-measures time factor with two levels, and (3) 3 replications. You won't be able to address this with any of Excel 2007's data analysis tools. You can use the Real Statistics Mixed Repeated Measures ANOVA data analysis tool to perform this analysis in Excel 2007.
Charles
The site is extremely helpful, thank you for your effort, sir!
Hi Charles,
Thank you so much for the great Excel sheet for Kendall's W. I would need your technical help with Kendall's W, which I will use for the data analysis for my PhD thesis. However, I have a problem with it.
Let me brief you on the nature of my data as follows:
1. There are 27 judges to rank the 15 objects.
2. I used an ordinal scale when I conducted the survey. Thus, I coded 1 = strongly agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree and 5 = strongly disagree.
3. The research question was about the raters' level of agreement that Cambodia's legal framework articulates community-managed forestry.
4. I tried to compute Kendall's W on this issue to find which of the 15 policies the raters most agree articulate the important role and attributes of community-managed forestry, so that sustainable forestry management can exist in Cambodia. I would like to request your email so that I can ask you about the technical issues. This is my email: nhemsareth@gmail.com
Best regards,
Sareth
Sareth,
See Contact Us for my email address.
Charles
Hi Charles,
I cannot thank you enough for the material that you have put on this website. Recently, I had a statistics assignment for my master's degree and I learned a lot by going through your lessons.
Thank you very much for providing such a fine work.
best regards,
Khalid
Khalid,
Glad to see that the website was helpful.
Charles
I need some help with using R to plot Kaplan-Meier curves with dates using my fisheries data. Thank you
Khyria,
I am not that familiar with R, and so can’t help you with that.
This website focuses on using Excel, especially using the Real Statistics package, to perform statistical analysis.
Charles
Great website. I’ve learned how to use Jenks Natural Breaks Optimization in order to cluster products by cost.
One question though: now I want to cluster products by cost and manufacturer. I'm struggling to find a solution for this. Any suggestions?
Thanks!
Jack,
How do you plan to assign a value to manufacturer? These can’t simply be categorical values since you need to measure distances between the values.
Charles
Sir
I have questions,
I have read a journal article about overlapping clustering; it uses the k-means algorithm. It only uses a maximum distance to identify overlaps among the clusters generated by k-means.
My question is: is there a fixed maximum distance allowed by k-means in assigning the data objects to a cluster? Or is the maximum distance the largest distance of the objects (measured from each object to its centroid) that were assigned to the cluster?
Thank you
If I remember correctly, the algorithm seeks to minimize the distance between any member in a cluster and its centroid, and to maximize the distance between the centroid and points not in its cluster. I don't believe there is a predefined maximum distance.
Charles
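To illustrate the distance being minimized: assuming a data point in B2:D2 and a cluster centroid in B10:D10 (a hypothetical layout), the Euclidean distance between them is
=SQRT(SUMXMY2(B2:D2,B10:D10))
k-means assigns each point to whichever centroid yields the smallest such distance, with no preset cutoff.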
Sir,
1. I have applied the Box-Cox transformations as suggested by you, but it didn't work. I am using a 5-point Likert scale in my research and the majority of responses lie between 3 and 5. I used λ = 1, as no clue was given in your literature. I was able to calculate the values of X and Y, but the values of Z and r didn't come out when I applied the formula for Z as =NORM.S.INV((H$4-0.5)/H$203). My data run from cell H4 to H203.
2. What should the optimum sample size be for non-parametric tests? Is a 5-point Likert scale less efficient than a Likert scale of, say, 10 or 11 points for checking the normality of the data?
Best regards.
Gulab Kaliramna
Gulab,
1. I don’t have any further advice, except to try other values for lambda.
2. 10 point Likert has better chances than 5 point Likert of satisfying normality. I haven’t come across many ways of calculating the minimum sample size for nonparametric tests. I know that when normality is satisfied then power of the Mann-Whitney test is about 94% of the power of the t test, everything else being equal. This means that the sample size required is only slightly higher than for the t test (when normality is satisfied).
Charles
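For reference, the standard Box-Cox transformation is y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0. A minimal worksheet sketch, assuming the data run from H4 down (as in the question above) and a trial value of λ is placed in cell F1:
=IF($F$1=0,LN(H4),(H4^$F$1-1)/$F$1)
Fill this down alongside the data and re-run the normality check on the transformed column for each trial λ.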
Dear Dr. Charles Zaiontz,
1. I have collected data from 200 respondents on a 5-point Likert scale using the Occupational Role Stress (ORS) scale by Udai Pareek, but my data is not normal. I have used square root, inverse, and double transformation methods to normalize my data. Is there any other method to normalize the data?
2. I also want to know: is it compulsory to use only parametric tests to score well in research work?
Best regards,
Gulab Kaliramna
Gulab,
1. See Box Cox transformations at
Box Cox Transformations
2. If the test assumptions are not met for a parametric test, then it is perfectly ok to use a nonparametric test. The main downside is that the power of such a test will be lower, which may require a larger sample size.
Charles
Hello Dr. Zaiontz.
I'll be quick: your website is gold. Many thanks.
Best Regards
Pablo Caballero
Thank you very much Pablo.
Charles
Dear Dr. Charles Zaiontz
This is just another thanks message. But I can not fail to publicly thank you for your work. I am sure that your life reflects the good that you have done to everyone with your work on this site. Have you visited Portugal yet? Please do so. We are people of good will, and we would like to welcome you.
Jorge,
Thank you very much for your kind words.
I have visited Portugal before, but this was many years ago. I enjoyed my visit and the people very much.
Charles
Hello Charles, I am following your website to get this valuable knowledge. Is the book you mentioned in the post “Statistics Using Excel Succinctly”, or do you have some other book as well? I am inclined toward your book because of the Excel use, as we can do what we are reading. Do you also have the sample Excel files for your topics?
I have downloaded “Statistics Using Excel Succinctly” and am printing it.
Hello,
No, “Statistics Using Excel Succinctly” is not the book mentioned in the post (although that book was also written by me). The book I am referring to in the post will be much more detailed and should be coming out early in 2018.
Charles
We will be waiting… I am not sure whether we will be able to get it / buy it in India soon. Please plan for a PDF format as well. Best wishes for the book.
There will be a pdf format that you can buy in India.
I have already finished writing the book and was hoping to publish it this year, but I have struggled to find the time to finish proofreading it and making any necessary revisions. In any case, it should be available in the first part of 2018.
Charles
Dear Dr. Zaiontz, do you also offer private consulting? I might need some explanation of how to run some tests… ICC and kappa and/or Fleiss' kappa.
Please let me know.
P.S. I cannot find any video on the internet that explains how to calculate kappa or Fleiss' kappa with multiple observers. I am a bit desperate. I am trying to use Excel and fervently hope I do not have to learn SPSS.
Isabella,
I do offer private consulting. Please send me an email if you would like to pursue this.
Charles
Hi Charles!
First of all, thank you for the amazing website!
I want to test whether a noise signal is white or not, that is, I want to verify that the correlation between samples is null. I have the signal but don’t know what test to perform. What do you recommend?
Tiago Silva
Hi Tiago,
See the following webpage regarding how to test the correlation:
One sample hypothesis testing for correlation
Charles
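The test on that page is based on the fact that, when the population correlation is zero, t = r√(n−2)/√(1−r²) follows a t distribution with n−2 degrees of freedom. A minimal worksheet sketch, assuming the two series are in A2:A101 and B2:B101 (n = 100, an illustrative layout; for a lag-1 white-noise check, column B can simply be the series shifted by one position):
=CORREL(A2:A101,B2:B101)   the sample correlation r, say in D1
=D1*SQRT(98)/SQRT(1-D1^2)   the t statistic, say in D2
=T.DIST.2T(ABS(D2),98)   the two-tailed p-value
A large p-value is consistent with zero correlation at that lag.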
Prof. Charles
Kindly I need
Bayes with t-test, ANOVA
Ahmed,
I don’t know what you mean by Bayes with t-test, ANOVA.
Charles
Prof. Charles,
Please check your email.
I'm badly stuck on my homework.
Hi,
I just want to thank you for this outstanding website and the information you have put time into. I am grateful for the work you have put into this website, which helped me a lot with my assignment, and I am sure it has helped thousands of others around the world as well.
David from Australia
Thanks for your kind remarks, David. I am very gratified when I see that people are getting value from the website and software. I hope to continue to expand the topics covered and to improve the learning experience.
Charles
The binomial distribution (BD) was created for determining P(k; n, p), the probability of k successes in a number n of independent trials with a constant probability p of success. Given this, the C(n, k) term of the BD does not have practical importance. For example, if you do four trials, it is indifferent whether you obtain the one success in the first, second, third or fourth. And even the products of probabilities are questionable.
The statistical models project (SMp) proposes the following expression: P(k; n, p) = (k·p + (n−k)·(1−p))/n,
which represents a weighted average of p and (1−p), i.e. respectively the probability of success or failure in each trial.
Is the SMp P(k;n,p) expression more appropriate than the current Binomial one?
Terman,
I have always thought that one of the advantages of the binomial distribution is that it is indifferent to which trials had the successes and which had the failures.
Can you give me an example where the order of the successes and failures would be important?
I don't fully understand the meaning of the SMp expression for P(k; n, p). Is this really supposed to be the probability of success on each trial as you have stated? I thought this was p.
Charles
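For comparison, the standard binomial probability P(k; n, p) = C(n, k)·p^k·(1−p)^(n−k) is available directly in Excel. For example, the probability of exactly 1 success in 4 trials with p = 0.25 is
=BINOM.DIST(1,4,0.25,FALSE)
which returns 0.421875, whereas the SMp expression above gives (1·0.25 + 3·0.75)/4 = 0.625 for the same inputs, so the two expressions clearly describe different quantities.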
Respected Dr. Charles Zaiontz,
I am glad and thankful to see that you solve statistical problems so nicely.
Thanks a lot for a great job.
Hi everyone
Can you please explain whether, if one completes a paired t test and then wants to add an additional score, this can be done? If it can, would a Bonferroni adjustment need to be applied? Since this is a single person, under the same conditions, with 2 scores (one pre and one post), I can't see a problem with adding it to the previous data, but would there be a problem, and would one score really make enough of a difference to require a Bonferroni adjustment?
Rose, sure you can run another paired t test. But with only two pairs, don’t expect much from the test. Charles
Dear Charles:
I just have some questions I am facing while working on my research. My paper concerns the effectiveness of an early warning system, so I am planning to measure the dependent variable using four indicators to construct an effectiveness index using PCA. The dependent variable is discrete and naturally ordered: effective, more effective, less effective or ineffective. The four variables used to measure early warning effectiveness are:
1. Household income,
2. Household asset
3. Frequency of information
4. Accuracy of information
So how can I create an effectiveness index with a cut-off?
Thanks.
Tadele,
Please see the following webpage:
Factor Analysis
Charles
Hi Dr Charles,
Your website is amazing for stats beginners like me. I am currently a bit confused as to which test I should use. I have 2 sets of experiments, each with a CV: one 12.5% and the other 8.2%. I would like to know whether the 4% reduction in CV I achieved is statistically significant or not. Should I use Fisher's F test or this test (https://real-statistics.com/students-t-distribution/coefficient-of-variation-testing/)?
Thank you so much!
Glenda
Glenda,
I am glad that you like the Real Statistics website.
You haven’t provided enough information for me to be able to answer your question. It sounds like a chi-square test of independence (or Fisher exact test), but I can’t say for sure.
Charles
Hi Charles,
Ah, so sorry. So, I ran 2 sets of experiments, both with the same sample: one with a sampling size of 31 and CV 12.5%, and the other with a sampling size of 23 and CV 8.2%. Please let me know if you need more information.
Thank you.
Glenda
Hi Charles
I'm trying to do a two-way ANOVA but Levene's homogeneity test gives p < 0.05 (violated).
How do I account for this? Is there a way I can lower my significance level to 1% to adjust for the Type I error with multiple comparisons?
Eze,
If the p-value from Levene's test is near .05, you can probably still use ANOVA, especially if you have a balanced model (all groups have the same size). Otherwise, I suggest that you explore using Welch's ANOVA. See
Welch’s ANOVA
Charles
Thanks Charles
Very kind of you.
I've just noticed that my significance value is 0.005.
I have a balanced model.
So how would I do a two-way Welch's ANOVA in SPSS? Will this make my overall significance rate measurable at < 0.001?
Thanks so much for your help
Eze,
Welch's ANOVA supports only one-way ANOVA situations.
Charles
Hi Charles
Please would you be able to email me so we can discuss via phone? I believe you can see my email address.
It's the stats part of my dissertation which is really confusing me. I am also happy to pay for your help as well.
regards
Eze,
I suggest that you send me an Excel file with your data and the tests that you have already run. You can find my email address at
Contact Us.
Charles
Hi Dr. Zaiontz,
Thank you for developing a great Excel add-in for statistics.
I have been using it for 1 year, but after reformatting my disk and reinstalling Real Statistics I get the following error in Excel 2013:
“compilation error in hidden module”
Regards
Michel
Michel,
I don’t know what is causing this. Are you using some other add-in as well as Real Statistics?
Charles
Dear Dr. Zaiontz,
Thank you for this fabulous website.
May I ask you a question about something I found on:
https://real-statistics.com/time-series-analysis/stochastic-processes/partial-autocorrelation-function/
I could not figure out how the ACVF are being calculated.
Are there Excel formulae to arrive at the figures 155314.1, 121422.1, 89240.19 without using Real Statistics functions?
Thank you!
Matthias,
This can be calculated via the Real Statistics ACVF function, which is described at:
https://real-statistics.com/time-series-analysis/stochastic-processes/autocorrelation-function/
This is not a standard Excel function, but once you download the free Real Statistics software you can use it like any other Excel function.
Charles
Dear Dr. Zaiontz,
Thank you for your reply !!
That is exactly the point. I wish not to use the ACVF function.
With the following provided by you:
“Note that ACF(R1, k) is equivalent to
=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/DEVSQ(R1)”
I was able to calculate and fully grasp the concept of ACF.
I hope to be able to do the same for PACF.
Best regards,
Matthias
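Following the pattern of the ACF formula quoted above, a plausible worksheet analog for ACVF(R1, k) replaces the DEVSQ(R1) divisor with COUNT(R1), since ACF(k) = ACVF(k)/ACVF(0) and DEVSQ(R1) = n·ACVF(0):
=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/COUNT(R1)
This is a sketch rather than a guaranteed match: the figures on the PACF page may differ by the divisor convention (n versus n−k) used there.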
Dear Dr Charles,
Our college uses four versions of each exam in order to limit cheating between students, but the problem is that when dividing the students into four groups, the reliability and difficulty of the whole exam will be affected. My question: is there any method to rejoin the four versions into a single one?
Assad,
I see the difficulty that you are trying to address, but I don’t understand your question. Are you trying to find a reliability index for all four exams together?
Charles
Hello,
Is it possible to add something to the Real Statistics package?
I have been performing several Tukey HSD/Kramer tests and would like to make this less tedious. It would be helpful if the program performed all the comparisons (i.e. 1 and -1) and listed them at the bottom, including the groups that were compared, while also highlighting the significant ones. It can get annoying to perform 10 comparisons when I have 5 groups.
Thank you.
Deval,
Thank you for your suggestion. I will consider making this change.
Charles
Hey Dr Zaiontz, first and foremost thanks for your help on my last post. I've come across another issue I would like some help with, if you can spare me some of your time.
I'm doing a paper on the effect of the circadian cycle and season on acute myocardial infarction and other acute coronary syndromes.
Let's say that out of 60 patients with acute coronary syndromes, we had 23 in summer and 31 in the time frame between 0h and 6am. How should I test the statistical significance and, if possible, the relative risk in this scenario? How do I account for exposure-time differences, for instance between those who had an episode in summer and those who had it in the other seasons, considering the latter had 3 times the amount of exposure time?
Once again, thanks in advance
Vitor,
I only partially understand the scenario you are describing (e.g. (1) 31 + 23 doesn’t add up to 60 and (2) why are you comparing times like summer with times like 0h to 6am?)
In any case, for most tests you will need to make sure that the dependent variable values are comparable, and so three times the exposure needs to be taken into account. If whatever you are measuring is uniform in time, then you might be able to simply divide these values by exposure time.
In any case, these are just ideas. I would need to understand your scenario better before giving any definitive advice.
Charles
Yes, so sorry about it. Let's see, the first scenario is as follows: 54 patients with an acute episode, 17 in winter, 15 in summer and 11 in each of the two other seasons. How do I measure the RR, absolute risk, and 95% CI for winter? Should I pair it with each season or relate it to the average of the other seasons?
Vitor,
Thanks for the additional information, but I still don’t understand what you are trying to do.
Charles
Hi Charles
Can you explain how to convert data for a log transformation in the following cases:
1. negative data
2. Proportions data
3. percentages data
1. Let a be the smallest value (i.e. the most negative value). Then use the transformation log(x−a+1), since x−a+1 > 0 for any x.
2. You should be able to use log(x).
3. Same as #2.
Charles
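Worksheet versions of the three cases, as a minimal sketch assuming the raw data are in A2:A101 (a hypothetical range):
=LOG10(A2-MIN(A$2:A$101)+1)   case 1: shifts the most negative value up to 1 before taking the log
=LOG10(A2)   cases 2 and 3: proportions and percentages, provided all values are positive
For percentages, converting to proportions first (dividing by 100) only shifts every transformed value by the constant log10(100) = 2, so either form works.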
Dear folks:
I am a university professor. I have a passion for statistics with an experimental design orientation. I am writing a book on experimental design applied to environmental engineering. However, in the section on time series analysis I have some applications of temporal autocorrelation using the Durbin-Watson tables. I would like to include the D-W tables in my book for publication purposes, but I need your kind permission to do so. Could you help me with this issue?
Thanks
Hector A. Quevedo (Ph.D.)
P.S. I am a graduate of the University of Oklahoma. I am an American citizen living in El Paso, Texas. I am working across the border.
Hector,
I don't have any problem with you using the Durbin-Watson table on my website, but note that I copied its values from tables that I found in a number of other places on the web.
Charles
Hello, I’d like to ask a beginner’s question about multiple regression – I’d be incredibly grateful for your time. I’ve only recently learned the basics of linear regression and I still have the following nagging doubt.
I’d like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is ‘profit/loss’, which simply represents a sales figure for individual retail items. This is the variable I would like to forecast; (there are certain quantifiable conditions for each attempted sale of an item and these are my independent variables). My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a FIXED negative value of -100; (an item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items). A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):
23
55
201
-100
13
-100
321
124
57
-100
33
It’s my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I’d like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I’m wondering whether the fixed negative values might somehow pose a problem. I’m not trying to replicate the fixed -100 value for the losses, only trying to get to true averages so that I may accurately predict the profitability of an item’s listing for sale (and avoid unprofitable listings). Hope this all makes sense. Thank you very much.
I don’t see any problems with this fixed negative amount as long as it truly represents the Y value.
The real problem you have is that the data may not fit a linear regression model. You should graph the data and see if it is truly linear. You should also graph the residuals and make sure that they are randomly distributed.
Charles
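One way to obtain the residuals for that check in Excel, as a sketch assuming Y is in A2:A101 and the predictor columns are in B2:E101 (an illustrative layout):
=A2-TREND(A$2:A$101,B$2:E$101,B2:E2)
Fill this down and plot the residuals against the fitted values; a random, patternless scatter supports the linear model, while a curved or funnel-shaped pattern argues against it.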
Charles, thank you very much for your reply. I’ll be sure to check that the data meets the various requirements of a linear model.
Regarding your last point, the logic is that the residuals would be randomly distributed because the relationships between the variables remain constant, regardless of the profitability of a given listing. I will of course check the graphs, but it would help to know that I have the theory straight.
May I ask then: can I take it that an AVERAGE for Y in my case can be viewed in the same way as that for any appropriate independent variable, and that duplicate/fixed values (in themselves) do not pose a problem in a linear regression analysis?
To clarify, let's say there were no fixed loss amounts for Y in another case, and that they were free to fall anywhere on the same continuous scale (as you usually find in any textbook example). Let's also say that the average for Y in both cases is equal. Is it then safe to say that there is no apparent cause for concern with the data I have (assuming that it is appropriate for a linear regression analysis in every other way)?
Sorry if my limitations here are making things unclear or unnecessarily complicated! Thank you. Ben
I don’t see any particular problems with duplicate data provided the assumptions for regression are met and the negative value can be compared with the other values and is not a conventional value (like coding all missing data as -99).
Charles
Got it, thank you so much Charles. Congrats on the excellent resource pack and website here – well done! I hope you are well repaid for sharing your expertise. Ben
Great. Glad I could help.
Charles
Dear Dr.Charles
Sir, I want you to give me some guidance about how to write a research paper. Please, sir, I do not have any good teacher in the province of Balochistan in Pakistan.
There are many online guides to how to write a research paper, including the following:
www3.nd.edu/~pkamat/pdf/researchpaper.pdf
https://www.liebertpub.com/media/pdf/English-Research-Article-Writing-Guide.pdf
You can find more by googling “how to write a research paper”
Charles
Dear Dr. Zaiontz,
Please guide me in interpreting the observed tau and critical values of the Augmented Dickey-Fuller test for stock market prediction.
The variable considered is “open”.
Thanks
Sneh Saini
See the following webpages
Dickey-Fuller
Augmented Dickey-Fuller
Charles
Hi, Dr. Charles Zaiontz,
Thanks for your excellent site and relentless effort.
I am working on a research project using a 4-point Likert scale. I decided to use chi-square to test the null hypothesis. What kind of test can I use for reliability? Is chi-square enough?
In addition I have 100 respondents.
Thanks in anticipation
Hello Iro,
Chi-square is not usually considered to be a test for reliability.
Which test to use depends on what you mean by reliability. Please see the following webpage:
Reliability.
Charles
Thanks so much Dr Charles.
Dear Dr. Zaiontz,
I want to conduct experiments which will be done by human subjects. The outcome is explained by 10 user predictors and 6 task predictors. I want to calculate the sample size of users and how many tasks each user should do to achieve a specific power.
Thanks in advance
Mushtaq
Mushtaq,
If you are planning to use multiple linear regression to do this, then I suggest that you look at the following webpage
Sample Size for Regression
Charles
Thank you Dr. Zaiontz for replying.
Does this give the sample sizes of users and tasks separately? Because the output variable depends on two sets of predictors: one from the users and the other from the tasks.
Thanks,
Mushtaq
Sorry, but I don’t understand your question.
Charles
Dear Dr. Charles,
I mean, the sample size represents the overall number of observations that I have to get to achieve the requirements. But this number is a combination of the number of tasks and the number of users. For example, if the sample size is 200, then I could have 20 users by 10 tasks, 10 users by 20 tasks, or 40 users by 5 tasks, and so on. This is the case if I want each user to do the same tasks done by the other users. But if each user does a different task, I think the sample size = number of users = number of tasks.
Thanks,
Mushtaq
Hello Charles!
I am a student at Uppsala University in Sweden, and I have unfortunately not done any statistics during my four years there, which I regret now that I'm doing my master's thesis (earth science).
I have been trying to understand Time series analysis and PCA but it seems extremely complicated. I was just wondering if you could help me answer a basic question about it?
I have data of pore pressures at three different depths (in the ground), measured two times per day for about 5 years. The problem is that sometimes the measuring device stopped working so there are a lot of missing data, sometimes days, sometimes months. So my question is if it is even possible to make a time series analysis or a PCA with this kind of data?
Kind regards, Hanna Fritzson
Hanna,
It is often possible to handle missing data, but care needs to be taken. See the following webpage:
Handling Missing Data
Charles
Hi Dr. Raju,
Thank you for making this helpful Excel stats tool available to the public free of charge!
The Max-Diff analysis is becoming a popular analytical method for consumers’ preference (see “https://datagame.io/maxdiff-excel-modeling-template/”). Would you consider adding the Max-Diff analysis module to the current Real Statistics Resource Pack (release 4.9, as of 9/17/2016)? Thanks in advance for your consideration (or advice if such an analytical tool is available/accessible free of charge elsewhere).
Thanks,
Max
Hi Max,
I am about to issue the next release of the Real Statistics Resource Pack, but I will consider Max-Diff analysis for a future release.
Charles
Hi Charles,
Thank you so much for sharing the use of Excel instead of SPSS! This is exactly what I was looking for!!!! I am very glad to have found your website.
I am actually in the middle of completing a dissertation and struggling with analyzing the results I got from online survey software, which is very good and easy. However, I wanted to confirm whether it is OK to use only Excel for the analysis to complete the dissertation, and also whether I should prove the reliability of each result I got from the survey.
Please let me know if you need more information to answer my questions.
I am very happy to have a conversation with you via email if you are free.
Thank you so much in advance!
Of course, my objective is to provide analysis capabilities in Excel that are accurate and just as good as those provided by tools such as SPSS. You should be aware that there is a bias against using Excel for statistical analysis. Some of this is based on errors in earlier versions of Excel and the lack of many of the most commonly used tools. Microsoft has corrected these errors in the last few releases of Excel, and I have worked hard to add capabilities that are accessible from Excel but missing from standard Excel.
Charles
Hi Sir Charles!
Thank you so much for the information about the kappa statistic. It helped me a lot in my research.
I'm actually conducting research on the effectiveness of a local diagnostic kit compared to the commercial one, using 60 samples. The commercial one is my gold standard, and my adviser told me to use kappa. However, I'm still in the process of absorbing the information/formula. Do you have any simpler formula for my problem?
I'm sorry for my demand, but thank you in advance, Charles! God bless!
Jehvee,
If your adviser wants you to use kappa, then you need to use the formula for kappa. This formula is not very complicated. See Cohen’s Kappa for details.
Charles
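As a minimal worked sketch of that formula, κ = (p_o − p_e)/(1 − p_e), assume the 2×2 table of agreement counts between the local kit and the commercial gold standard is in B2:C3, with rows for the kit's results and columns for the gold standard's (the layout is illustrative):
=SUM(B2:C3)   the total n, say in F1
=(B2+C3)/F1   the observed agreement p_o, in F2
=((B2+C2)*(B2+B3)+(B3+C3)*(C2+C3))/F1^2   the expected agreement p_e, in F3
=(F2-F3)/(1-F3)   Cohen's kappa
The diagonal cells B2 and C3 hold the cases where the two methods agree.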
Hey Charles, I've come across another problem and was hoping you may be able to help me. I have a group of patients with an expected bleeding risk of 5.48% in 3 months, and a thrombotic risk of around 10% over the same amount of time; those calculations came from big studies with over 3000 patients. Anyway, how can I compare those to see whether this difference justifies the use of an anticoagulant agent or not? Would an odds or hazard ratio be useful in this scenario, to compare the chances of 2 different events over the same sample? Thanks in advance
Vitor,
It sounds like survival analysis (hazard ratio) might be the way to go. See the following webpage
Survival Analysis
Of course, you need to determine what sort of ratio is acceptable.
Charles
Dear Charles,
First, I would like to congratulate you on your website.
I wonder if you have examples of the sequential version of the Mann-Kendall statistical test, which detects change points in time series.
Dear Leonardo,
Thanks for your kind remarks about the website.
Unfortunately, I don’t yet support the Mann-Kendall test.
Charles
On another matter, I have an Excel spreadsheet of 2600 university donor prospects with 20 potential predictor variables and am trying to predict those that have a higher likelihood of giving a gift of $10K+. I have experimented with the logistic regression tool and found it very difficult to use and interpret. I'm wondering if a simpler and equally effective solution would be to simply group the prospect list into groups representing all possible configurations of the predictor variables (permutations?), compute the average number of $10K gifts given by each group historically, then rank the groups from highest to lowest. Those groups with the highest number of $10K+ gifts would receive priority in future fundraising. Does this make sense?
Tom,
It would seem that there would be a very high number of permutations of the 20 predictor variables. Even if each predictor variable was binary you would have 2^20 possibilities, which is a number larger than one million. With only 2600 donors most of the combinations would not have any representation. I can’t say that a logistic regression model would work any better.
Charles