Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).
Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.
Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.
He grew up in New York City and has lived in Indiana, Florida, Oregon, and finally Boston, before moving to Europe 36 years ago where he has lived in London, England and in northern Italy.
He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. To accomplish this, however, he had to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.
So excited to see this. Just starting w/ it, after years on SPSS. Your work is a gift. Thank you!
Thank you, Gary.
I hope you find it useful.
Charles
Hello Dear Dr. Charles,
I want to calculate aligned rank transform for my three independent variables and one dependent variable, I want to apply three-way factor analysis, I downloaded ARTool exe, but I get an error code of length less than zero parameter name: length. Can you help me calculate the aligned rank transform value?
Thanks so much.
Hello Dr. Zaiontz,
I am 51 years old and have a master’s degree in Applied Statistics from 23 years ago!! I never really had a chance to work in a related field but instead, I chose to enter into the world of business. About a year ago I had a chance to go back to my college notes pull my academic books out and find out what statistics is really all about! I have fallen in love with these concepts and the depth and perspective that they add to one’s views.
I appreciate very much the wealth of knowledge that you share on your site. I am working on a real project that someone posted and it involves fitting a Poisson model to the data. The numbers along the way however turn out to be much larger than Excel formulas can handle. Specifically, the Exp and Fact functions run into issues. What are your best suggestions for such scenarios? Would your software overcome this issue? I’m still in the process of learning R and SQL, would you say analysis in those environments would better handle the big number issue? Thank you so much again! Roya D.
Hello Roya,
1. To fit data to a Poisson distribution you need to estimate the lambda parameter. The MLE and Method of Moments estimate in this case is the average of the data elements. You don’t need Exp or Fact for this calculation.
2. Exp(x) works for x up to about x = 709.78. Fact(x) works for x up to 170. You can also use Gamma(x+1) or Exp(Gammaln(x+1)). This points the way for dealing with Fact(x) and Exp(x) for large x: if possible, work with the natural log of the quantity instead. E.g. the pdf of the Poisson distribution is f(k) = lambda^k * e^(-lambda) / k!. Thus, ln(f(k)) = k*LN(lambda) - lambda - GAMMALN(k+1). Note that GAMMALN(x) works for large x up to about 1E+305 and LN(lambda) works for large lambda, including the largest numbers supported by Excel.
Charles
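The log-space approach in the reply above can be sketched outside Excel as well. The following is a minimal Python illustration, not part of the Real Statistics software; the function names and the sample counts are hypothetical. It uses math.lgamma, which plays the same role as GAMMALN.

```python
import math

def poisson_mle(data):
    """MLE (and method-of-moments) estimate of lambda: the sample mean."""
    return sum(data) / len(data)

def poisson_log_pmf(k, lam):
    """ln f(k) = k*ln(lam) - lam - ln(k!), computed without forming k! or e^(-lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# Hypothetical small sample of counts.
counts = [3, 7, 4, 6, 5]
lam_hat = poisson_mle(counts)

# Log-likelihood of the sample at the estimated lambda.
log_lik = sum(poisson_log_pmf(k, lam_hat) for k in counts)

# A case where the direct formula fails: float(1000!) overflows and
# e^(-1000) underflows to 0, but the log of the pmf is an ordinary number.
big_case = poisson_log_pmf(1000, 1000.0)
```

Likelihoods for large counts are then compared or maximized on the log scale, and only exponentiated at the end if the result is known to be representable.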
Dr. Zaiontz,
Thank you again for this amazing resource pack. I have done cluster analysis with it and want to produce a graphical representation of the clusters. Would you happen to have any suggestions for me?
Thank you.
Hello Frank,
What do you have in mind for the graph? Are you looking for a plot of the centroids of each cluster with connection to each point in the cluster?
Charles
Yes, that is precisely what I am looking for. Thank you.
I found a way to visualize the clusters using a scatterplot. Thank you.
Frank,
That is good to hear. I was going to suggest an approach similar to that shown at
https://real-statistics.com/other-mathematical-topics/graph-theory/network-diagrams-in-excel/
Charles
Thank you. I will look into that one as well.
Good afternoon – It’s me again. I see that under your FAQ, you have provided how to increase the dialog box on a Mac. Do you have something similar for Windows?
Hi Frank,
The equivalent approach for Windows 11 is:
1. Open Settings
2. Click on System > Display
3. Increase the Scale setting to 175% (or whatever value you want).
Charles
Dear Charles,
I found your package is very powerful. I want to go deeper to know the details of your self-defined functions such as ForecastError(), ARMA_SSE(). What should I do?
You can enter the name of the function in the Search bar of the website. Alternatively, you can find the function in the list at
https://real-statistics.com/real-statistics-environment/real-statistics-time-series-analysis-functions/
and then click on the link for that function to get even more information.
You can get a full tutorial about time series analysis using Excel at
https://real-statistics.com/time-series-analysis/
Charles
Dear Professor,
I have a quick question. If every value is the same except 1 shouldn’t the Gwetsac2 be greater than .7?
Likert scale.
Amy,
My intuition says yes, but I would need to see the details.
Charles
Dear Charles:
What test should I use to determine differences between the following kind of data:
All independent and assuming a normal distribution of data.
Age group    Surgery A (n)    Surgery B (n)
0-5          200              100
6-10         300              200
11-15        500              300
16-20        200              150
21-25        160              100
I am not sure that Chi square is really capturing differences between surgeries for each age group.
Thank you!!!!!
etc
Chi-square is for difference in variances. For difference in means, use a t-test.
I found the formulas here: https://onlinestatbook.com/2/sampling_distributions/samplingdist_diff_means.html
Hi Daniela,
I am not sure what the best approach is, but here are my suggestions:
1. Use the two-sample Kolmogorov-Smirnov or Anderson-Darling tests
2. Convert the frequency tables into raw data. You could use the midpoints of each of the categories, but this would yield lots of ties. Maybe a better approach is to place the formula =5*RAND() in cell A1, highlight A1:A200, and press Ctrl-D. Then insert the formula =5+5*RAND() in cell A201, highlight A201:A500 (300 cells), and press Ctrl-D. Then insert =10+5*RAND() in cell A501, highlight A501:A1000 (500 cells), and press Ctrl-D. Etc. Use the same approach for the second sample in column B. Now, perform a two independent sample t-test (or Mann-Whitney test if the normality assumption doesn’t hold).
Charles
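The second suggestion above (spreading each frequency uniformly over its age interval) can also be sketched in Python. This is only an illustration under assumptions: the bins are treated as [0,5), [5,10), etc., as in the RAND() formulas, the helper names are hypothetical, and a Welch t statistic is computed by hand rather than with a statistics package.

```python
import random
from math import sqrt
from statistics import mean, variance

# Frequency tables from the question: (bin lower bound, bin width, count).
surgery_a = [(0, 5, 200), (5, 5, 300), (10, 5, 500), (15, 5, 200), (20, 5, 160)]
surgery_b = [(0, 5, 100), (5, 5, 200), (10, 5, 300), (15, 5, 150), (20, 5, 100)]

def expand(table, rng):
    """Turn a frequency table into raw values spread uniformly within each bin."""
    values = []
    for lower, width, count in table:
        values.extend(lower + width * rng.random() for _ in range(count))
    return values

def welch_t(x, y):
    """Two-sample t statistic that does not assume equal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

rng = random.Random(0)  # fixed seed so the simulated ages are reproducible
raw_a = expand(surgery_a, rng)
raw_b = expand(surgery_b, rng)
t_stat = welch_t(raw_a, raw_b)
```

Because the raw values are simulated, the test statistic changes slightly with the seed; repeating the exercise with a few different seeds gives a sense of how much the conclusion depends on the random placement within bins.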
Hi Prof,
Good day to you. I have just installed the resource pack for Mac 2011 but somehow encountered an error when trying to run the Binary Logistic and Probit Regression. I am currently using the M1 Macbook Pro.
The error message is as follows:
Compile error in hidden module: ‘LogisticRegression’. This error commonly occurs when code is incompatible with the version, platform, or architecture of this application.
May I know a workaround for this matter? Thank you!
Best Regards,
Aloysius
Hi Aloysius,
If you send me an Excel file with your data, I will try to figure out what is causing this error.
Charles
Hi Charles,
I also have on my computer the Premium Solver 2022
Just after I installed the Real Statistics package, I got a conflict with Premium Solver 2022.
Error message is:
Quote
To guard against this possibility, you should avoid using any defined names beginning with “solver” in your own application.
Unquote
Please notice I have also installed the traditional Solver, but this one had no conflict with Premium Solver 2022. The only newcomer is Real Statistics.
I uninstalled Real Statistics according to the instructions, but the error persists.
The idea is that I want to keep the Real Statistics package.
Thank you for your support,
Marian
Hi Marian,
Real Statistics does not have procedures that begin with the word Solver. The only functions that begin with the word Solver are the Excel worksheet functions/procedures SolverReset, SolverAdd, SolverOK, SolverSolve, and SolverOptions.
Charles
Hello Charles,
Good day!
I need some clarifications. If I want to compare the GWA (General Weighted Average) of students (one group had Online Classes during the pandemic; the other group had Face-to-Face classes before the pandemic), can I use a t-test for independent samples? Take note that the subjects or courses taken by the 2 groups are different. For the Face-to-Face group, their GWA is based on the courses they took during their 1st and 2nd years in college; while for the Online group, their GWA is based on the courses they took during their 3rd and 4th years in college. I do not have the grades in the individual courses, only the GWA is available from the records section.
Also, if one group is normally distributed and the other one is not, should I use the Mann-Whitney U test?
Thank you.
Florence
Hi Florence,
Yes, you can compare two independent groups using a t-test provided the samples are normally distributed. If one or both are not normally distributed you would generally use the Mann-Whitney U test.
The fact that the courses taken by the two groups are different needs to be stated in the hypothesis being tested.
Charles
hello mr charles zaiontz can i ask you about your nationality?
Hello Charloe,
I am American.
Charles
Dear mathematician. I did some research on weighted regression. Really important to our work. I would like to know your studies on this problem in analysis of variance. Thank you and wish you good health. Best regards.
See Weighted Regression.
Charles
I Agree
Hi Dr. Zaiontz —
Thanks for putting this amazing product online and sharing it with the public. I really enjoy it. I learned a lot from you in just a few days of reading your materials. Some of them were confusing to me when I was in school but now they have become much clearer. Thanks a lot!!
Hello, Dr. Zaiontz,
Thank you for producing such an excellent product.
Best regards, Feliks
dear Charles,
your job is just… amazing!
but I gather that by now you use Italian more than English
🙂
you have done an incredible job…
Congratulations!!!
Thank you for your kind words.
I still communicate in writing mostly in English, but speak more Italian.
Charles
Good morning, Dr. Zaiontz,
Fellow Boilermaker here. Your Real Statistics has been beneficial in my Data Analytics class. But I just ran into an issue with the Correlation Test. It ran fine the first time I used it, but now it takes a very long time to produce the results or becomes “Unresponsive.” So if you have time, I can send you the data to see what I am doing wrong.
Thank you for producing such a fantastic product.
Hello Frank,
Thank you for your kind words.
It is still nice to say that I am a Boilermaker even after all these years.
Yes, you can email me your data and I will try to figure out what is happening.
Charles
Hi Charles,
Our study is about using 2 different methods to evaluate and rank the performance of 29 companies. We have already accomplished the rankings of both methods. I thought that Kendall’s W is applicable to my study in knowing if the two methods are in agreement with each other with regards to the specific rankings they provided.
I have applied Kendall’s W to my study. The two methods being the raters and the 29 companies are the subjects being ranked. These companies are ranked from 1-29, from the best to the least best company. I would like to ask if Kendall’s W is appropriate and suitable for my study.
Even though I still don’t know if it is applicable to my study I have already tried getting Kendall’s W. The W that I got is 0.572167488 while the P-value is 0.272830274. Last question would be, what could be the interpretation of these results?
I hope you enlighten me with my concerns and I’m looking forward to hearing from you. Thank you so much! It would really mean a lot to me.
Hi Christine,
Based on the information that you have provided it does seem that Kendall’s W is appropriate.
Since p-value > alpha = .05, we cannot reject the null hypothesis of no agreement; i.e. the data do not provide significant evidence that the two approaches agree.
Charles
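To make the numbers in this exchange more concrete, here is a minimal Python sketch of Kendall's W for untied rankings, together with two standard relationships: the chi-square statistic m(n-1)W with n-1 degrees of freedom, and, for two raters only, Spearman's rho = 2W - 1. The function name is hypothetical and the code is an illustration, not the Real Statistics implementation.

```python
def kendalls_w(rankings):
    """Kendall's W for m rankings (no ties) of the same n subjects.

    rankings: list of m lists, each containing the ranks 1..n.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Sum of the ranks given to each subject across the m raters.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Values reported in the question: two methods ranking 29 companies.
m, n, w = 2, 29, 0.572167488
chi_sq = m * (n - 1) * w      # test statistic, compared with chi-square(n - 1)
df = n - 1
rho = 2 * w - 1               # Spearman's rho, valid only when m = 2
```

With these figures the chi-square statistic is about 32.04 on 28 degrees of freedom and rho is about 0.14, which is consistent with the weak, non-significant agreement described in the reply.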
Hi, professor
I found this website very helpful. I have a question about sample size and stepwise regression. I will survey a group of employees with and without a pet. Participants are asked to self-report whether they have pets, so I will only know how many participants do and do not have pets. My research question is: how much of the variance in life satisfaction scores for employees with pets is accounted for by demographics (education, gender, age, and income)? I use G*Power 3.1. I entered two IVs (demographics and pet), and the sample size is 67. Is this correct? Or should I enter 5 IVs (education, gender, age, income, and pet) to calculate the sample size? Thank you very much for your help.
Hi Beverly,
Most likely, education, gender, age, income, and pet are your IV’s. Keep in mind that if some of these are categorical variables then you actually have more than 5 IVs. E.g. if age is coded as 0-20, 21-40, 41-60, 60+, then this results in 3 IVs (one less than the number of categories). See
Dummy Variables
Charles
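The point about categorical variables in the reply above (k categories give k-1 IVs) can be illustrated with a short Python sketch. The helper name and the sample data are hypothetical; the first category is arbitrarily taken as the reference level.

```python
def dummy_code(values, categories):
    """Return one 0/1 column per category except the first (the reference level)."""
    others = categories[1:]
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in others}

# Hypothetical age data coded into the four groups from the example.
age_levels = ["0-20", "21-40", "41-60", "60+"]
ages = ["0-20", "21-40", "41-60", "60+", "21-40"]

dummies = dummy_code(ages, age_levels)
# Four categories produce three dummy variables; a respondent in the
# reference group "0-20" has zeros in all three columns.
```

When counting predictors for a sample size calculation, each of these dummy columns counts as a separate IV.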
Dear Doc, how are you and your family? Is there a new version?
Thanks
Hello Gerardo,
We are doing well. I hope that the same is true for you.
I am working on a new release, but it is not ready yet. I expect to complete it this month.
Charles
I just found out about your website. Thanks very much for the work you are doing! I have a degree in statistics (M.S. from the University of Minnesota) and worked for three years as a statistical consultant at the University of Victoria (2005 – 2008). As a consultant, I primarily used R, SPSS, and SAS. The software you have created would have been SO very helpful to me as a consultant and to my clients back then. I am glad that it is here now and can help people to understand statistics better and do quality data analysis. Thanks again, you are working on a truly great idea!!!
Nicholas Karlson (www.rcoding.org)
Hello Nicholas,
Thank you for your very kind remarks.
I welcome your support and would appreciate any suggestions that you have for how to improve the website and/or software.
Charles
Greetings Charles,
I will indeed be looking for ways to support your work at Real-Statistics. Part of my current job is to help graduate and undergraduate students adopt/use econometrics/statistics software. There are several courses and use-case scenarios that would significantly benefit from Real-Statistics. For example, I think Real-Statistics would be a great help to students studying AP Statistics.
Kind regards,
Nicholas
Hello Charles,
I make a point to visit your site at least twice a month & end up learning and using your tool. In fact my learning is more effective by reading through the Basic Concepts on this site than the heavy books on stats! Not sure why your tool is not widely used at US universities and companies. But I did use the tool while doing a Masters program in OR last year and do use some of the MV analyses. Anyway wished to thank you and happy to see we have made it through Covid.
Hello Sutanu,
Thank you for your very kind remarks. I sure hope we have made it through Covid, but, unfortunately, that may not be true for some people. So far so good for my family.
Charles
Dear Everyone: It would be nice to collaborate as a team using advanced statistical methods.
How can we get in touch with other interested members and actually help build a community
I can see how great this tool is after a few moments of reading the reviews.
Regards
DMZ163
Dear Dr. Zaiontz,
Thanks to your excellent guidance and plug-in software, I feel confident that I don’t need to learn SPSS or R just to do some statistical tests with my data. I am currently writing a long book on publication and will be recommending your materials in it. Your explanations of the tests on the website are also really useful for someone like me that was never good at math.
Hi Bryan,
Glad I could help. Good luck on the book.
Charles
Hello Dr. Zaiontz,
Hope you are doing well. I am working on analyzing some data for a research project right now. I was researching online about data analysis techniques that led me to your website. The toolpak which you’ve created is proving to be extremely useful. However, I am wondering if the toolpak statistics options are affected by blanks (non-values) and if I should address them first. Any insight will be greatly appreciated.
Sincerely,
Suhas
It depends on the specific data analysis tool. Some accept blanks (using listwise deletion of missing data). Others will give an error message when there are blanks in the data. This should be stated in the documentation on the website.
Charles
Excellent job. You really make statistics digestible. I feel free now to recommend your site to my students without any fear they give up after the first bunch of math symbols.
Hello Marian,
Thank you very much for your words of encouragement. I have tried very hard to walk the thin line between too much mathematics and not enough. While I haven’t always succeeded I trust that I have succeeded just enough.
Charles
Hi Dr Charles,
I would like to ask whether there is any way to include the data collection days in a statistical analysis or test. I have collected the height of the plant for 14 days; how can I include this variable in a statistical test? The variable indicates the day I collected my data, like day 1, day 2, day 3, etc.
Thank you.
Cecilia,
You can do this with many of the analyses. Which analysis did you have in mind?
Charles
Hi Charles,
For Passing-Bablok, the equation for c is SQRT(n*(n-1)*(2*n+5)/18)*z-crit. Can I ask if the 5 and 18 are unchanging or if they change when the dataset grows?
Also, if the 95% confidence interval doesn’t include 0 for intercept and 1 for slope, how do I then correct this difference? Is there a factor that I can use to allow the 2 methods to become comparable?
Hello,
The 5 and 18 are unchanging.
I don’t understand your second paragraph.
Charles
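The formula quoted in the question can be evaluated as follows. The constants 5 and 18 are fixed because n(n-1)(2n+5)/18 is the variance of Kendall's S statistic under the null hypothesis; only n and the critical value change. This is a minimal Python sketch with a hypothetical function name, assuming a two-sided normal critical value.

```python
from math import sqrt
from statistics import NormalDist

def passing_bablok_c(n, alpha=0.05):
    """c = z_crit * sqrt(n(n-1)(2n+5)/18) for n paired measurements."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z_crit * sqrt(n * (n - 1) * (2 * n + 5) / 18)

c_10 = passing_bablok_c(10)
c_50 = passing_bablok_c(50)
```

As the number of pairs n grows, c grows with it, but the 5 and 18 inside the square root stay the same.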
Hi Charles,
Thank you for creating this website and the software.
I would like to know if I can apply ICC to test for test-retest reliability for my strain sensor. My strain sensor is attached to a concrete sample, and the sample is put under compressive load. I repeated the tests multiple times keeping everything constant, including the person observing the results (strain value from the sensor and compressive load on the sample). The only difference was the time when the test was conducted.
Could I apply ICC in this case to measure the reliability of my strain sensor?
Thank you.
Hello Delwyn,
The ICC can be used for test-retest reliability. This is described at
ICC for Test-Retest
I don’t know enough about your situation to say whether this is the appropriate approach.
Charles
Hi Charles,
In your section “Confidence Interval for one sample Cohen’s d” in the calculation of standard error (se), the first term in the sqrt is given as 1/n. However, Hedges & Olkin (1985, p 86) show the quantity (n1+n2)/(n1*n2), which simplifies to 2/n when n1=n2. I’m wondering if the former is a typo, or is there some other simplification for paired samples?
Thanks very much!
Hi Michael,
In the one-sample case, there aren’t n1 and n2 since there aren’t two samples, and so this can’t be the correct formula.
Actually, the one-sample case is more like a paired sample case where one of the samples has a constant value (usually 0).
Charles
Hello Charles,
Thank you so much for making this website and software.
I am trying to run binary logit regression model, is it possible to generate marginal effect by using the software? Please help.
Thank you.
I am pleased that you are getting value from the Real Statistics website and software.
Yes, the functionality you need can be obtained from the software.
Real Statistics’ Logit and Probit Regression data analysis tool will calculate the binary logistic regression coefficients based on your data. The marginal effects are obtained from these coefficients. For example, this is explained at
https://us.sagepub.com/sites/default/files/upm-assets/114728_book_item_114728.pdf
Charles
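As a rough illustration of how marginal effects follow from the logistic regression coefficients: for a continuous predictor x_j, the derivative of the predicted probability p with respect to x_j is b_j * p * (1 - p), and averaging this over the observations gives the average marginal effect. The Python sketch below uses hypothetical function names and made-up coefficients; for a binary predictor, the difference between the two predicted probabilities is normally used instead.

```python
import math

def logistic(z):
    """Inverse logit: converts the linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effects(intercept, coefs, rows):
    """Average of b_j * p * (1 - p) over all observations, for each predictor j."""
    totals = [0.0] * len(coefs)
    for x in rows:
        p = logistic(intercept + sum(b * v for b, v in zip(coefs, x)))
        for j, b in enumerate(coefs):
            totals[j] += b * p * (1.0 - p)
    return [t / len(rows) for t in totals]

# Hypothetical fitted model with two predictors and three observations.
b0, b = -1.0, [0.8, -0.3]
data = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]
ame = average_marginal_effects(b0, b, data)
```

The coefficients themselves would come from the regression output; only the averaging step is shown here.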
hi i need to learn this topic
holts forecasting method excel
can you give me a guide , where i can find similar subject to learn
See
Holt’s Linear Trend
Holt-Winter Multiplicative
Holt-Winter Additive
Charles
Dr. Charles,
Thanks for so many helpful tools. May I ask you a question about ADF?
When I did ADF unit root test for cointegration time series Y and X, I got the OLS equation(Y=a+bX) and the residual series(ERR) .
I ran the ADF unit root test for this residual series (ERR) in levels, and the p-value was > 0.2; but when I ran the ADF unit root test for this residual series (ERR) in first differences, the p-value was < 0.00001.
So can I say time series Y and X is cointegrated?
Thanks.
Mark,
If I understand your comment then the procedure you describe is almost correct. You need to use the Engle-Granger test as described at
https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/engle-granger-test/
Charles
Dear Dr.Charles,
Thanks for your reply. I learned a lot.
And as I further studied my data, one more question came out.
When Y, X, and residual series(ERR) are proved cointegrated by ADF, the Pearson correlation is only 0.6. I build another sample, it is still ADF cointegrated, but pearson correlation is -0.1.
So can I say Y and X are cointegrated? Or how should I identify the relationship between ADF and Pearson correlation?
Thanks very much.
Mark.
Thank you so much for this great program. It will help me a lot in my MBA Class.
Glad I could help.
Charles
Dr. Charles,
I cannot thank you enough for your contribution to statistics with Excel. I know you have several templates for various statistical analyses. I am wondering if you have any downloadable template for “Bland-Altman”.
Thank you,
Regards
Mozammel
You can download a spreadsheet with the example presented on the website at
Bland-Altman
You need to download the Reliability examples workbook from
Examples Workbooks
Charles
Hello Dr. Zaiontz,
Thank you for sharing such a broad array of content on this site!
I am using your sample size calculator for logistic regression (with binary IVs). The example shown (and in the downloads) uses a two-level binary predictor (men vs women) with binary outcome (opioid Rx).
I realized after trying out the tool, that the resulting sample size is equivalent to sample size calculations used for a test of two proportions (in my world, typical for A/B tests). Makes sense.
Now, I am hoping to use this for a multivariate experiment with three predictors (each with three categorical levels) and one binary outcome. I already have a fractional factorial design that I will be using (3^(3-1)) = 9 variants.
With more than one IV in such a model (3 in this case), each with three-levels, what adjustments to the sample size calculation would be needed? I’ve always assumed there should be sample size efficiencies when running multivariate tests over “one at a time.” Are any adjustments needed?
Thank you,
John
John,
When you say that you have a “multivariate experiment with three predictors (each with three categorical levels) and one binary outcome” are you still using a binary logistic regression model?
Charles
Yes. I plan to create either dummy coded variables for each factor (2 per factor for the three levels) or consider effects coding (-1, 0, +1), but it is still a logistic regression problem, right?
John,
Yes, this is still logistic regression.
Charles
Sorry for the misnomer. I say “multivariate” to mean there are multiple variables in the model. It is a multiple logistic regression that I need to set up with appropriate sample size. There is one binary outcome, and the “three” predictors with three levels each means I need to create dummy variables for each of the three factors. So, for each three-level factor, I would create two dummy variables. That would mean a total of 6 IVs for the model, against the binary outcome variable. What adjustments, if any, would be needed to calculate an appropriate sample size for this model? Thanks.
John,
Thanks for the clarification. Sorry, but I don’t have a definitive answer for you. The following webpages might be helpful:
https://stats.stackexchange.com/questions/11724/minimum-number-of-observations-for-logistic-regression
https://stats.stackexchange.com/questions/384011/power-analysis-for-logistic-regression-with-dummy-independent-variables
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422534/
You could create a simulation that would enable you to estimate the power of a logistic regression model of the type you have in mind and then estimate the sample size by changing the sample size until you find the power that you desire. Not an easy process, but it should work.
I have done something similar for the Tukey HSD post-hoc test, as described at
https://www.real-statistics.com/one-way-analysis-of-variance-anova/power-tukey-hsd-test/
Charles
Dear Dr Zaiontz,
I am trying to follow your instructions on Building a Rasch model.
I am unsure how you generated figure 5. Do I need to create 13 more iterations of Figures 2-4 to reach the convergence?
https://www.real-statistics.com/reliability/item-response-theory/building-rasch-model/
Raihan,
Yes, you would need to perform 13 more iterations. The Rasch data analysis tool provided by the free Real Statistics software would do all of this automatically.
Charles
Dear Charles,
thank you very much for your good recommendations. They work perfectly. Now, I only have to understand what I did and get.
Best wishes
Fritz
Dear Charles,
thank you very much for your support – it was already last year. Time goes by but I still fight with my statistical problems. My present problem is: I do a step-wise regression analysis with Yi = b0+b1X1+b2X2+b3X3. I monitor how R^2 increases from step to step. Now, I recognise, that the R^2 increment depends on the sequence in the second step which is either Yi = b0+b1X1+b2X2 or, alternatively, Yi = b0+b1X1+b3X3. In the next step, everything comes together again. The question behind is, however, whether X2 or X3 correlates better with Y1 identified by the R^2 steps. I wonder whether the reasons are the different correlations between X1 and X2 or X1 and X3, respectively. X1 and X2 do not show a distinct correlation, whereas X1 and X3 do. Maybe, you know a way out. Thank you very much. Fritz
Fritz,
If you remove one of the variables then R^2 will decrease (or, at best, stay the same). It doesn’t matter which variable you remove. See the following webpage
Stepwise Regression
Charles
Dear Dr Charles
A very interesting site. I am a Retired biomedical engineer helping a Cardiac surgical team to analyse their heart valve replacement patient data. One of the requirements is a Kaplan meier survival analysis. Since my need is only about once or twice a year, the regular statistical software are too expensive.
I was hoping to try out your Excel package – but am UNABLE TO DOWNLOAD it for some odd reason. The DOWNLOAD buttons are not working – but only opening a page again. Could you kindly help? I would like to update my rusty knowledge by checking out your example workbooks also. Thanks in advance and looking forward to your reply.
Hi Charles.
I couldn´t find the test for homogeneity of variances when more than 2 samples are in the analysis. Did I miss it?
I want to congratulate you for developing this great tool.
Wilfrido,
See Homogeneity of Variances
See, especially Levene’s Test
Charles
Great, I´ve got it.
I was thinking that test was for multiple contrasts of medians.
Many thanks!
Hi Dr. Charles Zaiontz! We are students. We ran a two-way MANOVA in SPSS and now we have to interpret the results. Could you help us with the interpretation?
What would you like to know?
Charles
Dear Dr. Zaiontz,
thank you so much for your extension pack and your excellent work! It helps me a lot at work and in my studies. Also, the resources and explanations you provide are very clear and easy to understand.
I wish you all the best,
Diana
Diana,
Thank you for your kind words.
Charles
Ho to find the standarize of the beta.imvers
To convert the quantile regression model
Soran,
Sorry, but I don’t understand your comment.
Charles
Dear Dr. Charles Zaiontz,
I am studying Principal component and Factor analysis and I need to understand how to perform these analyses step by step, starting with the original data, ‘manually’ without any software. I already understood how to calculate the nxn (n> 5) matrices of correlation, covariance and their inverses, the row echelon form by Gaussian elimination, and the determinant, but I do not understand how to obtain, for example, eigenvalues. Please, can you indicate an article containing all the steps with examples?
Thank you very much,
Otávio
Otavio,
See Factor Analysis
Charles
I am in the process of writing a math paper for IB high school right now and really would like to know the best way to approach the raw data to determine whether or not it follows normal distribution. Just want to know what the procedure is, which tests I could use and how it would be done on real-statistics.com.
Background information
The raw data is the number of people into same sized intervals of 100mmr (matchmaking rank). It starts at 1100mmr but it should not matter as they can be put into individual ranks e.g. rank 1 would be people who are in between 1100mmr to 1200mmr rank 2 are people in-between 1200 to 1300 and so on. The raw data will have about 500,000 to 1,000,000 people in it that are separated into those intervals. (if needed i can send the raw data / grouped data)
If it’s not too much work, I just want to know what would be the procedure and tests needed to figure out whether this data follows normal distribution including showing a final graph of it. Also where I can find the procedure and tests as a guide on how it’s done.
Your website has helped me tremendously on other projects in the past.
Thanks,
Jesper
Hello Jesper,
You can use a chi-square goodness-of-fit test to test the fit of data to any distribution as explained at
https://www.real-statistics.com/chi-square-and-f-distributions/goodness-of-fit/
The following webpage specifically demonstrates this test for fitting data to a normal distribution:
https://www.real-statistics.com/non-parametric-tests/goodness-of-fit-tests/chi-square-goodness-of-fit-test/
More information about testing data for normality can be found at
https://www.real-statistics.com/tests-normality-and-symmetry/
Charles
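For data that are already grouped into intervals, as in this question, the chi-square goodness-of-fit statistic mentioned above can be sketched in Python as follows. This is a simplified illustration with hypothetical names and made-up counts: the mean and standard deviation are estimated from the bin midpoints, the first and last bins are treated as open-ended tails, and the p-value lookup is left out.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_normal_fit(edges, counts):
    """Chi-square statistic and degrees of freedom for a normal fit to binned data.

    edges: bin boundaries (one more than the number of bins)
    counts: observed frequency in each bin
    """
    n = sum(counts)
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    mu = sum(m * c for m, c in zip(mids, counts)) / n
    var = sum(c * (m - mu) ** 2 for m, c in zip(mids, counts)) / (n - 1)
    dist = NormalDist(mu, sqrt(var))

    stat = 0.0
    last = len(counts) - 1
    for i, observed in enumerate(counts):
        # Extend the first and last bins to cover the tails of the normal curve.
        p_low = 0.0 if i == 0 else dist.cdf(edges[i])
        p_high = 1.0 if i == last else dist.cdf(edges[i + 1])
        expected = n * (p_high - p_low)
        stat += (observed - expected) ** 2 / expected

    df = len(counts) - 1 - 2  # two parameters (mean, sd) estimated from the data
    return stat, df

# Hypothetical counts for seven 100-mmr intervals starting at 1100.
edges = [1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800]
counts = [50, 200, 500, 700, 500, 200, 50]
stat, df = chi_square_normal_fit(edges, counts)
```

The statistic is then compared with a chi-square distribution with df degrees of freedom. With several hundred thousand observations, even small departures from normality will usually produce a significant result, so a histogram with the fitted normal curve overlaid is a useful companion to the test.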
Thank you so much!
Hi,
I have data on daily minimum temperature. From the graph I can see that it has a seasonal pattern. I have calculated the monthly average temperature and plotted the graph. Can I use a seasonality index to forecast? And can I use temperature data for forecasting?
Manisha,
Yes, you should be able to use time series and regression approaches to make such forecasts. There are a number of techniques. See for examples:
Regression with Seasonality
Holt-Winters Mult
Holt-Winters Add
SARIMA
Charles
Hey Dr. Zaiontz,
I used the three axis method to look at whether I can see some statistical significance between some behaviour and daily rainfall/temp over a time series. I adopted your method, so my dependent being the date. I chose to look at a spearman and pearson in R. I had temp/rain fall as Y and date as X and then duration of activity as Z. Results show an R=0.68 and p=0.00013. So based on this I can see a reasonably strong correlation between temperature and lying time over the time series with statistically significant results. Would you agree with this or am I barking up the wrong tree?
I have looked at the results and they haven’t changed with the addition of the z axis, so this isn’t working. boooo!
Sorry Lou, but I am not following the argument sufficiently well to offer an opinion.
Charles
Hi
See my email
Dear Charles,
This is an inquiry about my last message about logistic regression.
I would appreciate your reply to my earlier message, but even independently of the question about the software, at least I want to know whether anything was wrong with my interaction model.
Thank you in advance for your kind response.
Best regards,
Masa,
I will answer your question shortly.
Charles
Just wanted to leave a note saying thank you for creating this — absolutely magnificent to have this to fall back on. Your work is much appreciated!!!
Thank you, Myles. I appreciate your comment.
Charles