Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).
Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.
Dr. Zaiontz has held a number of executive management and sales management positions, including President, Genuity Europe, responsible for the European operation of one of the largest global Internet providers and a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.
He grew up in New York City and has lived in Indiana, Florida, Oregon, and finally Boston, before moving to Europe 36 years ago where he has lived in London, England and in northern Italy.
He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. Accomplishing this, however, required him to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack that is used in this website.
Dear Charles,
Let me first thank you for your great website and the very useful software, which I have long appreciated very much.
Now I have been experiencing a difficulty in conducting a logistic regression analysis with the Real Statistics software, which is possibly a bug in the program.
When I choose an appropriate input range with summary data and click the “OK” button, a message saying “Input range must have at least as many data rows as columns” appears.
This is understandable if the data is raw data. However, if it is summary data, the columns can surely exceed the rows, for example when your model is an interaction model and contains many product (interaction) terms.
In particular, my model was, using the Real Statistics function,
= LogitSelect (R1, “1, 2, 3, 4, 5, 6, 7, 1*7, 2*7, 3*7, 4*7, 5*7, 6*7”, True)
Is there something wrong with this interaction model?
I think this message should appear only when the data is raw data, and should not appear when it is summary data.
But there may be misunderstandings on my part.
In any case, I would appreciate your kind advice.
Best regards,
Masa
Hello Masa,
I have just issued a new release of the Real Statistics software, Rel 7.3.1, that eliminates the error message. You should now be able to use the logistic regression data analysis tool. I am not sure whether the model will converge to a solution, so I would appreciate your letting me know whether you did get a solution.
Charles
Hello Dr. Zaiontz,
I followed your Mann-Kendall Test instruction page (https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/) but I am having trouble figuring out the equation you used to calculate the ties corrections. Do you have any references I can look up? I have looked many places but have not found how to calculate the ties corrections. I am curious to know if the correction equation you used is specific to the data set and how I apply it to my own data set.
Thank you,
Hello Ricardo,
Here is a reference:
Gocic, M. and Trajkovic, S. (2012) Analysis of changes in meteorological variables using Mann-Kendall and Sen’s slope estimator statistical tests in Serbia. Elsevier
https://www.academia.edu/6955354/Trend_Analysis_MK_Sen_Slope
Charles
Hi Sir,
While searching for “Cox PH approach in Excel” on the internet, I came across your example which computes the survival probabilities using the Cox PH partial likelihood method in Excel. I found it intuitive and really very helpful. I also validated it using R code (the Breslow approach under surv) and it matched.
I have a similar dataset, with only one categorical causal variable, “Product”. I created a dummy variable which takes 1 when the product is of a particular type and 0 otherwise. Then I tried to use the same approach in Excel. However, I didn’t get a correct match when checking the result in R.
I have shared the dataset with you over email. Could you please advise me on how to handle this example?
Thank you
Thanks Professor, incredible website!
Thank you very much Dear Charles, best regards from Chile
Instead of clicking Ctrl-m and putting the information for my data range in the pop-up window for the time series analyses, I put “=SEN_SLOPE(my data range)” in an Excel cell below my data array and “=MK_TEST(my data range)” in another cell. I have 9,000 time series and would like to get the Sen’s slope value and its significance (p-value) by dragging these two function columns. This method gave certain values, but they seem not to be the correct Sen’s slope values or their p-values either. How can I get the correct trend and significance values for multiple cases in Excel using your software?
If you email me an Excel file with your data and results, I will check to see whether or not you have correct results.
Charles
I emailed you my data file using the contact info on this website. Once again, I would like to calculate Sen’s slope and its p-value from the MK test for more than 9,000 cases of time series data. Thus, if possible, I want to drag the one Excel cell with the functions (SEN_SLOPE & MK_TEST) from your software to apply to all of them. I will wait for your quick reply to my email.
Many thanks
Gwangyong Choi
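For readers processing many series at once, the two statistics are simple enough to script outside Excel as well. Below is a minimal Python sketch (my own helper names, not Real Statistics code); it omits the ties and autocorrelation corrections, so treat it as an illustration only.

```python
import numpy as np
from scipy.stats import norm

def sen_slope(y):
    # Sen's slope: the median of all pairwise slopes (y_j - y_i) / (j - i)
    n = len(y)
    slopes = [(y[j] - y[i]) / (j - i)
              for i in range(n) for j in range(i + 1, n)]
    return float(np.median(slopes))

def mann_kendall_p(y):
    # Two-sided p-value of the Mann-Kendall trend test (no ties correction)
    n = len(y)
    s = sum(np.sign(y[j] - y[i])
            for i in range(n) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18   # variance of S when there are no ties
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return float(2 * (1 - norm.cdf(abs(z))))
```

Applying the two functions to 9,000 series is then a simple loop over the rows.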
Dear Professor Zaiontz, the person writing to you is unable to understand a single word of statistics.
Yet, thanks to your web page, your software package, and your splendid work, I am carrying out the data analysis for my master’s thesis on my own. I assure you that for me this is a great achievement, and I thank you infinitely for making it possible.
Some time ago I found the Real Statistics page by chance and ran a few analyses to understand how the software works. Today I picked up my thesis again after four months of inactivity and, to my great surprise, read your biography, which I had previously ignored. I too am from Trento; for family reasons I live on the other side of the world, and it gives me great pleasure to discover that I chose your work to complete mine.
In this message you will find no statistics questions, just this small thank-you and my compliments.
Mauro Brunelli
Ciao Mauro,
I am very pleased with your comment. I am very happy that I was able to help you.
Charles
Dear Charles,
I found a little mistake in Figure 2 – REGWQ test. As there is no Response section, I didn’t know where to put it. α(p) is not adjusted for the second stage, meaning V8 and W8. They should be 0.040204.
Jürgen
The same is true for the results table when conducting the REGWQ test in Excel. I just wonder if the calculation of q(crit) is correct, as it’s using α(p).
Jürgen
Dear Jürgen,
The formula in cell V8 is =IF(V6
Some parts of your reply are missing.
Jürgen
Hello Jürgen,
What parts are missing? Are you referring to the calculation of q(crit)? My previous response should also cover this.
Charles
Well I only see this as your reply:
Dear Jürgen,
The formula in cell V8 is =IF(V6
The formula in cell V8 is
=IF(V6
Nice life story Charles … do you have a Linkedin account?
Yes, I have a LinkedIn account
Charles
Hello, have a great day! I just want to ask about forecasting methods. What would be the best method to use in a research paper if you have gathered annual data? Is the Holt-Winters method not applicable to it? Why? Hoping for your response. Thank you! 😇
Hello Jasmine,
This depends on the details about the annual data. Is there seasonality? Is there an upward or downward trend?
Charles
Dear Charles,
Firstly, thank you for your posts – they provide useful insight to statistics for an amateur such as myself.
I have a question of what test would be best used for my problem:
I have 46 patients and have identified different patient factors (e.g. age, gender, bone density, tissue density, etc.), and each patient is administered ultrasound at increasing powers repeatedly, with tissue temperatures recorded with each application of ultrasound until the therapeutic effect is achieved.
Unfortunately the ultrasound powers are not exactly the same for each patient, and some patients require more episodes of treatment to achieve the therapeutic effect.
Is there a way to assess which of the patient factors (e.g. age, gender, tissue density) has an effect on the power required to reach the resulting temperature on each application of ultrasound?
I thought of a repeated measures ANOVA, but I seek your advice to be confident I’m on the right track.
Regards,
David
Hello David,
If I understand correctly, age, gender, bone density, tissue density, etc. are the independent variables that you are interested in. The power required to reach the desired temperature appears to be the dependent variable. This looks like an application of regression. If you are also interested in the number of treatments required then you can use Poisson regression for this.
Charles
Just discovered this amazing tool…it’s just awesome what you have created here and all for free….it is soooo helpful!
So interesting
Hi Charles:
Your resource is great, but I am not sure how to carry out an equivalence test.
We are testing whether two dental procedures are equivalent (implants).
Thanks,
Jaime Núñez
The tolerance calculations were very helpful. How would you perform the calculation if your data isn’t normally distributed?
Hello Renee,
Currently, Real Statistics doesn’t support this capability, but I do plan to add it soon. In any case, here is a reference on how to make such calculations:
https://www.jstatsoft.org/article/view/v036i05/v36i05.pdf
Charles
Hi Charles,
I found your tutorial on how to apply cubic splines using Excel very useful, as it is advantageous to use Excel versus something like MatLab to perform these operations, especially due to accessibility and price.
https://www.real-statistics.com/other-mathematical-topics/spline-fitting-interpolation/
Would you happen to be able to publish an addendum to this tutorial that covers examples and applications of the smoothing cubic spline function that utilizes a weighting parameter? I believe Ridge regression is commonly used as an analogy here.
Thank you!
Hello Andrew,
I will look into this. Can you give me a source for this approach?
Charles
Outstanding!!
It would be beneficial to see Ridge analysis applied to smoothing the cubic spline method example, such that we could vary the smoothing parameter (penalty value) and see the impact that has on the fit. Furthermore, it would be great to be able to use the Ridge analysis to solve for the optimal smoothing parameter using cross validation.
As an overview, below is a hyperlink to Dr. Liang’s (from Duke University) statistics lecture on the topic.
https://stat.duke.edu/courses/Spring06/sta293.3/topic5/spline.pdf
Also, here is a hyperlink to a nice lecture on the smoothing spline approach:
https://www.stat.cmu.edu/~ryantibs/advmethods/notes/smoothspline.pdf
Thanks for sending this link to me.
Charles
Hello Andrew,
Thanks for sending me these links. I will look into this. Currently, Real Statistics has the following support for Ridge regression and spline interpolation:
Ridge Regression
Spline Fitting and Interpolation
Charles
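In the meantime, readers who want to experiment with a smoothing parameter can try scipy’s UnivariateSpline, which exposes a penalty-style smoothing factor s (this is scipy’s implementation, not a Real Statistics feature, and the variable names below are my own):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 30)
y = np.sin(x) + 0.1 * np.cos(5 * x)      # wiggly sample data

smooth = UnivariateSpline(x, y, s=2.0)   # larger s -> smoother curve
interp = UnivariateSpline(x, y, s=0.0)   # s = 0 reproduces the data exactly
```

Varying s and re-plotting the two curves shows the impact of the penalty on the fit, which is essentially what the requested addendum would demonstrate.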
Dear Mr Zaiontz,
many thanks for providing such a great statistical tool! It is a pleasure working with it!
I am trying to run a weighted linear regression (the explanatory variables are the Dow Jones returns and 2 dummy variables, in order to capture pre- and post-event returns, and the dependent variable is the sugar returns). The weights have been assigned by using the reciprocal of the conditional variances that I have estimated using a GARCH(1,1).
Unfortunately, I constantly get the error message “division by 0” when I am trying to run the regression.
Could you please give an advice on what is possibly going wrong?
Thank you very much in advance! Hope to hear from you soon!
Kind regards,
Julia
Hello Julia,
If you email me an Excel file with your data and results, I will try to figure out what is going wrong.
Charles
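As background, weighted least squares itself is a small computation, and a “division by 0” during it usually traces back to a degenerate weight (for example, a zero conditional variance whose reciprocal is infinite). Below is a minimal sketch of the computation (illustrative only, not the Real Statistics code):

```python
import numpy as np

def wls(X, y, w):
    # Weighted least squares: minimize sum_i w_i * (y_i - x_i . b)^2.
    # Returns [intercept, slopes...].
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, float)])
    w = np.asarray(w, float)
    # A zero or non-finite weight here is a common source of errors,
    # so it is worth checking the GARCH variances before inverting them.
    assert np.all(np.isfinite(w)) and np.all(w > 0)
    return np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * y))
```

Checking the weight column for zeros, blanks, or infinities before running the regression is often enough to locate the problem.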
Hi Charles,
I have data for trials conducted to evaluate five potato varieties across three sites over two seasons and decided to do a pooled analysis. I did the homogeneity test for seasons and found no significant differences, so I decided to do the pooled analysis. Is it right for me to do the pooled analysis over the two factors (variety and site) if there are no significant differences between the two seasons?
Your assistance is very much needed.
Can you send me your email address? My email address is jonahanton986@gmail.com
Regards,
Jonah
Jonah,
I would need to know more about what hypothesis you are trying to test in order to answer your question.
You can find my email address at Contact Us
Charles
Charles
thank you for your response. I will contact you through email.
Regards,
Jonah
Thanks for your informative explanations
I have a question
Which statistical test should I use when the independence assumption is violated?
Many thanks
It depends on what hypothesis you are trying to test, but generally it is difficult to conduct a valid test if the independence assumption is violated.
Charles
My objective is to evaluate the significance of differences in robustness measure, which require a statistical test.
The robustness measure used as follows.
I am working with different linear regression models and many datasets.
First, I standardised all the variables (independent/dependent) to zero mean and unit variance.
Suppose I am working with a linear regression model. I performed a 30-fold split of the dataset, so I have coefficients for each fold. I calculated the variance of each variable’s coefficient across the 30 models. Finally, I summed all the variances.
For example, I have 30 coefficients for a variable (X1); I calculate the variance of those 30 coefficients, do the same for all the remaining variables, and finally sum all the variances into one total value.
I did this process with different models and datasets, so I end up with a matrix containing the sums of variances (its rows refer to the linear models used and its columns to the datasets used).
I need to use a statistical test to evaluate the significance of differences in robustness (sum of variance value).
Any suggested statistical test?
Your guidance is really appreciated!
Dr. Z,
Thank you so much for all your work in creating the valuable resource that is this website.
I am trying to convert 3 data points, namely the mode, 5th percentile and 95th percentile, into a Beta distribution. What is the most efficient way in Excel to obtain the Alpha and Beta from those 3 data points? Can it be done without an iterative process?
If you prefer, this question can be moved to one of the pages dealing with Beta distributions.
Many thanks,
DB
I suggest that you use Solver as follows:
1. Insert the values for the mode, 5th percentile and 95th percentile in cells A1, A2 and A3.
2. Insert the initial guesses (say 2 and 2) in cells A4 and A5
3. Insert the formulas for the mode, 5th percentile and 95th percentile based on the alpha and beta values in cells B1, B2 and B3. Namely, insert the following formulas in these cells: =(A4-1)/(A4+A5-2), =BETA.INV(0.05,A4,A5) and =BETA.INV(0.95,A4,A5)
4. Insert an error measurement in cell A6, namely the formula =SUMXMY2(A1:A3,B1:B3). This is the sum of the squared errors, the value we want to minimize.
5. Now select Solver from the Data ribbon. In the dialog box that appears, insert A6 in the Set Objective field, choose Min and insert the range A4:A5 in the By Changing Variable Cells field. After clicking on the Solve button, estimates for alpha and beta in cells A4 and A5 should be obtained.
Note: The formula in cell B1 for the mode is only applicable when alpha and beta are larger than one. The necessary modifications are not that difficult. Things are easier if you use the mean instead of the mode since the formula in cell B1 becomes =A4/(A4+A5) in all cases.
Charles
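The same search can also be scripted. The sketch below mirrors the Solver setup using scipy (the function name is my own, and like the Excel version it assumes alpha, beta > 1 so that the mode formula applies):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

def fit_beta(mode, p05, p95, guess=(2.0, 2.0)):
    # Find (alpha, beta) whose mode, 5th and 95th percentiles best
    # match the three targets, by minimizing the sum of squared errors
    # (the analogue of the SUMXMY2 cell above).
    target = np.array([mode, p05, p95])
    def sse(params):
        a, b = params
        if a <= 1 or b <= 1:          # mode formula needs alpha, beta > 1
            return 1e9
        model = np.array([(a - 1) / (a + b - 2),
                          beta_dist.ppf(0.05, a, b),
                          beta_dist.ppf(0.95, a, b)])
        return float(np.sum((model - target) ** 2))
    return minimize(sse, guess, method="Nelder-Mead").x
```

As with Solver, this is a local search, so a reasonable initial guess still matters.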
Thank you so much. I’ve used Solver for regressions before but never knew about the SUMXMY2 function which does away with helper columns.
I’m having some difficulty with the solution, and I think the issue lies in the difference between my raw data and the 0 to 1 scale. Maybe we need to solve for [A] and [B] as well?
What is the correct solution where the raw data is as follows:
mode = 1.00
5th percentile = 0.96
95th percentile = 1.08
Thanks again!
In the version of the beta distribution that I am using, the x values must be between 0 and 1, and so the 95th percentile can’t equal 1.08.
As explained at https://real-statistics.com/binomial-and-related-distributions/beta-distribution/ there is a 4-parameter version of the beta distribution where x takes values between a and b. In this case, you either need to specify a and b or supply more data so that these values can be estimated.
Charles
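For instance, once a and b are specified (or estimated), the raw data can be mapped onto [0, 1] before fitting the standard two-parameter distribution. A one-line sketch, using hypothetical bounds a = 0.9 and b = 1.1 for the raw values above:

```python
def to_unit_interval(x, a, b):
    # Map an observation x from [a, b] onto [0, 1] for the
    # standard beta distribution: x' = (x - a) / (b - a)
    return (x - a) / (b - a)

# e.g. with assumed bounds a = 0.9, b = 1.1, the raw mode 1.00 maps to 0.5
```

The fitted alpha and beta then describe the rescaled variable, and percentiles map back via x = a + x'*(b - a).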
My question is: if I am doing a forecast for daily data and I have actual data for 4 previous years, say 2016, 2017, 2018 and 2019,
what is the year that I can start getting forecast values for, so that I can evaluate the model with the error measurements?
Thanks
I don’t know of a definite answer to your question; this is a judgement call. You can base the model on years 2016, 2017 and 2018 and check its accuracy based on 2019.
Charles
I just downloaded this add-in to Excel. I can’t thank you enough for this tool. This a phenomenal resource and you sir are the dude. The Dude abides.
Hello Sylvester,
This is the first time anyone has given me the Big Lebowski accolade. Thank you.
Charles
Dear Dr. Charles
I have done all the mathematical equations needed to forecast using a SARIMA model, and everything worked well for me, but I need to ask how I can calculate the mean absolute percentage error (MAPE) for this method, as it gives me the forecast for the next period, for which I don’t know the real “actual” data. But to calculate the MAPE to compare this method to other methods, I need forecasts for periods that have actual data.
Can you help me please ?
See
Time Series Forecast Errors
Charles
thank you
but I already know the equations to calculate the MAPE.
The problem is with the error:
there is no forecast data for the periods that have actual data;
it just gives me a forecast for the next periods, which don’t have actual data.
SARIMA method
Mohammad,
If you don’t have actual data, you won’t be able to calculate errors. Sometimes the model is based on part of the data and then the rest of the data is used to determine the quality of the model since in this case there is some actual data left and so errors can be calculated using the forecasted values vs actual data.
Charles
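Concretely, the holdout idea is: fit the SARIMA model on all but the last h observations, forecast those h periods, and score the forecasts against the held-back actuals. A minimal MAPE helper (illustrative only, not Real Statistics code):

```python
import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, in percent
    actual = np.asarray(actual, float)
    forecast = np.asarray(forecast, float)
    return float(100 * np.mean(np.abs((actual - forecast) / actual)))

# Holdout sketch: fit on y[:-h], forecast h steps ahead,
# then score with mape(y[-h:], forecasts)
```

Note that MAPE is undefined whenever an actual value is zero, which is worth checking before using it to compare methods.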
Dr. Zaiontz,
I can’t thank you enough for the work you’ve put into this site. I can never know the effort it took, over years of learning and dedication, to become an expert in a topic many, including myself, find extremely challenging. But to do all of that, and then have such passion to share what you know and guide people along their own journey with statistics that you made (and maintain!) a resource like this site, shows that you truly care about what you do, and I love that! It’s infectious! Thanks for getting me through some of the toughest classes in my undergrad and for giving me a passion for stats!
Thank you very much, Blake. I am very pleased that I could help you.
Charles
I want to thank all the people behind this website for straightforwardly explaining statistics and providing easy-to-follow examples using Excel.
Thank you, Nizar.
Charles
Hello Charles, I just wanted to say thank you for the tremendous website. The amount of analysis and work you have put into this site is amazing. In 1994 when I started an ISP (with 9600 baud modems) the one thing I hoped for most for the burgeoning Internet was that people would begin to communicate and share all manner of information, and they would do it readily and freely. That we could all learn from each other. For a while that idea held promise, but unfortunately not for long.
Your work and your website are truly examples of that original idea from so long ago. Your willingness to help is what could still form the backbone of the Internet. I was dubious and skeptical at first, but I was surprised and extremely gratified to find your site. I use your site regularly and you give me hope for the Internet. Please don’t stop doing what you are doing. Thank You Very Much, Rich Gibbons
Hello Richard,
Thank you very much for your very kind remarks.
I understand very well where you are coming from. I worked with many of the people who were involved in the Internet from very early days and they had very high hopes for this new frontier, some of which were realized and some of which unfortunately were not.
Charles
Rich,
Though I haven’t started an ISP, I also use this site almost daily. Dr. Zaiontz’ posts, resources, and explanations helped break down the walls I had built around myself that said, “You’re not a numbers person”, “You’re just not good at math”, and “It’s too hard; just quit”, to the point where I went from not having taken a math course since 10th grade in high school to deciding to go for a M.S. in Data Analytics! I’m glad to hear others are getting as much out of it as I do, and I hope Dr. Zaiontz reads this and knows he has changed the course of my life because of his work here (and in all his other contributions, obviously!).
– Blake
Hello Charles,
Thanks for creating and posting all of this information! It was very useful and I’ve recommended the site to my students! People appreciate your work!
Paul
Dr. Zaiontz,
Please help. My dissertation is at a stand-still. I am scheduled to graduate in March 2020.
I intended to use chi-square (Fisher’s exact) but was unable to obtain a high enough survey response rate, which yielded a 17% margin of error/confidence interval at a 95% CL. My committee insists I either resurvey or choose a different method due to the CI being so “high”.
I have: 9 IVs, 1 DV. Total population: 500. Survey sent to 119 based on simple random sampling (SRS). 30 participants completed the survey (30 observations). Survey completion rate of 25.2%.
Am I able to conduct multiple regression instead with what I have? Do I meet the conditions/assumptions?
And if so, does multiple regression require I choose a CI, as well as a CL?
Thanks!!!!
Hello,
I really don’t have enough information to be able to give any advice.
Can you explain further what you were testing using Fisher’s exact test and what sort of results you got?
Charles
Hi. Yes.
Testing to see if age, race, gender, experience, political affiliation, and a few other variables have a statistically significant relationship to academic union support.
I had several expected values less than 5, so I used Fisher’s. 30 total observations.
For example:
Gender and Union Support: Fisher’s p-value .230, fail to reject the null hypothesis.
Is this what you need?
Hi,
Thanks for the clarification. You can use regression with age, race, gender, experience, political affiliation, etc. as independent variables and academic union support as the dependent variable. If this variable takes just two values, you should explore binary logistic regression. The results of this approach would tell you which of the factors are significant in predicting union support (with p-values and confidence intervals for each). These topics are explained on the Real Statistics website.
Charles
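For readers curious what a binary logistic fit is doing under the hood, here is a compact Newton-Raphson sketch (an illustration with my own function name, not the Real Statistics implementation; note that with only 30 observations and many predictors, such a fit may fail to converge or may separate perfectly):

```python
import numpy as np

def logit_fit(X, y, iters=25):
    # Binary logistic regression via Newton-Raphson.
    # X: (n, k) predictor matrix; y: 0/1 outcomes.
    # Returns [intercept, b1, ..., bk].
    X1 = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X1 @ b))        # predicted probabilities
        W = p * (1 - p)                      # iteration weights
        b = b + np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (y - p))
    return b
```

In practice a packaged routine (such as the Real Statistics tool) also reports the standard errors, p-values, and confidence intervals mentioned above.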
I can’t thank you enough for creating this set of tools. Do people really believe impoverished students can afford SPSS? You’re a life-saver. Now if I can figure out how to use the discriminant analysis tool…
Hi Charles
I have only three columns in Excel: frequency, mileage [km], and censored or failure.
mileage[1600, 75, 3500, 5000]
Failure or Censor[F,F,F, C]
Frequency[1,1,1,54]
How can I perform a Weibull analysis in Excel? I would appreciate it if you could post it to my email.
See
Weibull Distribution
Survivability using Weibull Distribution
Charles
Logistic Regression is not there. Please help.
You can find the Binary Logistic and Probit Regression data analysis tool on the Reg tab.
Charles
I just want to thank you for providing such a powerful and useful addition to Excel. 🙂
Can I use the Mann-Kendall test and Sen’s slope estimator to identify long-term (40-70 years) streamflow change trends and variability? Could you refer me to some useful links and references on them, please?
I don’t see any reason why you couldn’t use Mann-Kendall or Sen’s slope for a long-term trend.
I don’t have a specific reference for a long-term trend.
Charles
Hi Charles
Regarding your Mann-Kendall page, see the link https://real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/
If you have no values for some data points but counted those as blank, how would the formula change? Currently we have 12 months against those data points, e.g. if months 3, 7 and 9 had 0 values:
1
2
3
4
5
6
7
8
9
10
11
12
If the values are truly zero, then you can use the test as described on the website. If the data is missing, then you need to do something special. See
https://www.researchgate.net/publication/265826436_Mann-Kendall_test_with_missing_data
https://www.researchgate.net/publication/259183853_Trend_Tests_in_Time_Series_with_Missing_Values_a_Case_Study_with_Imputation
There are other articles on this subject that you can find via google.
Charles
Hi Charles. Regarding the Kendall SE formula (see the link https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/):
Can you explain why we divide by 18?
Thanks
Hi Charles.
Your article “Mann-Kendall Test” is great. How could you work out the tau values for that same data set? Or do you have an article that explains, in the same step-by-step way, how to work out Kendall’s tau values?
Glad that you got value from the article.
Which tau value are you referring to? Augmented Dickey-Fuller? Engle-Granger? Kendall’s tau?
Charles
Kendall’s tau. Sorry… I need some more of your excellent work showing how this can be calculated.
See Kendall Tau
Charles
Dear Charles Zaiontz,
I need to calculate the sampling variance of Cohen’s d in case of a one sample t test and I found your post “Confidence Interval for one sample Cohen’s d” (link: https://real-statistics.com/students-t-distribution/one-sample-t-test/confidence-interval-one-sample-cohens-d/). In this post you refer to Hedges and Olkin (1985). My question is, did you find the formula of the sampling variance in the book of Hedges and Olkin (1985)? If you did, on what page can I find that formula?
Thank you in advance.
Jasmine
Hello Jasmine,
I don’t know the page number of this reference.
If I remember correctly, you can also find more details at the Lakens (2013) reference in the Bibliography.
Charles
Dear Charles,
thank you very much for your reply. I found some useful information in that article. There is just one question that I would really like to ask:
so to calculate the sampling variance of Cohen’s d in the case of a one-sample t test (where I have one group with one measurement on a variable of interest) I can use this formula: (1/n) + d²/(2n), right? Where n represents the sample size and d Cohen’s d. However, according to Borenstein (2009) this formula can be used to calculate the variance for paired groups (with two measurements of one group). In this case, the original formula is ((1/ni) + di²/(2ni))*2*(1-r), where r represents the correlation between the two measurements on the variable of interest.
However, if I just want to calculate the variance of Cohen’s d in case of a one sample t test, then I should assume that the correlation r in that formula is equal to .5 (i.e., 2*(1-.5)=1), right? In this case, I get the first formula. This means that I assume that the correlation between the group and population(?) on the variable of interest is equal to .5? So my question is, is it justified to make such an assumption? And if so, is it perhaps an assumption that is too strong? And can I really use that formula to calculate the sampling variance of Cohen’s d in case of a one sample t test, or are there alternatives?
Thank you in advance.
Jasmine
Dear Charles,
Thank you very much for this website and for the Real Statistics Package for Excel. It is amazing and very useful. Excel is a powerful tool, but after this add-in is now even more useful and user-friendly for non-mathematics people.
I appreciate your work very much. Very helpful.
Thank you.
Laco
Dr Zaiontz:
I wanted to see if there was an appropriate citation for N=90 and 5 independent variables for the DW statistic. My DW is about 2.2, and I would like to cite a source that would support no autocorrelation, or values of the residuals as independent.
Thank you, Sir
Hello Jonathan,
You can use the citation described at the following webpage
Citation
Also see
https://web.stanford.edu/~clint/bench/dwcrit.htm
Charles
Good day sir,
How can I use a box plot in R if my table is a 3 x 3 contingency table? Can you give me example data for a 3 x 3 contingency table that uses R code for a box plot?
Hi Elsa,
I don’t use R and so I don’t know what R code to use.
Charles
Hello sir, can you give me data for a 3 x 3 table, or larger than 3 x 3?
Elsa,
You can put any positive numbers in the table. See
https://real-statistics.com/chi-square-and-f-distributions/independence-testing/
Charles
Sir, how could I know if my data is symmetric?
Elsa,
If skewness = 0 (or is not significantly different from zero based on the skewness test).
Skewness Test
Charles
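For a scripted cross-check, scipy implements this test as scipy.stats.skewtest (a small p-value is evidence against symmetry); the sample data below is made up for illustration:

```python
import numpy as np
from scipy.stats import skewtest

rng = np.random.default_rng(42)
sample = rng.exponential(size=300)   # a clearly right-skewed sample
stat, p = skewtest(sample)           # expect a large statistic, tiny p
```

A symmetric sample would instead give a statistic near zero and a large p-value, so the hypothesis of symmetry would not be rejected.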
Good day sir,
What is categorical data analysis?
Hi Elsa,
I believe that you are referring to those statistical analyses that are based on categorical variables (i.e. variables that take values that are not numeric). E.g. the chi-square test for independence is based on counts of categorical variables.
Charles
Sir, do you know about Bowker’s test for symmetry? I understand that Bowker’s test is a generalization of McNemar’s test.
Elsa,
Yes, this is a generalization of McNemar’s test. You can find more information about this test at
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_Multiple_Correlated_Proportions-McNemar-Bowker_Test_of_Symmetry.pdf
I expect to add this test to the website and software shortly, probably in the next release.
Charles
When is the next release with Bowker’s test, sir?
Sir, what is the detailed explanation of the technical details and the test statistic? Why does it take that form?
Elsa,
I don’t have a date for the next release.
I don’t understand your other questions.
Charles
Sir, McNemar’s test has 3 assumptions, and one of them is that the sample must be a random sample. So, how do I check for randomness?
Elsa,
Generally you don’t test for randomness, but instead make sure that your sample is randomly selected. There are many tests for randomness, one of which is shown at One Sample Runs Test
Charles
Sir, can you give me data for a 3 x 3 matched-pairs table?
Elsa,
I don’t know what a 3×3 matched-pairs table is.
Charles
Sir, good day! I need your APA reference for your answers to all my questions about the test for symmetry. Please give me your APA reference, sir.
Hello Elsa,
Which test for symmetry are you referring to?
I don’t know of any specific APA guidelines for testing symmetry. You can use the general APA guidelines however (as I have shown for a few of the other tests).
Charles
Excellent way to help all of us looking for easier stats. Simple examples and an add-in flawlessly working.
This is just to let you know I am very thankful!
Hi,
I discovered this product on YouTube, found it amazing, and now want to use it. I am trying to download it, but when I click the download button, nothing happens. I am looking to use logistic regression, which I use very often. I also do not know which package to install. Plus, I have Excel 2016; is that okay?
Could you help, please?
Hi Karl,
I don’t know why you were unable to download the software. I suggest that you try again. You can press the button on the following webpage:
https://real-statistics.com/free-download/real-statistics-resource-pack/
You can also press the link labelled
Real Statistics Resource Pack for Excel 2010/2013/2016/2019/365
on that webpage.
Charles
Hi Charles,
Just want to thank you for sharing your knowledge with all of us
hello sir,
I hope that you are well. Please, I want to know if you can create a simulation for me in Excel for a fee. If you can, please contact me via my e-mail.
Hello Yaseen,
What sort of simulation are you looking for? If it is confidential, you can send me the details via email (see Contact Us).
Charles
Dear Sir
Thanks a lot for giving research scholars like me this wonderful software.
Please help me with performing iteratively reweighted least squares regression using this software.
Dear Rewa,
See https://real-statistics.com/multiple-regression/lad-regression/lad-regression-irls-method/
Charles
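The idea behind the IRLS method on that page can be sketched for simple LAD regression as follows. This is my own illustration, not the Real Statistics implementation: repeatedly solve a weighted least squares problem with weights 1/|residual|, so large residuals are progressively down-weighted until the fit converges to the least-absolute-deviations line.

```python
def lad_irls(xs, ys, iters=200, eps=1e-8):
    """Least absolute deviations fit of y = b0 + b1*x by IRLS:
    repeatedly solve weighted least squares with weights 1/|residual|
    (floored at eps to avoid division by zero)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        w = [1.0 / max(abs(y - b0 - b1 * x), eps) for x, y in zip(xs, ys)]
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxx = sum(wi * x * x for wi, x in zip(w, xs))
        sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        det = sw * sxx - sx * sx
        b0 = (sy * sxx - sx * sxy) / det
        b1 = (sw * sxy - sx * sy) / det
    return b0, b1
```

Unlike ordinary least squares, the LAD fit essentially ignores a single gross outlier.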
Thank you very much for your great effort, Dr. Charles Zaiontz.
This website is helping me with my thesis.
I never imagined before that Excel could perform these statistical tests.
It is great as a statistics learning tool for me, and it simplified my problem, because the SPSS program is too heavy for my old laptop.
Regards,
Paramita
Hello Charles,
I am using Real Statistics for Excel 2013 on Windows and would appreciate it if you could help me with the following. I am performing a MANOVA on a data set that is extremely similar to the one you used in the example with four types of soil, measuring yield, water requirement, and fertilizer requirement. You have a total of 32 measurements for each of the three dependent variables (eight for each of the four types of soil). Likewise, I have three independent variables, laser (8 subjects), no laser (six subjects), and control (21 subjects), for a total of 35 measurements on each of three dependent variables: acuity (A), contrast sensitivity (CS), and retinal thickness (RT). I proceeded by overwriting your example data with mine, which simply added three rows. I then changed the formulae in cells F4 thru L7. That works fine. However, when I then tried to change the formulae in the SS CP and group covariance matrices, I received an error message reading “you cannot change part of an array”.
My MANOVA closely resembles your example, and I would like to utilize all of the formatting you have done without completely rewriting all of the formulae. How can I do this?
Thanks very much.
Joel joelmweinstein@me.com
PS I’m not very familiar with your website layout. Where will your reply be posted? Would appreciate it if you could send a copy to my email address.
Joel,
You should be able to modify parts of the output produced by Real Statistics. However, if you need to change a few of the cells produced by an array formula, then you will need to be a little clever, since you can’t modify cells within the range output of an array formula. This is an Excel restriction. See
Array Formulas and Functions regarding the error message you are receiving.
Suppose that the range A1:B5 contains an array formula and you want to modify the output in cell B2. One way to accomplish this is to place the formula =A1 in cell D1, highlight the range D1:E5, and then press Ctrl-R and Ctrl-D. Now the range D1:E5 will contain the same results as A1:B5, but whereas you couldn’t change cell B2, you can change cell E2.
Note too that you can write your own VBA formulas using calls to the Real Statistics functions, including array functions. This is explained at
Calling Real Statistics Function in VBA
Charles
Dr. Charles Zaiontz,
I want to estimate the translog production function using the method of ridge regression, as my data has a multicollinearity issue. I also tried working step by step through the data you have uploaded on the site, but something is going wrong, as I could not find the function (i.e. DIAG) in the Excel sheet. Now I can’t proceed with the remaining work. Therefore, I need your kind and timely assistance in this regard.
Thanks a million.
Waqar,
DIAG is a Real Statistics function. You need to download the Real Statistics software to use it. It is free.
Charles
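As an aside on what DIAG is used for: in ridge regression it builds the λI term of the normal equations b = (XᵀX + λI)⁻¹Xᵀy. A two-predictor sketch of that calculation (my own illustration, not the Real Statistics worksheet formulas):

```python
def ridge_2pred(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors: solve the 2x2
    system (X'X + lam*I) b = X'y by Cramer's rule. The lam*I term is
    what a diagonal-matrix helper like DIAG is used to construct."""
    a11 = sum(v * v for v in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    g1 = sum(u * v for u, v in zip(x1, y))
    g2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return (g1 * a22 - g2 * a12) / det, (a11 * g2 - a12 * g1) / det
```

With λ = 0 this is ordinary least squares; increasing λ shrinks the coefficients toward zero, which is what stabilizes the estimates under multicollinearity.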
Can you tell me if there is an Excel function which can fit a polynomial regression?
See Polynomial Regression
Charles
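Polynomial regression is just ordinary least squares on the powers of x, which is what Excel's LINEST does when given x and x² as predictor columns. A quadratic sketch (my own illustration):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

def polyfit2(xs, ys):
    """Fit y = a + b*x + c*x^2 by ordinary least squares: solve the
    normal equations (X'X) beta = X'y for design columns 1, x, x^2."""
    X = [[1.0, x, x * x] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    return solve3(XtX, Xty)
```

Data generated by an exact quadratic is recovered exactly (up to rounding).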
Hello Dr. Zaiontz,
I stumbled upon your website and saw an example of the Hodges–Lehmann estimator. But it was calculated in the context of a wider problem, and not what I was looking for.
Can one construct a formula in Excel dedicated to outputting the Hodges–Lehmann estimator for a given series/array of numbers?
Thank you for your advice,
Orion
Orion,
In https://real-statistics.com/non-parametric-tests/wilcoxon-signed-ranks-test/signed-ranks-median-confidence-interval/ I show how to calculate the Hodges-Lehmann estimator for the median. Is this what you are looking for?
Charles
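For a single series, the one-sample estimator is simply the median of all Walsh averages. A sketch (my own illustration, not a Real Statistics function):

```python
from statistics import median
from itertools import combinations_with_replacement

def hodges_lehmann(xs):
    """One-sample Hodges-Lehmann estimator: the median of all Walsh
    averages (x_i + x_j)/2 over pairs with i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))
```

In Excel the same result could be produced by building the triangular table of pairwise averages and taking MEDIAN over it.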
hello, Dr. Zaiontz,
Here I am looking for your help again. I collected some writing samples from 30 Chinese students. To be exact, each of the 30 students wrote a dissertation in English and a research article in Chinese. I intend to see if there is any difference between their English dissertations and their corresponding Chinese research articles in terms of hedges. What statistical method should I use to this end? Or is it possible to do any statistical analysis with these data? Thank you so much for your time and help! Best wishes.
May,
A paired t test might work.
Charles
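A paired t test works on the per-student differences (e.g. hedge counts in the English text minus hedge counts in the Chinese text for each of the 30 students). A minimal sketch with made-up numbers (my own illustration):

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: t = mean(d) / (sd(d)/sqrt(n)) computed on the
    differences d_i = x_i - y_i, with n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1
```

The resulting t value is then compared against the t distribution with n − 1 degrees of freedom (Excel: T.DIST.2T, or T.TEST with type = 1 for the whole test).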
Thank you for your prompt reply, Dr. Zaiontz. I’ll try.
It is likely possible to end up with overlapping classification data when using LDA. Can we set a threshold in discriminant analysis to provide more separation between class data points? If so, how? Regards.
Fergo,
The point of LDA is to determine a specific category. Since the output is a set of weights, I guess you can interpret the existence of overlapping categories, but I am not sure what purpose this would serve. When you say that you are seeking more separation of the data points, what do you mean?
Charles
I want to test whether a data vector belongs to category A or B. So I ran several input data vectors that I know belong to category A, but the output shows that some of them are wrongly categorized as B. I meant to seek a way to get better output, maybe by applying a threshold or something, so that at least I can reduce the errors. Would you like to share some ideas, please?
Fergo,
This sounds like something you can do with neural networks, training the network based on the data you have. Just an idea.
Charles
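For what it's worth, the threshold idea Fergo asked about can be sketched for a two-class Fisher LDA: project onto the discriminant direction and classify by comparing the score to a cutoff; shifting the cutoff away from the default midpoint trades misclassifications of A against misclassifications of B. This is my own illustration with made-up 2-D data, not the Real Statistics implementation:

```python
def fisher_lda(A, B):
    """Two-class Fisher LDA for 2-D points: w = S_w^{-1} (mean_A - mean_B),
    with the default cutoff midway between the projected class means.
    Classify a point p as A when w . p > cutoff; moving the cutoff
    trades errors on class A against errors on class B."""
    mean2 = lambda P: [sum(p[0] for p in P) / len(P), sum(p[1] for p in P) / len(P)]
    ma, mb = mean2(A), mean2(B)
    # pooled within-class scatter matrix S_w
    s = [[0.0, 0.0], [0.0, 0.0]]
    for P, m in ((A, ma), (B, mb)):
        for p in P:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dm = [ma[0] - mb[0], ma[1] - mb[1]]
    # w = S_w^{-1} dm, using the closed-form 2x2 inverse
    w = [(s[1][1] * dm[0] - s[0][1] * dm[1]) / det,
         (s[0][0] * dm[1] - s[1][0] * dm[0]) / det]
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    cutoff = (score(ma) + score(mb)) / 2
    return w, cutoff, score
```

If too many known-A vectors are landing on the B side, lowering the cutoff (toward the B mean) reduces those errors, at the cost of more B vectors being labeled A.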