Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions, and feedback. We would especially like to hear about any errors in the website, examples, or software. You can contact me at the email addresses below or by leaving a comment here or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
Hi Charles,
I’m just getting my head around cubic splines, and as expected your wonderful site carries a really clear explanation of how they work and how to create them using Excel. While going through the worked example you give here
https://www.real-statistics.com/other-mathematical-topics/spline-fitting-interpolation/
you fit a cubic spline to ln(x) – or at least, that’s what it says. However, as far as I can see that can’t be the function; it must at least be some more complex function involving ln(x); it doesn’t appear to be a simple log function to another base. Perhaps you could add a line to say what it is? As ever, thanks so much for this incredibly useful site. Have a great day. Robert (Oxford UK)
Hello Robert,
Thanks for catching this error. The function is h(x) = ln(x+1). I have made the change on the webpage.
I appreciate your help in improving the accuracy and reliability of the website.
Charles
No problem! Delighted to be able to help out even in so small a fashion. Best wishes, Robert
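For readers who want to experiment with the spline example outside Excel, here is a sketch of natural cubic spline interpolation of h(x) = ln(x + 1) in plain Python. The knots (0 through 4) are chosen for illustration and are not taken from the webpage.

```python
import math

def natural_cubic_spline(xs, ys):
    """Return a function S(x) interpolating (xs, ys) with a natural
    cubic spline (second derivative zero at both end knots)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives M[1..n-1].
    a = [0.0] * (n + 1)  # sub-diagonal
    b = [1.0] * (n + 1)  # main diagonal
    c = [0.0] * (n + 1)  # super-diagonal
    d = [0.0] * (n + 1)  # right-hand side
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward sweep, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * (n + 1)
    for i in range(n - 1, 0, -1):
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]
    def S(x):
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1  # locate the interval [xs[i], xs[i+1]]
        t1, t2 = xs[i + 1] - x, x - xs[i]
        return ((M[i] * t1 ** 3 + M[i + 1] * t2 ** 3) / (6 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6) * t1
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6) * t2)
    return S

knots = [0, 1, 2, 3, 4]
vals = [math.log(x + 1) for x in knots]
spline = natural_cubic_spline(knots, vals)
```

The spline reproduces h exactly at the knots; between knots it only approximates ln(x + 1), with the largest error near the ends because of the natural boundary condition.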
Hi Charles, thank you for sharing this tool with us, it is amazing!
I usually use 2-way (or higher) ANOVA followed by Tukey or Fisher.
Is it possible to run Fisher?
Thank you!
Real Statistics currently does not support Fisher’s LSD post-hoc test since it doesn’t correct for experiment-wise error.
Charles
Dear Charles, I recently started using your Excel add-in, and thank you so much for developing it, and for free at that. You are amazing.
While using the add-in I was trying to understand how I can use for this situation but could not figure it out:
Let’s say I have three variables, one per column: name, gender, and age (in stacked format, so I have one row of data for each person, e.g., Dave, Male, 23, and then in the next row Sara, Female, 35, etc.).
I want to compare whether the mean ages of the men and women in my data are different. How can I do that using your add-in? Can you please help?
Thank you, and best wishes for 2021.
Kumar
Kumar,
If I understand correctly, you want to use a t-test to compare the mean ages of men and women. Let’s suppose that your data is in range A1:C100 (without headings for the three columns). The first column is not used for the t-test. You can use the Real Statistics T-Test and Non-parametric Equivalents data analysis tool using B1:C100 as the Input Range and selecting the following options: T-test, Two independent samples and, most importantly, the Use standard (stacked) for two sample test option.
Charles
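Charles’s stacked-format recipe can also be mimicked outside Excel. Below is a sketch in Python with made-up data: the rows are split by the grouping column and the two-independent-sample t statistic with pooled variance is computed (Real Statistics also reports p-values and other output not shown here).

```python
import math
from statistics import mean, variance

# Hypothetical stacked data: (name, gender, age); name is ignored by the test.
rows = [
    ("Dave", "Male", 23), ("Sara", "Female", 35), ("Tom", "Male", 31),
    ("Ann", "Female", 29), ("Bob", "Male", 27), ("Eve", "Female", 40),
    ("Jim", "Male", 22), ("Liz", "Female", 33),
]

def stacked_t_test(rows):
    """Two-sample t statistic (pooled variance) from stacked rows."""
    groups = {}
    for _, g, age in rows:
        groups.setdefault(g, []).append(age)
    if len(groups) != 2:
        raise ValueError("the grouping column must have exactly two unique entries")
    x, y = groups.values()
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df  # pooled variance
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, df

t, df = stacked_t_test(rows)  # t is negative here: the men are younger
```

The ValueError mirrors the add-in’s requirement that the grouping column of stacked data contain exactly two unique entries.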
Hello Charles, thanks so much. I tried the steps you suggested. I selected the two columns: one with the variable Sex, which has exactly two categories, “male” and “female”, and the other with the variable Age. I selected the two columns as Input 1, left Input 2 blank, and selected Use standard (stacked). But I get the error “the first column of data in standard format must have exactly two unique entries”. My Sex variable has only “male” and “female”. I even converted male/female to 1 and 2 and repeated the steps, but I am getting the same error. Where do you think I am making the mistake?
Thanks again for your help.
Please ignore; it worked. I ran it again without the variable Name and it worked. Question: I see Cohen’s d is reported. It’s blank for the two variables Female and Male but reported for “pooled”. How should we interpret that, say if it’s .25? Also, what is “effect r”?
Also, I see no lower and upper values in the one-tailed test. Is that normal?
Please ignore if my questions are silly. Just getting started 🙂
Thanks!
Cohen’s d and r are effect sizes that measure how large the difference between the female and male ages in the sample is. You can interpret Cohen’s d as described at https://www.real-statistics.com/sampling-distributions/standardized-effect-size/
r is interpreted as described at https://www.real-statistics.com/correlation/dichotomous-variables-t-test/
Charles
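For anyone curious how these effect sizes relate, here is a sketch with made-up samples: pooled Cohen’s d, the equivalent t statistic, and the effect size r derived from t and the degrees of freedom (the exact formulas Real Statistics uses are documented on the linked pages).

```python
import math
from statistics import mean, variance

def cohens_d_and_r(x, y):
    """Pooled Cohen's d and the effect size r for two independent samples."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp = math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df)
    d = (mean(x) - mean(y)) / sp            # pooled-standard-deviation Cohen's d
    t = d / math.sqrt(1 / nx + 1 / ny)      # the corresponding t statistic
    r = t / math.sqrt(t * t + df)           # effect size r from t and df
    return d, r

# Hypothetical ages
female = [35, 29, 40, 33, 38, 31]
male = [23, 31, 27, 22, 30, 26]
d, r = cohens_d_and_r(female, male)
```

As a rough guide, Cohen suggested 0.2, 0.5, and 0.8 as small, medium, and large values of d, so a d of .25 would be a small effect.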
Dear Mr. Zaiontz,
I recently had a paper rejected and one of the reviewers main arguments was faulty statistics. In statistics forums, I could not get a straight answer from anyone. So, if you could provide some advice and input, that would be really great.
Two (2) groups of schizophrenia patients were evaluated with clinical scales consisting of patient responses and clinician assessments. These are ordinal ratings, from 1-7. Scales totaled between 8 and 30 indices, based on the test.
Each question/index was compared between the two groups. I initially opted to use a Mann-Whitney test, but the scoring distributions were of completely different shapes which invalidated proper interpretation of the results. No statistical measure of shape was needed; their distributions were like night and day, either a bell curve or a ski slope.
I decided instead on building contingency tables and performing a simple chi-square analysis to compare distributions. The reviewers did not like this. What would you have chosen?
I also read that if you have sufficient data points, even with categorical data, parametric tests such as a t-test with Welch correction are appropriate to use. Is this so?
Any input you or others on the site can provide is greatly appreciated.
Best regards,
Michael from Tzfat
Michael,
1. Can you clarify what you mean by the following?
“Two (2) groups of schizophrenia patients were evaluated with clinical scales consisting of patient responses and clinician assessments. These are ordinal ratings, from 1-7. Scales totaled between 8 and 30 indices, based on the test.”
2. What did you do to translate data appropriate for a Mann-Whitney test into the form for a chi-square test based on a contingency table?
3. Even when the distributions have completely different shapes you can use the Mann-Whitney test. The test doesn’t test the null hypothesis that the two medians are the same. Instead, it tests a different hypothesis, as explained on the Mann-Whitney test webpage on the Real Statistics website.
Charles
Hi Charles,
Thanks for your reply. I will try to fill some details.
(1) We evaluated two groups of patients with several clinical scales (PSRS, PANSS, CGI, and QLES18). These are Likert-like scales with ordinal ratings for each index. For example, the PANSS has 30 indices (divided into 7 positive, 7 negative, and 16 general psychopathology), each rated on a scale from 1-7 with 7 as most severe. There were 100 patients in each group (100 with delusions only and 100 with auditory hallucinations).
(2) For the chi square, I built contingency tables. For example, question P3 for patient group 1: number of patients rated 1, number of patients rated 2, etc. So, each table was at maximum 2×7 with 2 patient groups and 7 ratings. In cases where there were tables with >20% of cells with a value of 1, adjacent cells were combined to fulfill the requirements for table analysis.
(3) According to Motulsky, if you do a Mann-Whitney test on datasets with differently shaped distributions, the results cannot be properly interpreted, even with significant outcomes. I realize that the test compares ranks and not medians, but this is what he recommends.
Anyway, I thought that building contingency tables to compare distributions with a chi-square test was a valid approach to the analysis. Would Mann-Whitney work better? Is there a way to analyze all of the data for a scale together instead of a question-by-question approach?
Michael,
1. You have two groups but multiple scales. Did you try to compare the two groups for each scale using a separate test for each scale or did you do something else?
2. Does this mean that you used a separate contingency table (and chi-square test) for each question in the scale? The usual chi-square test doesn’t take the ordinal aspects of the scores into account. You could use the nominal-ordinal version of the test instead. See
https://www.real-statistics.com/one-way-analysis-of-variance-anova/nominal-ordinal-chi-square-test/
3. You can use the Mann-Whitney test to test the following null hypothesis even when the two distributions have different shapes:
H0: if one observation is made at random from each population (call them x0 and y0), then the probability that x0 > y0 is the same as the probability that x0 < y0.
If, in addition, the populations for each sample have the same shape, then this would mean that the two populations have the same median.
Charles
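The rank-based statistic behind the test can be sketched in a few lines of Python (a generic textbook implementation, not the Real Statistics code; for a p-value you would still need exact tables or a normal approximation):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic via midranks (ties share the average rank)."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # 1-based ranks i+1 .. j are tied and share the midrank
        ranks[combined[i]] = (i + j + 1) / 2
        i = j
    r_x = sum(ranks[v] for v in x)           # rank sum of the first sample
    u_x = r_x - len(x) * (len(x) + 1) / 2    # U for the first sample
    return min(u_x, len(x) * len(y) - u_x)   # report the smaller U

u = mann_whitney_u([3, 4, 2, 6, 2, 5], [9, 7, 5, 10, 6, 8])
```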
Dear Mr. Charles,
I have written a paper in which I tested the effect of sugar concentration and sugar type on the volume of CO2 released by yeast. Each sugar type was tested across the same five concentrations. There were three sugar types tested, resulting in 15 experimental conditions.
I kindly ask for your advice on a statistical method for determining data significance and confirming/disproving the experimental hypothesis. I am quite lost at the moment, as no test seems to suit two independent variables, one categorical and one continuous.
Mara,
In order to give you a proper response, I would need more details about the scenario and the hypothesis or hypotheses that you want to test.
Charles
Hi Charles
Many thanks for creating such a useful service. Please could I ask your advice? Every time I want to use Real Statistics I have to untick the add-in and then re-tick it. When I next open Excel I have to repeat the process every time! This has only recently started happening.
Many thanks
KR
David
Hello David,
I don’t know why this would happen. Which version of Excel are you using? What happens if you don’t untick the addin when you open Excel?
Charles
David,
This has been happening with me & my students for years. I don’t know why Excel forgets that it’s linked to Real-Stats when it’s closed down & re-opened, but most of the time it does. I’ve taught them how to uncheck & re-check so it works ok.
The other thing I see is that once in a while Excel unchecks the Solver when it’s closed down & re-started, which results either in a request for a password, or weird error messages (e.g., compile error). Again, once I played around I figured out what to do: go into add-ins, uncheck Real-Stats & re-check the solver, & click ok. Go back into add-ins & this time check Real-Stats again, & click ok. It doesn’t seem to work if you try to do it in one step; the solver usually has to be already in use before you can click Real-Stats.
Dear Mr. Zaiontz,
Thank you very much for all the information on your blog! I just had a quick question which is confusing me as I write my paper. Can you use a chi-square test of independence for both nominal and ordinal variables? It seems that some say yes, some say no. Is it okay if I regard response options such as education levels or ‘oppose~neutral~approve’ as nominal variables instead of ordinal and run the chi-square test? From what I understand, that means I won’t take into account the order of the ordinal variables but would that be a major problem statistically speaking? Thank you so much!
Kate,
You can certainly use the chi-square test even when the categories are ordered, but, as you have stated, the order won’t be taken into account. Sometimes this will result in drawing the wrong conclusions. Real Statistics does describe a few ways of taking order into account. See the following webpages:
https://www.real-statistics.com/correlation/ordered-chi-square-independence-test/
https://www.real-statistics.com/one-way-analysis-of-variance-anova/nominal-ordinal-chi-square-test/
Charles
Hi Charles
Where do I find ARCH and GARCH?
Real Statistics doesn’t yet support ARCH and GARCH. I will eventually add these.
Charles
Hi Charles,
I have two installations of Real Statistics, on Office 2019 (64-bit) and Office 2016 (32-bit), and both have an issue with Excel not responding when opening a file where Real Statistics calculations are present, taking between 2 and 5 minutes to open. I did some research on my 2016 machine and disabled the printer, as I could see the spl (spool) service was holding it up. It did improve after disabling. The add-in is in the recommended location, unblocked, and in a trusted location.
I wonder if you have come across this situation before? I have another colleague with an Office 2019 installation where the files open with no issue. But these two machines open okay until you go to edit the file.
Thank you
Kevin,
It really depends on what is in the file. Some of the simulations can take a long time. E.g., a couple of the Real Statistics examples workbooks take a few minutes to open.
This should not happen when you open a blank Excel file or open most files even with lots of calculations. Only certain calculations take more time.
Charles
Dear Charles,
I’m a student of Biomedical Sciences from Leuven, Belgium. Firstly, I think it’s great you put all of this information and help online for free for students like me. But I actually have a question: I installed the XRealStats add-in for Excel for statistical analysis of a small survey (Fisher test). I did 9 tests, of which 4 gave a p-value of 1. Is there an explanation for why such a ‘weird’ p-value was calculated 4 times? I used a 2×2 contingency table with 27 observations in total, so maybe the small number of cases has something to do with it?
Anyway, hope to hear back from you.
Greetings from Belgium.
Harry,
A p-value of 1 is certainly possible. If you like you can email me an Excel spreadsheet with your data; I will check to see whether there is a problem.
Charles
Dear Charles,
Congratulations on your work!
I’m trying to find the Gini coefficient calculation / Lorenz curve in the Real Statistics Excel add-in. Could you help me?
Thanks in advance
See
Gini
Lorenz curve
Charles
I am a researcher in New Zealand. Just wanna say thank you very much for creating such a website and sharing so much useful information for free. This website has helped me to solve an important question related to my thesis and has saved me a lot of time and energy! Thank you so much! I will give you 10 out of 10!
Hi Klay,
I am pleased that the website has been helpful to you.
Charles
Dear Charles,
If you can help me with this, you are a life saver. I am currently pursuing my master’s. For my thesis I have covered the topic of “Impact of Strategy Implementation on Organizational Performance”. In my questionnaire I have focused on 3 primary sub-factors (Resource Allocation, Leadership, and Monitoring and Evaluation). I have to use multiple regression and Pearson’s correlation analysis to analyze my data. I have approximately 10 questions for each sub-factor, which were answered using a Likert scale, while my organizational performance was only 1 question. How can I analyze this data? How do I get my x and y values, as there are 10 questions for each sub-factor but only 1 for organizational performance?
DJ,
For the purpose of my response, I will assume that you are interested in understanding the relationship between the 3 factors and organizational performance. The 3 factors then become the x variables and organizational performance becomes the y variable.
If you can map the 10 questions to the 3 factors, then you just need to add the scores for the questions related to each factor to obtain the values of the three x variables. (You can use either the sums or the averages; the analysis will be the same.)
If you can’t map the questions clearly to the factors, then things become a bit more complicated.
It does not matter if, for example, Resource Allocation, Leadership, Monitoring and Evaluation, Organizational Performance are mapped to 2, 5, 3, 1 questions, respectively.
Once you have the mappings you can use the multiple regression data analysis tool, although you might need to use a different type of regression (probably multinomial logistic regression) if organizational performance is measured with a Likert scale.
Charles
Dear Charles,
Thanks a lot! Yes, the relationship is precisely what I’m trying to understand.
As suggested by you, I will use the sum of all responses (for example, 10 questions answered by 183 people on a Likert scale), so I add all the scores together to use as my X1 (Resource Allocation), X2 (Leadership), and X3 (Monitoring and Evaluation), use those as my X values, and my organizational performance as Y?
Once done with this, as my organizational performance value is also measured using a Likert scale, I shall try using both multiple regression and multinomial regression to get my other values for the Pearson coefficient.
Please let me know if I’ve understood your suggestion correctly. Thank you so much!
DJ,
You understand this perfectly. Good luck in getting it all to work properly.
Charles
Dear Dr. Charles,
I used your Real Statistics package and am very impressed with it. Now I use it for my research.
When I plotted the 95% confidence ellipse using the analysis tool and my own Excel calculations, the values of the long and short axes were slightly different. I would appreciate it if you could let me know if there is a cause for this.
Best regards,
Ryo
Ryo,
If you send me an email with your data and results, I will try to figure out why this is.
Charles
Dear Charles,
In “Ridge Regression Predictions”, where “MSE = SSE/(n-k-1) = 554.91/(18-4-1) = 42.68541 (cell H8)”, do you mean that k is the # of independent variables? (We have x1, x2, x3, and x4; thus, k = 4.)
Please kindly correct me if I’m wrong. Normally, MSE = SSE/n, where n is the # of observations. How come in ridge regression / LASSO regression we have MSE = SSE/(n-k-1) instead?
Best regards,
Kelly
Hello Kelly,
Yes, k = the number of independent variables.
MSE = SSE/dfE where dfE = the degrees of freedom for the error term. Usually, dfE is not n. For regression it is n-k-1. See
https://www.real-statistics.com/multiple-regression/ridge-and-lasso-regression/ridge-regression-example/
Charles
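The worked numbers quoted from the example can be verified directly (a one-line check, using the rounded SSE from the page):

```python
# Ridge regression example figures: SSE = 554.91, n = 18, k = 4.
sse, n, k = 554.91, 18, 4
df_error = n - k - 1       # degrees of freedom for the error term
mse = sse / df_error       # 554.91 / 13, matching the quoted 42.685 to 3 decimals
```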
Dear Charles,
Omg, thank you sooooo much for your clarification. MSE = SSE/dfE for regression.
If I have to compare the results from Forward-Stepwise Regression vs the results from LASSO Regression using the same dataset, should I plug in MSE = SSE/(n-k-1) for both Forward-Stepwise Regression and LASSO Regression?
Best regards,
Kelly
Hi Charles!
I wanted to point out an error on https://www.real-statistics.com/non-parametric-tests/goodness-of-fit-tests/anderson-darling-test/anderson-darling-distribution/ . For the range 0 < A < 2 you've listed the p-value approximation as A^(-1/2)*EXP(-1.2337141)*u, but the correct approximation is actually A^(-1/2)*EXP(-1.2337141/A)*u. For a reference, take a look at page 3 of this paper: https://www.jstatsoft.org/article/view/v009i02 .
Just wanted to help contribute a bit, and say a big thank you for putting all of this information together in an easy-to-read format. Your site has helped a ton with a statistics project I'm working on!
Best,
Paul
Paul,
Thank you very much for catching this error. I have just made the correction to the website.
Fortunately, the correct formula was used in the AD_DIST function.
I appreciate your assistance in improving the website and I am pleased that you have found the site useful and easy-to-read.
Charles
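For readers who want to check the corrected formula numerically, here is a sketch of the two-branch approximation to the Anderson-Darling CDF, with coefficients transcribed from the Marsaglia & Marsaglia paper Paul cites (the p-value is one minus this; the transcription is assumed correct, so verify against the paper before relying on it):

```python
import math

def ad_cdf(z):
    """Approximate CDF of the Anderson-Darling statistic A^2
    (Marsaglia & Marsaglia 2004; separate branches for z < 2 and z >= 2)."""
    if z <= 0:
        return 0.0
    if z < 2:
        # Note exp(-1.2337141 / z): dividing by z is the correction
        # discussed in the comment above.
        poly = 2.00012 + (0.247105 - (0.0649821 - (0.0347962
               - (0.011672 - 0.00168691 * z) * z) * z) * z) * z
        return math.exp(-1.2337141 / z) / math.sqrt(z) * poly
    poly = 1.0776 - (2.30695 - (0.43424 - (0.082433
           - (0.008056 - 0.0003146 * z) * z) * z) * z) * z
    return math.exp(-math.exp(poly))

def ad_pvalue(z):
    return 1.0 - ad_cdf(z)
```

A sanity check: the classical 10% and 5% critical values (about 1.933 and 2.492) should give CDF values near 0.90 and 0.95, and the two branches should agree closely at z = 2.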
Dear Charles,
Good morning.
I am conducting my first research on digital competence for students, and I have built a questionnaire with 5 constructs. I understand that I will have to apply Cronbach’s coefficient within each of the dimensions (each construct).
My doubt is whether the number of items (they are all 7-point Likert scales) within each dimension has to be even (2, 4, 6, …) or not necessarily, and whether the number of items has to be the same within each dimension.
Thank you very much. Found this site and was very inspired to share my doubts.
Alice,
The number of items can be odd or even to use Cronbach’s alpha.
Charles
Hi Charles
As a budding strategy performance/risk analyst I find your blog to be my go-to bible.
It would be excellent if you had a GitHub repository for all of your macros. This would allow me to read your code, learn from it, and also save it in my work environment – which doesn’t allow downloading from the internet.
I’m struggling to find an ARIMA VBA example out there. Do you have one?
Kind regards
Nikolas
Nikolas,
No, I have not included any VBA examples.
Charles
Dear Dr. Zaiontz.
Do you have some developments on redundancy analysis (RDA, Wollenberg 1977)? It’s also known as reduced rank regression (RRR, Izenman).
Best regards, JT
Hello Juan,
Sorry, but Real Statistics doesn’t support reduced rank regression at this time.
Charles
Hello Dr. Charles,
I think that your work is fantastic, but I think that the function eVECTORS is presenting the wrong result for the matrix below, which is a symmetric matrix that should have a full set of orthogonal eigenvectors, with real eigenvalues:
8 0 6
0 4 0
6 0 4
the result presented was:
12,32 -8,00 -0,32
0,81 0,00 0,58
0,00 -1,00 0,00
0,58 0,00 -0,81
0,00 1872,00 0,00
but -8 is clearly not the eigenvalue of the eigenvector (0, -1, 0). The correct eigenvalue should be 4.
Adriano,
Thank you for finding this error. The function eVECT will return the correct eigenvalues 12.32456, -.32456, 4 and the associated eigenvectors. I need to figure out why eVECTORS doesn’t return the correct values even though the matrix is symmetric. Also, eVALUES doesn’t return the correct values. I will investigate these errors further to identify the root causes so that I can make the appropriate corrections.
Thanks for your help. I will follow up and get back to you.
Charles
Hello Adriano,
I have had a lot of difficulty trying to correct this problem. With the latest release (issued yesterday), the eVAL and eVALSym functions will calculate the correct eigenvalues, and the eVECT function will calculate the correct eigenvectors as well.
I am working on a new approach which I believe will solve this problem and I will likely change eVALUES and eVECTORS to use this approach. I hope to have this available shortly.
Charles
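Independently of the eVECTORS fix, readers can cross-check the symmetric case with a textbook Jacobi rotation scheme (a generic algorithm, not the Real Statistics implementation); it recovers the eigenvalues Charles quotes for Adriano’s matrix:

```python
import math

def jacobi_eigenvalues(matrix, max_rotations=100, tol=1e-12):
    """Eigenvalues of a real symmetric matrix via classical Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_rotations):
        # find the largest off-diagonal element a[p][q]
        p, q, off = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > off:
                    p, q, off = i, j, abs(a[i][j])
        if off < tol:
            break
        # rotation angle that zeroes a[p][q]
        theta = 0.5 * math.atan2(2 * a[p][q], a[p][p] - a[q][q])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # rotate rows p and q
            apk, aqk = a[p][k], a[q][k]
            a[p][k] = c * apk + s * aqk
            a[q][k] = -s * apk + c * aqk
        for k in range(n):  # rotate columns p and q
            akp, akq = a[k][p], a[k][q]
            a[k][p] = c * akp + s * akq
            a[k][q] = -s * akp + c * akq
    return sorted(a[i][i] for i in range(n))

A = [[8, 0, 6], [0, 4, 0], [6, 0, 4]]
eigs = jacobi_eigenvalues(A)  # approximately [-0.32456, 4, 12.32456]
```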
Dr Charles,
In the XRealStats.xlam version, how does one obtain the multinomial logistic regression chart and the confusion matrix?
Thanks.
Thanks for sending me a spreadsheet with your data and results. The analysis that you did is correct, but since your dependent variable only has two values 0 and 1, you could use binary logistic regression, which provides the type of output that you are looking for.
Charles
Dear Charles,
I have been using your Real Statistics add-in for a long time. I enjoy it very much!
I noticed that I only have access to the add-in when I switch it off and on again. Otherwise the menu is not visible. I have tried to fix the issue by removing the add-in and reinstalling, but it still occurs in Excel 2013. The Trust Center settings and macro settings do not affect this behaviour either. The place where I located the add-in is the Documents folder, because I am not allowed to save data in the Program folder. Do you have any clue?
best regards
Matthias
See https://real-statistics.com/appendix/faqs/disappearing-addins-ribbon/
In any case, you should be able to access the add-in menu by pressing Ctrl-m
Charles
Hello Mr. Zaiontz
I am presently working on a project where I have the data set as follows:
1 dependent variable (which is measured as a number over a time interval), and
7 independent variables (which are not measurable, i.e., these variables can only be answered yes or no).
To determine whether there is any relationship between the dependent and independent variables, which statistical method should I choose?
Thanks in advance!
You could start with multiple linear regression.
Charles
Good day!
I found you on the Internet just in time; thank you!
When I follow the instructions for using the “Histogram with Normal Curve Overlay” in the software pack, the histogram comes out with the bins shifted to the right of the values, so the curve is not centered over the histogram. Instead of the bins being 3-6, 6-9, and 9-12, they end up being 6-9, 9-12, and 12-15. Is there a way to correct this error?
Hello Eric,
In Figure 5 of the following webpage, I give an example where the curve is centered over the histogram:
https://real-statistics.com/tests-normality-and-symmetry/histogram-with-normal-curve-overlay/
If you send me an Excel file with your data and chart, I will try to figure out what is happening.
Charles
Hello Eric,
Thanks for emailing me your example. I can see that the histogram is indeed shifted. I will look into this further.
Charles
Hello Charles,
Thank you so much for helping out through your amazing website.
Is it possible to plot regression confidence intervals using nonlinear regression with polynomials of order 2, 3, and 4?
Hello Jonah,
Thank you for your kind remarks about the website.
If I remember correctly, for polynomial regression you can simply use OLS regression and nonlinear regression doesn’t add any value. Thus, you can calculate confidence intervals as for multiple linear regression: see
https://real-statistics.com/multiple-regression/confidence-and-prediction-intervals/
https://real-statistics.com/regression/confidence-and-prediction-intervals/plots-regression-confidence-prediction-intervals/
Charles
Hi Charles,
I want to use the Gumbel distribution to find the minimum wall thickness (not the max). How do I do this? I am working on extreme value statistics, and I use the GUMBEL_INV function to find the maximum corrosion pit depth, and it works flawlessly. Now I want to find the minimum wall thickness using the same principle, but it seems that GUMBEL_INV always gives values greater than the input … for the minimum, the wall thickness should be less, as you know … Hope this makes sense!
1. For information about the Gumbel distribution see https://real-statistics.com/other-key-distributions/gumbel-distribution/
2. To find the min of a set S, find the max of S’ = {x: -x in S}. If m is this max then -m is the value that you are looking for.
3. To fit data to a Gumbel distribution see
https://real-statistics.com/distribution-fitting/distribution-fitting-via-maximum-likelihood/fitting-gumbel-parameters-via-mle/
Charles
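Point 2 can be demonstrated in a couple of lines (the thickness values are made up):

```python
# To find the minimum of a sample with machinery built for maxima,
# negate the data, take the maximum, and negate the result back.
wall_thickness = [12.1, 11.8, 12.4, 11.5, 12.0]  # hypothetical measurements
m = max(-x for x in wall_thickness)
minimum = -m
assert minimum == min(wall_thickness)  # both give 11.5
```

The same trick lets you fit a Gumbel (maximum) model to the negated minima and then negate the fitted quantiles.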
Hello
I want to compare two analytical methods for measuring the polarization of a juice.
Each time, for the same sample, I measure the polarization given by method 1 and that given by method 2.
I took a hundred measurements.
The average of the deviations between the two methods is 0.02 and the standard deviation is 0.07.
How can I confirm that these methods are equivalent?
Thank you
If you really mean “equivalent”, see
Equivalence Testing (TOST)
Charles
Hello,
I am a user of Microsoft Excel 365 on a Windows OS System.
I tried to install the XRealStats file as well as the RealStats Add-in for Microsoft Excel multiple times.
After adding the XRealStats add-in to Excel, I was unable to see the add-in in the Data ribbon. I was also unable to access it using the Ctrl-m shortcut.
Dipam,
What do you see when you enter the formula =VER() in any cell?
Charles
Hello Charles
Hope you are doing well. I am working on trend analysis of future rainfall using gridded data with 206 points. Could you please suggest a method or test for this?
Thanks
Please describe in more detail the type of analysis you need to perform. More specifically, what sort of hypothesis testing or forecasting do you require?
Charles
I want to do a trend analysis test on downscaled monthly mean gridded rainfall data covering 90 years (2010-2099). The grid size is 25 km x 25 km with 206 points of data. I just want to check whether the trend is positive, negative, or absent.
year latitude longitude jan feb ……
2010 23.875 67.625 1.90 2.01 …..
2010 23.875 67.875 1.76 1.8 ……
2010 23.875 68.125 1.59 1.71 …..
2010 24.125 67.375 2.30 2.40 …..
…. … … … …
The data are like this; for each year there are 206 points of latitude and longitude.
See my previous response.
Charles
Perhaps the Mann-Kendall test is what you are looking for. See
Mann-Kendall Test
Charles
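For a single grid point’s series, the Mann-Kendall statistic is easy to sketch (this assumes no tied values; a real analysis of 206 points would also need a tie correction and, for monthly data, attention to seasonality and autocorrelation):

```python
import math

def mann_kendall_z(series):
    """Mann-Kendall trend test: S statistic and its normal approximation z.
    Positive z suggests an increasing trend, negative z a decreasing one."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n) for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

# Hypothetical rainfall series for one grid point
rising = [1.90, 1.94, 2.01, 2.07, 2.04, 2.11, 2.15, 2.20]
s, z = mann_kendall_z(rising)
```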
Thanks
Thank you so much for this great resource. I am using your Excel tool pack to calculate the chi-square and Fisher tests on 2×2 tables. I was wondering if there is an easy function to get an OR and 95% CI. Thank you!
Hello Sarah,
Use Real Statistics’ ODDS_RATIO array function. See
https://real-statistics.com/chi-square-and-f-distributions/effect-size-chi-square/
Charles
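As a cross-check on whatever ODDS_RATIO reports, the standard log-scale (Wald) interval for a 2×2 table can be computed by hand (the table counts here are made up):

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, alpha=0.05):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with the usual
    Wald confidence interval on the log-odds scale."""
    if min(a, b, c, d) == 0:
        raise ValueError("apply a 0.5 continuity correction for zero cells")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - alpha / 2)            # 1.96 for a 95% CI
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi

or_, lo, hi = odds_ratio_ci(10, 20, 5, 40)  # OR = 4 for this made-up table
```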
Hello Charles
Hope you are doing well. I just used the Excel package to run the Mann-Kendall and Sen’s slope tests from here, but the resulting z value is greater than 10 and the p-values are very small. I then did it manually, getting help from your examples, but the values remain the same. What is the solution to this problem?
Hello Suhail,
Why is this a problem? Perhaps these are the correct values.
Charles
Hello Charles,
Thank you for including item response models in your Add-in package. I have three questions relating to IRT calibrations.
The first question is about the estimation method for polytomous data. I found that the estimation method for binary data is almost identical to the method of Winsteps, but it is quite different for polytomous data. While Winsteps does something similar to the binary estimation (i.e., applies the iterations of UNCON estimation on top of the PROX estimates), I am curious to know why you didn’t choose to apply iterations as well. Are there advantages of the PROX method, or disadvantages of the UNCON iterations with polytomous data, that you considered?
The second question is about the output for polytomous data. The add-in package doesn’t give expected scores or fit statistics (e.g., infit, outfit, or residuals) for polytomous data; is it possible to obtain those as well?
My last question is about removing extreme values. I tried to use the add-in menu to remove zeros and perfect scores before calibration, but Excel reports errors. The “RASCH_INIT” function doesn’t work for me either. Do you have solutions for this?
Thank you!
Carol
Carol,
Originally I was only going to add the UNCON approach, but I then decided to include the simple version of the PROX approach. This was based on Wright & Masters’ Rating Scale Analysis: Rasch Measurement. I would be happy to provide a better solution for the polytomous case if I had a good reference that would explain the procedure required (including JMP). Can you point me in the right direction for this?
Regarding the reports, if you can send me some examples of the reports that you desire I will look into adding these.
Regarding your last point, can you send me an Excel file with a case where the add-in package didn’t remove extreme scores properly?
Charles
Hi Charles,
Thank you very much for your reply. I have sent you an email including links to Winsteps’ Polytomous models, some example outputs, and an Excel file about removing extreme scores. Please let me know if you need more information.
Thank you for your help!
Carol
Carol,
Thank you very much for sending me this information. I plan to look at it later this week and then I will reply to you.
Charles
How does one reference this site when submitting to a scholarly journal?
Hello Lance
See Citation
Charles
Anders Kallner
May 13, 2020
Dear Charles, your page is a fabulous resource for us amateurs and I have learned a lot over the years.
Now, I have a problem using the Clopper-Pearson exact method to calculate the confidence intervals of the sensitivity and specificity from a four-fold (contingency) table. I want to include the CIs in an interactive spreadsheet for calculating the sensitivity and specificity.
I calculate the sensitivity from the distribution of results in the diseased group. There were 40 negatives and 76 positives, thus the sensitivity is 0.655 (76/(76+40)).
I figured the CI(95) would be LL = BETA.INV(0.025, 76, 40) and UL = BETA.INV(0.975, 76, 40), giving a CI(95) of 0.567 to 0.739.
Analogously, the CI(95) of the specificity was calculated from the true negative and false positive results in the non-diseased group.
There is a Clopper-Pearson calculator on the internet which gives slightly different results, and my question is whether I have chosen an erroneous approach with the Excel BETA.INV formulas:
https://www.danielsoper.com/statcalc/calculator.aspx?id=85
Hello Anders,
Thank you for your comment. I have just added support for Clopper-Pearson to the Real Statistics software. This is included in the PropCI function that is part of the Release 7.1 that was issued a few minutes ago.
Charles
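As a cross-check for Anders’s spreadsheet, the Clopper-Pearson bounds can also be computed by inverting the binomial tails directly. The following is a stdlib-only Python sketch (the bisection approach and function names are mine, not the Real Statistics PropCI implementation; the 76 positives out of 116 come from the comment above). Note that the beta form of the exact interval uses BETA.INV(α/2, x, n−x+1) for the lower limit and BETA.INV(1−α/2, x+1, n−x) for the upper, which is probably why BETA.INV with (76, 40) in both calls disagrees with the online calculator.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo=0.0, hi=1.0, iters=60):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    # lower limit: p solving P(X >= x | p) = alpha/2 (tail increases with p)
    ll = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # upper limit: p solving P(X <= x | p) = alpha/2 (tail decreases, so negate)
    ul = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return ll, ul

ll, ul = clopper_pearson(76, 116)  # 76 positives out of 116 diseased subjects
```

The interval is slightly wider than the normal approximation, as expected for the exact method.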
I hope you are doing well and staying healthy in these days of coronavirus!
Thank you. I hope that you are doing well also.
Charles
Hi Charles,
Your website and tools have helped me greatly. I want to try to write a process document to help explain to my team what I have done. I need to write the algebraic formula for forecasting y(i+1). However, the equation is confusing me. I have a constant, phi 1 & 2, Theta 1,2 & 3 and 26 Phi values. Would you write the equation for me, please?
Best,
Anshu
Hello Anshu,
The equation for such a model is described at
https://real-statistics.com/time-series-analysis/arma-processes/arma-processes-basic-concepts/
There isn’t a simple equation to find these coefficients.
Charles
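For readers writing up a similar process document: the general ARMA(p, q) one-step forecast described on that page has the form ŷ(t+1) = φ₀ + Σ φⱼ·y(t+1−j) + Σ θₖ·ε(t+1−k), where the ε values are the model residuals. A minimal Python sketch of that equation (the coefficients and data below are made-up toy values, not estimates from any real model):

```python
def arma_forecast(const, phi, theta, y, resid):
    """One-step-ahead ARMA forecast:
    y_hat = const + sum(phi_j * y[t-j]) + sum(theta_k * resid[t-k]).
    `y` and `resid` are in time order, most recent value last."""
    ar = sum(p * y[-(j + 1)] for j, p in enumerate(phi))
    ma = sum(t * resid[-(k + 1)] for k, t in enumerate(theta))
    return const + ar + ma

# Toy ARMA(2,1): y_hat = 1 + 0.5*y[t] - 0.2*y[t-1] + 0.3*e[t]
f = arma_forecast(1.0, [0.5, -0.2], [0.3], [2.0, 3.0], [0.1])
```

This only evaluates the forecast equation; as noted above, finding the coefficients themselves requires numerical estimation, not a closed-form formula.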
Hi Charles,
I successfully installed the Real Statistics package into Excel 365 on a Mac, but I get a run-time error saying “Method ‘Value’ of object ‘Range’ failed” every time I try to run the logistic/probit regression. When I close the error message, I get only half of the regression output.
How can I rectify the error?
Hello Kay,
If you email me an Excel file with your data and output, I will try to figure out what is going wrong.
Charles
Hello! Absolutely wonderful website! I was working through the AUROC sections and found them very useful. Regarding the section on the calculation of confidence intervals for the AUROC, can you please mention the reference for the formula used?
Thank you very much for your support and kind words.
The reference that you asked for is
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_the_Area_Under_an_ROC_Curve.pdf
Charles
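For readers without access to the PASS document: one widely used formula for this interval is the Hanley & McNeil (1982) standard error. The sketch below assumes that is the method intended (the AUC of 0.85 and the 50 cases per group are purely illustrative values, not from the site’s example):

```python
from math import sqrt

def auc_ci_hanley_mcneil(auc, n1, n2, z=1.96):
    """Approximate 95% CI for the AUROC using the Hanley & McNeil (1982)
    standard error. n1 = number of positive cases, n2 = number of negatives."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc**2)
           + (n2 - 1) * (q2 - auc**2)) / (n1 * n2)
    return auc - z * sqrt(var), auc + z * sqrt(var)

lo, hi = auc_ci_hanley_mcneil(0.85, 50, 50)  # illustrative values
```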
I used SPSS Ver 24 to run logistic regression on my data. I no longer have the license for SPSS, and as a result I came across your Real Statistics for Excel. I downloaded your package for Windows Excel 365, reran some of the logistic regression models, compared the results to the SPSS results, and found them to be widely different on all parameters tested. Is this something you have heard of before?
Hello Simon,
No, I have heard the opposite.
If you email me an Excel file with your data and the results you got previously from SPSS, I will try to figure out why the results are so different.
Charles
Charles,
I need to conduct an interrupted time series analysis. You have quite an extensive array of options for time series analysis. Does any of them apply to an interrupted time series?
Hello Brian,
As you have observed, the Real Statistics software provides a number of time series analysis capabilities and I plan to add additional ones shortly. Currently, there are no tools that explicitly address interrupted time series.
Charles
Hello, Thank you for this valuable resource. The function QPROB is not working properly in the context of a Tukey’s post hoc. QCRIT delivers the correct value with parameters from my dataset. QPROB, however, shows 1 for mean differences that are larger than the calculated q = QCRIT*sqrt(MSw/dfw). The website calls the first argument of the function q, while Ctrl+Shift+A in the worksheet calls the first argument x. If I increase the value of the first argument (greater than q), the result eventually becomes significant. I am using Excel 2010 on Windows 10. It would seem that to estimate the probability, the function would need k, dfw, alpha, and the difference to estimate. Best Regards
Hello Jim,
I just checked and it seems like the QPROB function is working properly. I just responded to your email with an explanation.
Note that, in general, you should use the QDIST function instead of the QPROB function since it gives more accurate results.
Charles
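QPROB and QDIST return the right-tail probability of the studentized range distribution q(k, df). To illustrate what that tail probability means (this simulation is my own sketch, not how the Real Statistics functions are implemented), here is a Monte Carlo estimate evaluated at 3.88, the tabled 5% critical value for k = 3, df = 10:

```python
import random

def qprob_mc(q_obs, k, df, sims=40000, seed=1):
    """Monte Carlo estimate of P(Q > q_obs), where Q is the studentized
    range of k standard normals divided by an independent estimate s
    with df degrees of freedom."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        means = [rng.gauss(0, 1) for _ in range(k)]
        # chi-square(df)/df simulated as the mean of df squared normals
        s2 = sum(rng.gauss(0, 1) ** 2 for _ in range(df)) / df
        if (max(means) - min(means)) / s2 ** 0.5 > q_obs:
            hits += 1
    return hits / sims

p = qprob_mc(3.88, 3, 10)  # should land near 0.05, the tabled alpha
```

This also shows why a larger first argument eventually becomes significant: the tail probability is a decreasing function of q.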
Dear Charles,
First of all, thank you for this very informative website. I stumbled upon your page when looking for more information on the truncated normal distribution.
For my bachelor thesis I have to analyze wind data.
I have those values for example:
– 2,3 average
– 1,2 minimum
– 3,7 maximum
– 1,0 Standard deviation
For performance calculation, I need the percentage from each value (in 0.1 steps) between 1,2 and 3,7.
I tried TNORM_DIST(x, μ, σ, cum, a, b) with e.g. TNORM_DIST(1,3, 2,3, 1,0, 1,2, 3,7).
But I don’t get the right values and it doesn’t sum up to 100%.
Thanks in advance, and sorry for my bad English.
greetings
Thorben
Hello Thorben,
When you write TNORM_DIST(1,3, 2,3, 1,0, 1,2, 3,7) do you mean TNORM_DIST(1,3; 2,3; 1,0; FALSE; 1,2; 3,7) or TNORM_DIST(1,3; 2,3; 1,0; TRUE; 1,2; 3,7)? Actually if you are summing for the values 1,2; 1,3; 1,4; …; 3,6; 3,7; then neither of these will sum to 100% nor should you expect them to (since this is not a discrete distribution).
Charles
Hello Charles,
thank you for your quick response.
I meant TNORM_DIST(1,3; 2,3; 1,0; FALSE; 1,2; 3,7).
I thought the probabilities of all x values would add up to 100 percent if I plugged all the x values between the minimum and maximum limits into this formula, one after another.
My problem is that I need the probability for each value between 1,2 and 3,7. And I have those values to work with:
– 1,2 minimum
– 3,7 maximum
– 2,3 average
– 1,0 Standard deviation
So I think that 2,3 should be the highest probability and 3,7 the lowest probability. If I add up all the probabilities of the individual values, 100 percent should be reached.
Because in the end I want to calculate the performance of a wind turbine using the individual proportion of each wind speed value.
I don’t really know how to solve this problem yet.
I am sorry to take up your time, but I wonder if you could help me out?
Thanks in advance.
greetings
Thorben
Thorben,
There is an infinite number of values between 1.2 and 3.7.
How to proceed depends on what you want to accomplish.
One approach to just using the values 1.2, 1.3, 1.4, …, 3.7 might be to use a grid approach. See
Grid for Bayesian Statistics
Charles
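One concrete way to get percentages that sum to 100% for Thorben’s wind data is to treat each 0.1 step as a bin and take differences of the truncated normal CDF, rather than summing density values. A stdlib Python sketch using the μ = 2.3, σ = 1.0 and bounds 1.2, 3.7 from the comment (the function names are illustrative):

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Standard normal CDF, scaled and shifted."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def truncated_bins(mu, sigma, a, b, step=0.1):
    """Probability of each width-`step` bin of a normal truncated to [a, b].
    By construction the bin probabilities sum to 1."""
    z = norm_cdf(b, mu, sigma) - norm_cdf(a, mu, sigma)  # truncation mass
    n = round((b - a) / step)
    edges = [a + (b - a) * i / n for i in range(n + 1)]
    return [(norm_cdf(edges[i + 1], mu, sigma)
             - norm_cdf(edges[i], mu, sigma)) / z for i in range(n)]

probs = truncated_bins(2.3, 1.0, 1.2, 3.7)  # 25 bins of width 0.1
```

Each entry is the fraction of wind speeds falling in that 0.1-wide interval, so the bins nearest the mean 2.3 carry the most mass and the total is exactly 100%.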
Dear Charles,
your website is the most amazing thing I’ve ever seen! Thank you so much for helping us out.
After reading lots of this, I think I have the best strategy to analyse my data but I would like to confirm, because there are a few things I would like to test.
This is my first experiment, and I work with an animal model. So, I have this procedure I want to do in a group of animals, in which I have a control group (negative for a molecular marker) and an experimental group (positive for that molecular marker). For better understanding, I am trying to develop a scoring system to evaluate their phenotype (i.e., “looks”) with 6 parameters (measurements and visible features), in which I give 1 point if they are good and 0 if not. I am still trying to figure it out because, at this point, it is overestimating small mistakes and underestimating multiple defects. Nonetheless, they can only be evaluated with score = 0, score = 1, …, score = 6.
I also want to state that I have a really small number of subjects for each condition (as I tested the procedure at different ages and in some cases with a different protocol), with n = 5–7 for each control and each experimental group, giving 8 data sets this way. Sometimes the number of controls is higher than the number of subjects in the experimental group.
1) I want to prove that my control group and experimental group come from different samples BEFORE the procedure, meaning that the molecular marker has importance in their score. Note that my control group is not perfect, but “it looks like” it is in better shape (more 5 and 6 scores) than my experimental group (mostly 1–4 scores). Since I have interval data (correct me if I am wrong), I think the best way to do this is to perform the chi-square Test of Independence, right?
2) I want to prove that my procedure improves their score regardless of their molecular expression. To do this, shall I use the Wilcoxon Signed-Rank Test for paired samples with before/after procedure data?
3) I want to know whether, after the procedure, my experimental group is as good as my control group. Should I use the Wilcoxon Rank-Sum Test for independent samples or the Mann-Whitney test for independent samples? Should I run the chi-square Test of Independence again?
4) I want to understand the impact of the procedure depending on their age. Should I just compare their results with the tests mentioned above, or should I mix all the control groups and all the experimental groups that fit the same protocol and perform a specific statistical test?
Sorry for the long explanation! I hope I was clear enough 🙂
Best regards,
Ana
Hello Ana,
Thank you for your kind words about the Real Statistics website.
1) Usually you take your sample and divide the elements into the treatment and control groups randomly before you start the treatment. You appear to have done something different. Please explain more clearly. Also, you said that you have 8 data sets. How were the elements in these assigned?
2) The Wilcoxon Signed-Rank Test is commonly used for this purpose. When the differences are normally distributed, you are slightly better off using the paired t test.
3) The Wilcoxon Rank-Sum Test and the Mann-Whitney Test are equivalent. You can use either one and you will get the same result. Whether to use the chi-square test depends on what kind of data you have.
4) It depends on what hypothesis you want to test. Also, see my response to question #1.
Charles
Dear Charles,
thank you so much for such a quick answer!
1) Yes, it was different. Most works divide animals/people in the same conditions into two groups to test the effect of some procedure. My work is different, as I know the animals are not in the same condition at the very beginning when it comes to that molecular marker. That being said, their looks are not the same either. It is almost easy to guess with our eyes which are and which aren’t positive for the marker. But I need some numbers/statistical tests that can prove so. And that was one of the reasons that led me to create a score to characterize them into classes.
This is, by itself, something new to the community, and for that reason, even if they don’t get better with the procedure (which I believe they do), this needs to be described.
As for the data sets, the thing is, due to some construction work at the lab and then the COVID-19 epidemic, my experimental conditions are not all the same. This is a strong limitation that I have, I recognise. Some of the animals were submitted to the procedure 20 days after they were born, others at 24 and others at 30. Some of them were submitted to the procedure twice. And some of them were submitted to a less effective procedure. The “path” chosen for each one was determined randomly (or at least I hope so, because these differences were not really planned).
2) Thank you so much! In fact, since this is a score of 0/1/2/3/4/5/6 with no numbers in between, I can assume right away that the data is not normal, right? That’s why I thought about the Wilcoxon Signed-Rank Test.
3) Got it!
4) I would say H0: the procedure is effective regardless of the age at which it was done. But I do believe the same is not true for the ones that were submitted to a less effective procedure; that’s why I thought about dividing them into groups as near as possible to the same conditions.
Thank you so much!!! You are very kind, I am truly grateful.
Yours sincerely,
Ana
Dear Dr. Charles,
Sorry to take up your time; I am writing to ask for some advice. I am currently reading the article: Factors influencing consumer behaviour towards store brands: evidence from the French market.
The paper mentions a total of 266 respondents, and I counted 33 items used in the survey (of which 9 items were removed). The authors wanted to proceed with EFA and CFA (as some items had not been tested in the French context). After they removed some incomplete questionnaires, there were:
– 140 questionnaires were selected for EFA
-and 266 questionnaires were selected for CFA
I keep reading but cannot understand: with 266 respondents and 33 items, why were only 140 questionnaires used for the EFA? Do you think I am missing anything?
I am sorry to take up your time, but I wonder if you could advise me?
Hello Todd,
Without seeing the analysis, I am not able to say why they removed some of the questionnaires and some of the items.
Charles
Dear Dr Charles,
I will email you with the paper. Thank you so much for your reply and support 🙂
Todd
Hi Charles,
First, thank you so much for all of your responses. I stumbled upon your page when looking for more information on the binomial distribution. I am trying to figure out how to calculate the distribution based on different probabilities of outcome.
For example, there are 5 events with probabilities of success of 5%, 50%, 55%, 75%, and 92%. What is the probability that 1, 2, 3, 4, or 5 are successful?
Am I on the right track with the binomial distribution, or is there a better/easier way to calculate this in Excel? Thanks a lot.
Gary,
If the events are independent then you can use the binomial distribution. E.g. if you want the probability of 2 successes among 5 independent events where the probability of success for each event is 5%, then you can get this result in Excel by using the formula =BINOM.DIST(2, 5,.05,FALSE).
Charles
Right – but what if the probability of each independent event is the following: 5%, 10%, 40%, 60% and 80% and I’m trying to see the probability of 3 successful events from those 5 independent event success rates?
Thank you for the response.
Gary,
Just change the .05 in the formula to .10, .40, .60 and .80.
Charles
Sorry Charles, and thank you for the responses. I guess the way I’m wording the question doesn’t make sense.
You play 3 games. You have a 10%, 40% and 60% chance of winning each game. What are the odds that you win 2 of those 3 games? Is that still the binomial formula?
Thank you,
Are you saying that the probability of winning game 1 is 10%, of winning game 2 is 40% and game 3 60%? In this case, the probability of winning exactly two of these games is (.1)(.4)(1-.6) + (.1)(1-.4)(.6) + (1-.1)(.4)(.6). This is not the binomial formula.
Charles
Thanks so much, that makes so much more sense to me. One last question – how does the equation change if you wanted the probability of winning 1 game from the below example:
Are you saying that the probability of winning game 1 is 10%, of winning game 2 is 40% and game 3 60%? In this case, the probability of winning exactly two of these games is (.1)(.4)(1-.6) + (.1)(1-.4)(.6) + (1-.1)(.4)(.6). This is not the binomial formula.
Hi Gary,
What do you think the answer is following the pattern from my response?
Charles
Is it 1 – (formula you provided for exactly 2) – (.1x.4x.6)?
I guess I’m confusing myself because I’m thinking of the question as “what is the probability of winning more than 1 game,” while the formula gives the probability of winning exactly 1. More than 1 would be the probability of exactly 2 plus the probability of exactly 3, right?
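For reference, the general case Gary describes — counting successes across independent events with unequal probabilities — is the Poisson binomial distribution, not the binomial. A brute-force Python sketch that reproduces the three-game arithmetic in Charles’s reply (the enumeration approach is just for illustration; it scales as 2^n, so it only suits a handful of events):

```python
from itertools import product

def poisson_binomial_pmf(probs):
    """P(exactly k successes) for independent events with unequal success
    probabilities, computed by enumerating every win/lose outcome."""
    n = len(probs)
    pmf = [0.0] * (n + 1)
    for outcome in product([0, 1], repeat=n):
        p = 1.0
        for won, prob in zip(outcome, probs):
            p *= prob if won else 1 - prob
        pmf[sum(outcome)] += p
    return pmf

pmf = poisson_binomial_pmf([0.1, 0.4, 0.6])
# pmf[2] matches (.1)(.4)(1-.6) + (.1)(1-.4)(.6) + (1-.1)(.4)(.6)
```

From the pmf, “more than 1 win” is simply pmf[2] + pmf[3], confirming the reasoning above.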
Dear Charles
I am a urologist and I want to compare two parameters of emptying of the renal pelvis with the function of that kidney.
The function is in %.
The two parameters are the T 1/2 value (calculated in minutes) and the NORA value (calculated as the ratio of the amount of tracer at 60 minutes divided by the amount of tracer at 2 minutes).
Kindly let me know which statistical test I should use.
What hypothesis do you want to test?
Charles
Great stuff. Your site has been most helpful with respect to the non-central t distribution.
I am pleased that you have gotten value from the website.
Charles
Hi Charles,
I must compliment you on a stunning achievement. You provide truly understandable explanations and extremely useful tools for Excel. But most impressive is that your coverage is both comprehensive and up to date. You are the only site explaining and providing a detailed numerical example of the very new (2009–2010) test statistic developed by Bian et al. Yes, it is mentioned in Wikipedia as reference 10 here (https://en.wikipedia.org/wiki/Sign_test), but you are the only site to have a numerical example.
Inference suggests that Bian’s statistic is alive and well and being used to decide if a “Playing Condition Correction” is required on any particular day in the new World Handicap System.
Thank you for the great work!
However, in studying your example I noticed a typo here: https://www.real-statistics.com/non-parametric-tests/trinomial-test/ …
The Excel formula agrees with Bian’s paper, but the formula for p(z), shown as a graphic, has (n + z)! in the denominator. I believe this term should be (k + z)!, as illustrated in your very useful Excel formula.
Hello Samuel,
Thank you very much for your kind remarks and thank you especially for catching this error. I have now corrected it on the website. I appreciate your help in improving the accuracy and usability of the website.
Charles
Dear Charles,
Many thanks for your helpful website.
I have a question: what is the meaning of “COUNTIF(range, criteria)=0”?
Sincerely yours,
Idham
Hi Idham,
See https://real-statistics.com/excel-capabilities/built-in-excel-functions/
especially Figure 8.
Charles
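In short, COUNTIF(range, criteria) counts the cells in range that satisfy criteria, so =COUNTIF(range, criteria)=0 is TRUE exactly when no cell matches — a common way to test for the absence of a value. A rough Python analogue for the simple equality-criterion case (the helper name and data are made up for illustration):

```python
def countif(rng, criteria):
    """Rough analogue of Excel's COUNTIF with a plain-value criterion:
    count the entries of `rng` equal to `criteria`."""
    return sum(1 for v in rng if v == criteria)

data = [3, 5, 0, 7, 0]
no_nines = countif(data, 9) == 0   # like =COUNTIF(range, 9)=0  -> TRUE
no_zeros = countif(data, 0) == 0   # like =COUNTIF(range, 0)=0  -> FALSE
```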