Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me as described below or by leaving a comment below or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
Hello Charles
I am running a series of regression analyses using first, second and third order polynomials. The Excel LINEST function works OK for normal X and Y data sets, but when I attempt to used scaled X data, the function does not give correct results.
Can this add in be used to run regression analyses with scaled and/or weighted data sets??
Hello Frank,
Yes. See Weighted Linear Regression
Charles
I’m currently using the RealStatistics pack in Office 2019 to successfully execute Tukey HSD-Kramer testing on 2 sets of the same survey given to students at the beginning and end of the year. This year, twice as many students took the survey at the end compared to the beginning so differences in the means of the unequal sample sizes seems to warrant a Tukey HSD-Kramer transform. However, for some of the tests, especially when a large difference from the first and second surveys is found, the p value in the Q Test block is negative (e.g., -7.43849E-14). What does this mean?
Also, is there somewhere that we can look up the various abbreviations used by Excel such as ss, f crit, mean crit, etc? I need to be able to explain these things to my labmates and I am not sure exactly what they mean.
The RealStatistics package is a lifesaver! I have to use a license server for SPSS and sometimes I can’t get time to do analyses so Excel as a quick and accurate check is an ideal solution for seeing if data warrants further investigation or not. Thank you!
Hello,
1. It probably means that p-value = 0. The p-value should never be negative, although a value such as -7.43849E-14 is so close to zero that this may only be a roundoff issue. In any case, I would check to make sure there isn’t some sort of real error.
2. I don’t know of a place where you can find abbreviations used by Excel. The Real Statistics website does explain these abbreviations, but this information is defused throughout the website and is not in one place.
3. Thank you very much for your kind remarks about Real Statistics.
Charles
Thank you SO much. The step-by-step explanation of the two sample KS test just saved me from a mental breakdown. Can’t thank you enough! <3 an over-tired PhD student
Glad I could help.
Charles
Hi, yesterday i publish a comment but didn´t appeared. I have a Mac and Office 365, i download the program for this version of Excel and for Mac. I followed the instructions as the website says. And when i press Tools, and then complements appears the click box with the option Xrealstats-Mac, but in the Complements tab, nothing appears. Just grey and at top left the spaces of 2 tabs, but empty. Please help to make it work. Its really urgent.
Thanks.
Hello,
I assume that by “Complements” you mean “Add-ins” (in English). In the Mac version of the software, you just need to press the key sequence Ctrl-m.
Charles
Doesnt work. Can i send you the screen shot of how it looks, please? I think it will be more clear that way.
Thanks.
Yes, please send me the screenshot.
Hi Mr Charles Zaiontz,
I’m reading the topic “KDE Example
Example 1: Create a Kernel Density Estimation (KDE) chart”
In part:
“E.g. f(-6) = 0.000839 (cell G3) is calculated by the formula =SUM(H3:M3)”
I think they must be:
E.g. f(-6) = 0.000839 (cell G3) is calculated by the formula =SUM(H3:M3)/(n.h).
And item: “To create the KDE chart we now highlight the range G2:H53”
They must be: “F2:G53”
Thanks for your topic.
Hello Hue Nguyen,
Yes, you are correct. I have now made the necessary corrections on the webpage.
Thank you very much for catching these errors and improving the quality of the information on the website.
Charles
Hello Charles,
I have not known how to choose bankwidth “h” and the number of points “nn” fit-for-purpose. They depend on the size of data, don’t they? If have, may you show formula to calculating them?
Thank you very much.
Hue
Hello Charles,
I read the formula to calculating bandwidth “h”. Thank you.
May I have a question to ask you?
Do you remove the extreme outliers of data when prepare a kernel density plot?
Thank you very much
Hue
Hello Hue Nguyen,
I don’t believe that I have removed outliers, but if “extreme outliers” refers to outliers that distort the picture (instead of values that may be reasonably found in a large sample), then it may be prudent to remove them.
Charles
Hello Hue Nguyen,
There are various opinions about what is the optimum formula for h. See the following webpage for details.
https://real-statistics.com/distribution-fitting/kernel-density-estimation/
Charles
Hi, Dear Charles:
Appreciated at your assistance.
Hello Sir,
I am unable to use SARIMA even after downloading the real statistics pack. The TS tab has no option called Seasonal ARIMA or SARIMA. Please help
The SARIMA data analysis tool is available in all of the latest versions of the Real Statistics Resource Pack. The latest version is 6.8. If you are using Excel 2007 for Windows or Excel 2011 for the Mac, then you won’t find this data analysis tool since support for these versions of Excel was frozen prior to the inclusion of SARIMA.
Charles
Thanks for your quick reply.
I am using MSO Excel 2016. Is there no other way for me to use SARIMA in excel? please suggest
The SARIMA data analysis tool is available in the latest versions of Real Statistics for MSO Excel 2016. If you have release 6.2 or later, then SARIMA support is included. I suggest that you insert the formula =VER() into any cell and check to make sure that you have one of these releases. If not, you will need to download the latest release from the website.
Charles
Yes. Now it is working. I downloaded the latest pack. Thanks alottttt….!!!
Great, Glad that it is working.
Charles
Hi Charles,
Thanks again for an excellent suite of routines. I have been using your program to produce confidence ellipses for bivariate normative data. I now wish to consider where data for new cases lie in relation to the ellipse norm. However, from my reading when considering new cases one should use a tolerance ellipse not confidence ellipse (e.g. https://blog.minitab.com/blog/understanding-statistics/whats-the-difference-between-confidence-prediction-and-tolerance-intervals), although there appears to be very little difference (see https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/bvntolpl.htm).
Is it possible to transform your confidence ellipses to tolerance ellipses?
Regards,
Leigh
Hi Leigh,
For information about creating a tolerance interval, please look at
Tolerance Interval
Tolerance Interval Example
The website doesn’t yet describe how to build the multivariate version, namely a tolerance ellipse.
Charles
Many thanks for the quick reply and the reference to tolerance interval information.
Is this something that you are likely to do in a near future version or should I look elsewhere to derive tolerance ellipses?
Regards,
Leigh
I have no short term plans to do this.
Charles
Just wrote about conflict with excel. Solved my problem by turning off excel tool pack.
Perhaps you said that yet I misunderstood.
Thanks.
Just got my lovely graph.
Attempting to use your stats for excel. Every time I attempt to use them I get excel conflict where “Random number generation-the type of distribution is required. Entry must be an integer between 1 and 7.” I have not opened the Excel tool so there is some conflict. Suggestions?
Hi Patty,
I am not familiar with this error. When did it appear? This looks like an Excel data analysis tool message, but I am not sure. The Excel tool
remains active whenever Excel is opened. To close it, you need to press Alt-TI and uncheck the tool.
Charles
Dear Dr. Charles:
Sorry I have another problem. I conducted a two-way manova using your excel vba macro. I don’t know why the manova outputs regarding the multivariate tests are not consistent with the SPSS results.
With my best regards and thanks!!
Fred
Hello Fred.
If you email me an Excel spreadsheet with your results from Real Statistics and also send me the results you got from SPSS I will try to figure out why there is a difference.
Charles
Hi, Dear Charles:
Great. The multivariate outputs from your newly-revised the Real Statistics results now has been consistent with the counterparts of SPSS outputs ?
Hi Fred,
Good to hear and thanks again for identifying this error.
Charles
Dear Charles:
Your stat excel vba is great and easy to use. Yet, there is something wrong with the interaction contrast cells output for the residual part and others when I conduct a Two-way MANOVA Follow-up Test . It seems that the cov2pooled function should be used instead of the covpooled function, I suspect.
Looking forward to hearing your insight!
Many Thanks.
Hi Fred,
Is this the same issue that you believe this causing a difference between the Real Statistics and SPSS results?
I have not yet had the time to look into this, but I will and then I will respond.
Charles
Thanks。
I am out of town for a week. I will do as you suggest later.
Hi,
Happy New Year!
Hi Fred,
You are probably correct, but can you point me in the correct direction. Where did you see the covpooled function? Can you give me a specific webpage or spreadsheet in the examples workbooks? Thanks for your help.
Charles
Hi Dr. (or anyone well versed in excel)
Im a mental health professional and data analyst at a large rural mental health clinic in south central washington state.
I am trying to do a pivot table that looks at dates of service and time of service for the last two years.
There is a count of 214 when I look at the raw data sheet but when I create a pivot table the sum is 117 or something like that. I’ve played around with the formatting under the Home Ribbon as well as within the Pivot but its not capturing the total services from the raw data.
If someone could please help me with this it would be greatly appreciated.
Hi Julian,
Without seeing the pivot table, I don’t know why you would see such a discrepancy.
Charles
Dear Dr Charles Zaiontz,
Thank you for creating this recourse pack! It was truly a life-saver for me in my current MSc (Research) dissertation where I had little experience using other programmes such as R, as I had been taught in my undergrad to use primarily Minitab. Unfortunately, I was having issues in Minitab running a Dunn’s test following my Kruskal-Wallis test as it required me to download a Macro of which was not suitable for the version I own. As such, I found your resource pack for Excel which allowed me to perform the tests I required with comprehensive guides on how to do so.
I just wanted to say I am very grateful to you for allowing access to this pack for free, as well as providing detailed insight advice throughout your website. I will be referring to your website and pack within my dissertation, so that others may find more about your work and not ignore Excel as a potential option for Statistical analysis.
Kindest Regards,
Naomi
Thank you very much, Naomi. I am pleased that you are using Real Statistics and are benefitting from the website and software. I started this project for people like you and I am always gratified to learn that the intended results look like they are being achieved.
Charles
Hello Dr Zaiontz
I am completing Rasch analysis to create a Wright map. I have figured out everything except how to do the iterations (How to do Figure 5 of your example). I understand I need Figure 5 to complete the Wright map. How do I do the iterations to get the residuals to zero? I am sorry for bothering you. But I really appreciate these instructions for Rasch.
Hello Dr Jan Nick,
You can look at the spreadsheet shown in Figure by downloading the Correlation-Reliability examples workbook from
https://real-statistics.com/free-download/real-statistics-examples-workbook/
Charles
Dear Prof Charles Zaiontz,
Thank you very much for sharing your knowledge about building Rash IRT 1PL.
It is really great. I was able to follow your steps and convert them to VBA Excel macros and produce the same results. You helped me understood the process and algorithm.
As I am planning to take the next step to further enhance my knowledge and move to 2PL and 3PL modeling if possible, can you please give me some advises and/or references to some similar guides and cookbooks for 2PL and 3PL.
Thanks again, your Rash guide is the best hands on Rash model building guide.
Best Retards,
Mohammed A. Tayyib
Thank you very much for your kind remarks. I am pleased that I was able to help you. I will eventually add 2PL and 3PL, but I have been focussing on other statistical capabilities lately. Some of the references in the bibliography may be helpful. See
Boone, W. J. (2016) Rasch Analysis for Instrument Development: Why, When, and How?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5132390/pdf/rm4.pdf
Boone, W. J. and Noltemeyer, A. (2017) Rasch Analysis: A primer for school psychology researchers and practitioners. Cogent Education
https://www.tandfonline.com/doi/full/10.1080/2331186X.2017.1416898
Moultan, M. H. (2003) Rasch Estimation Demonstration Spreadsheet
https://www.rasch.org/moulton.htm
Wright, B. D. and Stone, M. H. (1979) Best Test Design. MESA Press: Chicago, IL
https://research.acer.edu.au/measurement/1/
Wright, B. D. and Masters, J. N. (1982) Rating Scale Analysis.. MESA Press: Chicago, IL
https://research.acer.edu.au/measurement/2/
Furr, M. and Bacharach, V. R. (2007) Psychometrics: An Introduction; Chapter 13: Item Response Theory and Rasch Models. Sage Publishing
https://in.sagepub.com/sites/default/files/upm-binaries/18480_Chapter_13.pdf
Wright, B. and Stone, M. (1999) Measurement Essentials, 2nd Ed.
https://www.rasch.org/measess/
Why do I have to remove the add-in, save my workbook, and then re-add the add-in every time I start up Excel. The add-in never seems to “take” —- it always vanishes from the menu….. I start up Excel and it’s never there.
Thanks.
Jeff,
Three simpler approaches are described at https://real-statistics.com/appendix/faqs/disappearing-addins-ribbon/
Charles
Dear Charles Zaiontz,
Thank you very much for sharing your knowledge about statistical analysis!
I am trying to perform the Yuen-Welch’s Test, following the example of your page (https://real-statistics.com/students-t-distribution/problems-data-t-tests/yuen-welchs-test/ ). The problem is that the output only gives me the value corresponding to the t-stat (only the number), but do not show me the rest of output information. I need to configure something in Excel (2013) ?, possible solution?
Again, thank you very much, greetings
Javier Baier
This function is an array function and so you need to use it in a different manner to get all the output. See the following webpage for more information about how to use an array function:
Array formulas and functions
Charles
Your add in is excellent and very useful. I am interested in your LASSO procedure. I can see how it is useful for choosing predictor variables but it is not clear to me how you can back-calculate the unstandardized regression coefficients. This would appear sensible to obtain the LASSO coefficients for a prediction equation and appears to be possible (e.g. https://stats.stackexchange.com/questions/74622/converting-standardized-betas-back-to-original-variables). Simply using LASSO to determine your predictor variables and then simply using these chosen variables in OLS would seem to defeat the advantages/value of LASSO.
Thanks
Hello Leigh.
Regarding standardized regression coefficients, see
Standardized Regression Coefficients
Re LASSO regression, see
LASSO Regression
Charles
Hi,
Many thanks for this. I had read the LASSO page but not the one on standardised coefficients – I should have realised that you would have had this covered!
On a different but related note, I often use Passing-Bablok regression for method comparison. I searched your site but at present it seems not to be included although the related Deming regression is.
Regards,
Leigh
Hello Leigh,
Real Statistics currently doesn’t support Passing-Bablok regression, but I am considering adding support for this capability.
Charles
thank you alot i’m understand your suceested
do you have a way to perform ‘reduced major axis’ in excel?
Hello Luke,
Sorry, but I am not familiar with the term “reduced major axis”. What is it?
Charles
also known as ranged principal axis or standard major axis.
a different version of Ordinary least squares regression.
http://strata.uga.edu/8370/lecturenotes/regression.html
Luke,
This may be the same as total least squares regression. See
https://real-statistics.com/regression/total-least-squares/
Charles
Thank you Charles for you website and making statistics a little bit more understandable for those of us who did not pay attention at University.
In understand that the Anderson Darling test can be used to compare two distributions two see if they are the same, yet all the examples I can find on the internet of Excel templates have a set of data and then a normal distribution based on the data’s mean and variance to compare it against.
I just do not understand how to adapt the templates for my purposes.
I need to understand the formula’s so that I can adapt them into my Excel Model.
Computing the values separately just will not suffice.
Many thanks once again.
Hello Paul,
The Real Statistics website uses Kolmogorov-Smirnov test to compare two distributions (actually the samples of two distributions). Anderson Darling is used on the Real Statistics website to compare one sample with a given distribution. See the following webpage for an example of comparison with a gamma distribution (Example 2).
https://real-statistics.com/non-parametric-tests/goodness-of-fit-tests/anderson-darling-test/
Anderson-Darling can be used to compare two distributions. See the following article:
http://www.jaqm.ro/issues/volume-6,issue-3/pdfs/1_engmann_cousineau.pdf
Charles
Hi Paul,
I have decided to add the two-sample Anderson-Darling test to Real Statistics in the next release due out this month.
Charles
Thank you for the website! It has been really helpful, especially for an entrant in the world of statistics like me.
One question regarding the homogeneity of variances in (non)-linear regression:
I have my vector of experimental values (y_exp), the vector of calculated values (y_calc) and the vector of residuals (residual) which was calculated from the other two. I want to apply tests to verify if the variance is homogeneous so as to check the underlying regression assumptions. Question is:
Should I apply tests like Levene, Welch, and similar to the group of values y_exp against the group of values from y_calc?
Should I apply the same tests to the residual instead? If so, what do I compare it to, as the aforementioned tests compare the variance between two groups?
Best,
Gustavo
Hello Gustavo,
For linear regression you apply the homogeneity of variance test (e.g. Levene’s test) on the observed data (i.e. experimental values). See
Homogeneity of Variances
Charles
Dear Prof. Zaiontz,
Thank you for your reply. Then, am I right to assume that these tests depend on the linearity of the regression, i.e. they are not applicable for non-linear regression?
Best regards,
Gustavo
Yes, these tests apply specifically to linear regression models. They can be used in some non-linear regression models when they are transformed into a linear model (e.g. exponential regression).
Charles
Dear Dr. Zaiontz,
I found a new test called “trinomial test” which is improved version of the paired sample sign test. This test includes the information of zero differences or tied observations, while sign test drops it. I would like to learn this test and I tried to perform test on certain example. However, not everything is clear for me. Unfortunately authors of this test don’t provide any numerical example. Therefore, I have a big favor to ask you, could you read the article about this test, and describe it on this website as well as implement it in Real Statistics software. It seems like trinomial test is not well-known and describing it you can popularize it. This test may be helpful for many people. Article is available here: https://www.researchgate.net/publication/29781443_A_Trinomial_Test_for_Paired_Data_When_There_are_Many_Ties
Thank you in advance,
Wojciech
Hello Wojciech,
Thanks for bringing this test to my attention. I will look at the article and consider adding it in the future.
Charles
Hello Wojciech,
I have now investigated this test and in fact have now implemented in Real Statistics and have some numerical examples. I am curious as to why you want to use the test (e.g. instead of the Signed Ranks test)?
Charles
Hello Charles,
If you have not many zero differences, there is no significant difference between sign test and trinomial test and you should obtain similar results using both tests. However, suppose the source data are in the format of rating scale, let’s say, k-point scale (e.g. Likert scale). The less values are on the scale, the larger number of zero differences is probable. If most of the non-zero differences would be in one direction, decision is to reject the null hypothesis in favor of the alternative hypothesis (because sign test ignores zero differences). Meanwhile the excessive amount of the zero differences suggests that two paired samples are not significantly different. In my opinion, the information about zero differences is equally important as non-zero differences. By the way, it is explained in mentioned article. We can read as follows: “the trinomial test increase the use of sample information”. I hope I convinced you.
Wojciech
Thank you. Where did you hear about the existence of the trinomial test?
Charles
I was looking for alternative test to sign test via Google. To be honest, I found that test somewhat by accident.
Thanks for your update. I will include this test in the release that I plan to issue on Monday if not before.
Charles
HI Charles,
I am a student in psychology and have been using your tools for just 1 day now, and already found it really useful. Thank you for sharing your knowledge and work.
Kind Regards
Beyers
Hi Beyers,
Thank you very much for your kind remarks. Having worked with my wife, who is a clinical psychologist, on various research projects, I have tried to make these tools especially useful for psychological research.
Charles
Is there a way to back out the exact p values for a dunnetts test following one-way anova? The function in the package rounds things to zero for p.1, but I want the exact p values for all of my conditions. Thanks.
Hi Erik,
Real Statistics is using the table of critical values. This table only shows values between .01 and .10. Any p-value lower is reported as 0 and any value higher is reported as 1. These really mean < .01 and > .10 respectively. I don’t have more exact values.
Charles
Hi Charles,
Great Add in.
Is there anything in the add in for panel regression?
Hi Niten,
Thank you for your kind remarks.
Nothing on panel regression yet, but I plan to add something soon.
Charles
Hi Charles,
Great add-in by the way! Real helpful!
I’m doing stepwise regression and on my second data set, I’m getting a message that says: A run time error has occurred. The analysis tool will be aborted. Overflow.
What does this mean and what could have caused this error? Is there a way to fix this?
John
Hello John,
It you email me an Excel file with your data and results, I will try to figure out how to address this.
Charles
Hi Charles, I am doing a SARMA(2,2) seasonal 26 period analysis on weekly returns of various asset classes.
I have written a macro to automate the running of the solver and to populate the results into a dynamic chart, to avoid me having to click into the add-in and to copy and paste the data into a chart.
I am also receiving a runtime 6 – overflow error.
Could I also send over the spreadsheet and you take a look please?
Hello Anshu,
Yes, if you email me an Excel file I will take a look at it.
Charles
Hello,
Thank you for making this awesome add-in. I am having troubles installing this. The add-in basically will not load when I start Excel. I have to go through options every time to bring it up again?
Thanks!
Noel,
Are you using Windows or a Mac? What version of Excel are you using?
Charles
Hi i have downloaded the Real-Stats for Mac but i am asked for a password while i am trying to install. Please help.
You never need to use a password. You simply need to perform the following steps:
1. Open Excel, but don’t try to open the realstats-mac.xlam file that you downloaded earlier.
2. From the Tools menu choose Add-Ins.
3. Make sure that the Solver option is checked on the dialog box that appears.
4. Check the XRealstats-Mac option on the dialog box that appears and click the OK button.
5. If the XRealstats-Mac option doesn’t appear, click on Browse to find and choose the XRealStats-Mac.xlam file that you downloaded earlier. Then complete step 4 as described above.
Charles
When using Real Statistics, every time I try to run a Simple Exponential Smoothing forecast Excel gives me a Microsoft Visual Basic error. I’ve tried to update my computer, enable macros but nothing seems to work. How can I fix a Microsoft Visual Basic error?
Hello Addison,
If you email me an Excel file with your data, I will try to figure out why you are getting this error.
Charles
Is it possible use real-statistics for “multiple Correspondence Analysis“?
Hello Claudio,
See Correspondence Analysis
for the CA supported.
Charles
n Mar 2018, Cold Storage started getting complaints from their Clients that they have been getting complaints from end consumers of the dairy products going sour and often smelling. On getting these complaints, the supervisor pulls out data of last 35 days’ temperatures. As a safety measure, the Supervisor decides to be vigilant to maintain the temperature 3.9 C or below.
Assume 3.9 C as upper acceptable value for mean temperature and at alpha = 0.1 do you feel that there is need for some corrective action in the Cold Storage Plant or is it that the problem is from procurement side from where Cold Storage is getting the Dairy Products. The data of the last 35 days is in “Cold_Storage_Mar2018.csv”
Which Hypothesis test shall be performed to check the if corrective action is needed at the cold storage plant? Justify your answer.
State the Hypothesis, perform hypothesis test and determine p-value
See my reply to your other comment.
Charles
Hi, Can you suggest any method for example two way annova in R for my following experiment.
Field 1: Group 1, 2, 3 and 4.
Field 2: Group 1, 2, 3, 4 and 5.
Field 4: Group 1, 2, 3 and 4
Hello Manoj,
I don’t use R, but you can do this using Excel. See
https://real-statistics.com/anova/
Charles
Hello Charles,
Is there difference between multivariable analysis and multivariate analysis? If yes please let me know the difference.
Getch.
I am not sure that the terms are used consistently by everyone, but often multivariable refers to multiple independent variables while multivariate refers to multiple dependent variables. This is the case for ANOVA vs MANOVA.
Charles
Dear Charles,
I am working on analysing experimental results.
I have one dependent variable and four independent variables as the following:
Y=A, B, C, D ; I first took the natural logarithm (Ln) of all of them, then i did multiple regression in excel for the values of the natural log, and I got an equation as the following: Y= A^a * B^-b * C^-c * D^d with R^2 = 0.96
Is my steps seem right? and Should i care about the units and maybe make them dimensionless? any other advice.
Thanks in advance..
Radi
Hello Radi,
How did you perform the regression on Y = A^a * B^-b * C^-c * D^d ? Did you then take the natural log of both sides?
Charles
Hi Charles,
I needed the equation to be in this form : Y=CONST. * (A)^a * (B)^b * (C)^c * (D)^d.
I first linearized the equation by taking the natural log of both sides as the following: Ln(Y) = Ln(CONST.) + a*Ln(A) + b*Ln(B) + c*Ln(C) + d*Ln(D).
Then I calculated the natural log for all the parameters (both the dependent and independent); ( Ln(Y), Ln(A), Ln(B), Ln(C), Ln(D)).
After that, I did the multiple regression using excel to get the values of the coefficients (Input y range is the values of Ln(Y)), (Input x range are the values of Ln(A), Ln(B), Ln(C), and Ln(D)).
Finally, I got the equation after the regression which looks like the following:
Y= CONST. * A^a * B^-b * C^-c * D^d.
Does it seem right to you? Should i care about the units?
Thanks,
Radi
This seems like a good approach.
Charles
Charles,
I recently upgraded to the current version. I followed the directions about installing the new version and uninstalling the prior version. When I start Excel and use Ctrl-M, Real-Statistics does not open although it shows as an add-in. If I uncheck the add-in and then return through File/Options/Add-ins, and check Real-Statistics as an add-in, it works. It’s clumsy and clearly not functioning as it should. Any thoughts?
Hello Brian,
In the list of add-ins is RealStats unchecked (or not present), while XRealStats and Solver are on the list and checked?
Charles
Dear Charles,
Thanks and congratulations for the statistics tool.
I am trying to perform a multivariate logistic regression model but I cannot manage it.
I was studying a parasitic disease in equids trying to evidence the relationship of different variables and the presence of antibodies in the animals, I performed a univariate analisis using an excel formula template and many of the variables resulted significant. The next step would be to perform a multivariate analisis including the significant variables to obsetve if they are still significant or not and to calculate the OR of each category. The dependent variable is dichotomous (presence of antibodies or not) and the other variables are categoric (some with more than 2 categories). I do not know how to set the options in the system as I have tried with multivariate and othe types of regression formulas and the results are not what it was expected and the odds ratio are missing.
Thanks in advance for your time and consideration.
Hi Leticia,
See Logistic Regression
Charles
Dear Professor Charles,
I got interested with the application you’ve made and started right away. I was doing a t-test independent samples as an exercise double-checking results from your application and with the SPSS.
Firstly, I checked for the computed equality of variances using the Levene’s test from the SPSS. My result shows that there is a violation, F = = 16.301, p = .000. I checked the homogeneity of variances for each groups and the result from the SPSS shows that my first group was significant, W(20)=.886, p = .023, and the other did not, W(24) = .966, p = .571. I double-checked the results from your application and got the same output as well.
Second, the results from the SPSS for the ‘Equality of Variances Not Assumed’ row is also the same with the T-Test Unequal Variances Two-tailed row from the application, t (27.030) = 2.535, p = .017. So far, I was doing it right.
Third, I was to report my effect size (among other things) using a Cohen’s d and since the SPSS can’t solve this, I decided to compute it with the following formula: d = (M1 – M2) / SQRT [(SD1 + SD2)/2] and got a computed value of 0.787655. My M1 = 42.55, M2 = 39.125, SD1 = 5.482, SD2 = 2.787, N1 = 20, and N2 = 24.
I used an online statistical calculator just to be sure that my manual calculations is correct (https://www.socscistatistics.com/effectsize/default3.aspx) and got the same computation (of Cohen’s d). The online calculator also produced a Gates’ delta value of 0.624796 and a Hedge’s g of 0.810733).
Because my computation was the same with the online statistical calculator, I double-checked it with the effect size in the result of the Real Statistics application. Under the Cohen’s column, the computed value was 0.810733. I noticed that this is similar to the Hedge’s g result of the online calculator.
My question is: does this mean that the Cohen’s d computation in the Real Statistics application sensitive enough to detect SD and sample size violations that the computed value (0.810733) is the same as the online calculator’s Hedge’s g ((39.125 – 42.55) ⁄ 4.224573 = 0.810733)? Does it mean that I should report it as Hedge’s g instead of Cohen’s d? Or they are just two different computations altogether?
Lost and confused,
Don
Also, what does the “Hyp Mean” pertains to and why the value is “0”?
Hello Don,
This is the hypothetical mean. It is used in the single sample test. It is set to zero, but you can change it to some other value.
Charles
Hello Don,
I don’t blame you for being confused. There seem to be a lot of different versions of d and g.
In any case, I am using the version based on s as described at
https://stats.stackexchange.com/questions/1850/difference-between-cohens-d-and-hedges-g-for-effect-size-metrics
the part beginning “It seems when people say Cohen’s d they mostly mean:”
Charles
Thank you very much Prof. Charles for the hyperlink. This helps a lot. I guess there are many things I haven’t read from effect sizes (the nuisance of the three indices). I’d better start taking notes again 🙂
Good afternoon,
I work in hydraulic engineering and don’t know much about statistics, a colleague recently referred me to your excel add-on.
The problem I’m working on involves identifying high and low points in the profile of a ditch. The current ditch design was approximated by software, and it shows its profile as a little “bumpy” or rough: by compare each (x,z) point with its adjacent 2 neighbors, we determine many high and low points very close together. Wheras what we want is essentially high point = “top of a hill” or low point = “bottom of a valley”.
Is there some statistical processing that would either help us filter out high/low points that are too close together, or smooth out the profile’s “bumpiness” so we only get reasonable high and low points?
Thanks!
Better yet, two low points cannot be too close together, and two high points cannot be too close together. A high point may be located near an adjacent low point.
Hi Sean,
This seems more like a calculus problem than a statistics problem, but perhaps I just am not familiar with this type of problem.
The only thing that i can think of use using a Spline curve and leaving out a point (x,z) that you believe might be too bumpy. The spline curve would give an estimate of the z-value (x,z’) at this point and then you can see how far away z’ is from z.
The calculation of the spline curve values is shown at the following webpage:
https://real-statistics.com/time-series-analysis/stochastic-processes/handling-missing-time-series-data/
Charles
Hi Mr. Charles Z.,
First, I want to thank you for this amazing website and all the useful information you share with the vistors. I need your help please. Im doing my bachelor degree thesis and I’ve been counting the number of birds along 13 km of coastline in Perú. The counts are made one per day twice a month and i already have two counts. The total number of birds of the first day is 156, and the total number of birds of the second day is 789. There is a big difference between the total number of birds. For both days the birds are always grouped in patches (that means that they are not distributed with homogeneity along the coastline). Looking at the data Ive noticed that, besides been different number of total birds, for each day the patches of birds allways keep the same proportions one each other. In fact, i would say that the relation between the proportions of one day is the same as the proportions of the other day. in that case, which statistic test can i use to prove if they are the same or not? Thank you very much.
Hello Sebastian,
I can’t say for sure, but I suggest that you look at
Proportion Distribution
Charles
Am I able to run one-way Anova from summary data in Real Stats? I am looking at a journal article 34(1), 32-44. (Yount et al., 2003. Journal of Pain and Symptom Management, https://www.jpsmjournal.com/article/S0885-3924(02)00422-0/pdf ). For post hoc comparison, the authors provide a table of n, mean, standard deviation, for each group. The table also includes Effect Size. I found several online calculation sites for Anova from summary tables, but did not get the exact values of the authors. They did not specify the software use for most of their analyses. I was able to get similar results, but not exact values, for their Anovas and post-hoc comparisons. The did state that they used Tukey HSD for post hoc comparisions, and did not specify adjustment for unequal group sizes. The oneline programs that I found (for Anova from summary data) used Tukey Kramer. Perhaps that is one reason my F values are somewhat different. Thanks,
Hello Annette,
If you have the data, you can use the Real Statistics software to perform a variety of post-hoc tests (Tukey HSD, Tukey Kramer, Games-Howell, etc.) and see which one was used. They tests are described at
Unplanned Comparisons
Charles
Hi Dr. Zaiontz,
I am working on a project to measure the agreement of fifteen Radiologists in the diagnosis of 6 major pulmonary diseases. The Radiologists read 50 chest x-rays and determined whether each were positive or negative for any of the following conditions: Cardiomegaly, Pulmonary Parenchymal Abnormalities, Pleural Effusion, PTX, Calcified Pleural Plaque, Destructive Expansile Bone Lesion or Free Air Below Diaphragm.
I downloaded the Real Statistics Add-on, but am not sure how to organize the data to in my spreadsheet to calculate Fleiss Kappa. My raw data includes the six diagnoses and the findings by the 15 raters(positive or negative) for each of the 50 subjects.
Any help would be appreciated.
Thank you in advance,
Ed
Note also that a single subject (X-ray) may have multiple positive diagnoses.
Hello Ed,
If there was at most one positive diagnosis per x-ray then you could use Fliess’s kappa (or Gwet’s AC1) using 7 rating categories, one for each disease and one for None. If you need to have more than one rating per x-ray then I can think of two approaches: (1) use separate ratings for each disease (as described in my previous response) or (2) you will need to figure out a way of coding the ratings that captures the fact that multiple diseases are possible; since there would be an order to such categories (e.g. Pleural Effusion + PTX > PTX), you couldn’t use Fleiss’ but you might be able to use Gwet’s.
Charles
Hello Ed,
You can calculate one Fleiss’ Kappa for each of the 6 diseases as described at
https://real-statistics.com/reliability/fleiss-kappa/
You can then combine the 6 separate measurements in various ways (mean, minimum, etc.), depending on why you want to use this measurement at all.
I suggest that instead of using Fleiss’ kappa you use Gwet’s AC1, as described at
Gwet’s AC2
Charles
Hi Dr. Zaiontz
This is a research project,
soil sampling achieved from an area had been under irrigation for a long time, and one nearby site (outside the irrigation boundary) representing the reference soil. The two categories thought to be comparable regarding parent material and topography. Accordingly, on each category, six soil pits (as replicates) excavated and sampled by horizons. Each soil pit consists of three horizons (A, B, C). Each soil sample from each horizon was analyzed for 15 soil physicochemical properties.
The objective is to inspect the impact of irrigation on soil’s properties compared with reference-unirrigated soil and quantitatively addressee the distribution of that impact on soil’s physicochemical properties along with the soil profile (three horizons A, B, C). I believe that in this situation I have the two sites and three horizons are independent variables and the physicochemical properties are dependent variables. Could you please tell me what type of statistical models is applicable in this case? Is the distribution of the variables I mentioned is right or not?
Thanks very much.
Khalifa
Hello Khalifa,
Since you are comparing multiple physicochemical properties, you probably need some form of MANOVA. Which specific test to use depends on the specific hypotheses you want to test.
Charles
Hello Dr. Zaiontz
The US Department of Labor’s Bureau of Labor Statistics publishes a Census of Occupational Related Fatal Injuries annually, nearly one full year after the end of the year under study. Ex. Occupational Fatalities in 2018 will be reported in December 2019.
BLS updated their classification systems and the current statistical series begins in 2011.
This means that there are only 7 observations in this time series (2011, 2012, 2013, 2014, 2015, 2016, 2017).
If there were a comparable annual time series (7 observations) that exhibited a strong correlation (R-sq > .96) to the BLS series, would a linear regression model be applicable given the small number of observations?
The regression model, if appropriate, would provide useful insights into the fatalities statistic immediately at the end of the subject year, rather than waiting 12 months for the official statistic’s release. Seven observations are all we have to work with.
I’m not trying to forecast a future period’s statistic, just develop an estimate of the prior period’s statistic without waiting 12 months.
Thanks very much.
Dave
Hi Dave,
Even with only 7 observations, the results may be useful. Since R-sq is so high, you tend to need fewer observations in any case (exactly how many depends on a number of factors — which are explained on the Real Statistics website), but in any case, if the preliminary prediction is useful, I would proceed and then check to see what problems, if any, was caused once you got the real info 12 months later.
Charles
Dear Charles,
I’m intersted in whether students use a certain type of explanation for something; let’s call it “the x-explanation”. So, I asked 360 students 4 questions of 2 different types, and I have the percentages (or the proportions) of the “x-explanation” to each of the 4 questions. I want to know whether the differences between the percentages (or the proportions) of the “x-explanation” to the 4 questions are statistically significant. In other words, I want to test differences between percentages of a type of response; and not differences between the respondents that gave this type of response.
What’s the statistical test I need to run?
Is it ok to use a t-test, since I’ve seen it used for this kind of problems in several phychology papers?
Thank you very much for your time. Your help is always valuable.
Maria
Maria,
I haven’t able to follow your explanation. Can you give me a simple example of what you are doing?
Charles
Hello, Charles.
I’m sorry I wasn’t clear enough. I’ll try to to do better this time.
My Q is: “Is “x-explanation” more often in children’s responses to “type A”-questions or to “typeB”-questions ?
I have the responses of each of 36o children to 2 “typeA”-questions and 2 “typeB”-questions.
So, I know that “x-explanation” is present to A% of the responses to “typeA”-Qs; i.e. I know that the mean proportion of the responses to “typeA”-Qs which include the “x-explanation” is e.g. 0,65.
I also know that the “x-explanation” is present to B% of the responses to “typeB”-Qs; i.e. I know that the mean proportion of the responses to “typeB”-Qs which include the “x-explanation” is e.g. 0,46.
What do I do to find if the difference is significant or not?
I’ve seen that many people do t-tests in similar situations. And they don’t mention anything about normal distribution.
Is it all right to have a t-test? Is it better to have the non-parametric version (Wilcoxon)? Am I in a totally wrong way?
Thank you very much,
Maria
Hello Maria,
1. I understand that your sample includes 360 subjects (children). Each subject gets two type A questions and two type B questions. You are interested in the number of x explanations for each subject to the type A questions and similarly for type B questions. I presume that for any subject, the number of x-explanations to the A questions is either 0, 1 or 2 (since there only two such questions). The situation is identical for the two B questions. I believe this is so unless a subject can make multiple x-explanations to each each question.
2. For each subject the difference in x explanations between the A and B questions can take values 0-0 = 0, 0-1 = -1, 0-2 = -2, 1-0 = 1, 1-1 = 0, 1-2 = -1, 2-0 = 2, 2-1 = 1, 2-2 = 0. Thus the “score” differences can take values -2, -1, 0, 1 or 2.
3. The problem you seem to be addressing is whether there is a significant in use of x explanations between type A and B questions. This is equivalent to determining whether the “score” differences (in item 2) is significantly different from 0.
4. This would usually be tested using a t test (actually this is a paired t test) provided the “score” differences is normally distributed. If not you could use a Signed Ranks test. Here we are treating the discrete values (-2, -1, 0, 1, 2) as if they came from a continuous distribution
Charles
I have been using excels exponential regression to plot and predict (via goal seek) the hours necessary for a casting to reach a certain temperature inside a mold. I wanted to add a 95% confidence interval to the model, to readily demonstrate how “good” the predicted time is. I read the demonstration on building/plotting confidence intervals for a linear regression model, but I was hoping you could give me some insight how one would handle the upper and lower confidence curves around the exponential regression curve. Is this something you have already addressed here for Excel, or can you point me in the right direction? Any help would be very appreciated.
Hello James,
If you use a linear model (by using a log transformation) then you can find the confidence interval of the linear model and then undo the transformation by taking the exponentials of the end point; i.e. if (a, b) is the confidence interval of the linear model, then (Exp(a), Exp(b)) can serve as a confidence interval for the exponential regression model.
Using a linear model for exponential regression is described at
https://real-statistics.com/regression/exponential-regression-models/
Finding the confidence interval for the linear model is described at
https://real-statistics.com/regression/confidence-and-prediction-intervals/
Things are more complicated if you use a non-linear model. I haven’t investigated the confidence interval for this model, although you can certainly use bootstrapping to find it.
Charles
Dear Dr. Zaiontz;
I sent a followup email, were you able to receive it?
Hello Ed,
Yes, I have received it.
Charles
Hi Charles,
I can’t seem to find the workbook containing the Iterative Proportional Fitting Procedure (IPFP) examples. I thought it was in the basics workbook but can’t seem to find it. I am trying to replicate the output from the formula in Figure 8 and can only replicate the value for cell H8 using the following formula: =IPFP2(B5:F8). What would be formula for cell H9?
Thanks
Hi Peter,
You can find these examples in the Regression 2 examples workbook.
Cell H9 is blank.
Charles
I was wanting to use the LASSO regression, but the menu only contains Ridge Regression. The output tables are different so one is not a subset of the other…or is it?
Hello Villi,
I have not yet implemented a data analysis tool for LASSO regression. There is a function, LASSOCoeff, which calculates the LASSO regression coefficients. See the following webpage for details:
https://real-statistics.com/multiple-regression/ridge-and-lasso-regression/lasso-regression/
Charles
Many thanks Charles…and compliments for this very worthwhile site.