What is Real Statistics Using Excel?
Real Statistics Using Excel is a practical guide for how to do statistical analysis in Excel plus free statistics software. This software package extends Excel’s built-in statistical capabilities. In particular, it will enable you to more easily perform a wide variety of statistical analyses in Excel.
What does Real Statistics Using Excel consist of?
Real Statistics Using Excel is comprised of the following four components:
Real Statistics Resource Pack
An Excel add-in that extends Excel’s standard statistics capabilities. It provides you with advanced worksheet functions and data analysis tools. This will enable you to more easily perform a wide variety of practical statistical analyses. The software supports Excel 2007, 2010, 2013, 2016, 2019, 2021, and 365 for Windows. It also supports Excel 2011, 2016, 2019, 2021, and 365 for the Mac. The current version is Rel 9.2.2 (released 21 October 2024). There is also limited support for Excel 2002 and 2003.
Real Statistics Website (i.e. this site)
- Lets you download a free copy of the Real Statistics Resource Pack
- Provides tutorials and descriptions of how to perform a variety of statistical analyses using built-in Excel capabilities. You will also learn how to use the supplemental capabilities provided by the Real Statistics Resource Pack.
- Presents numerous examples in the form of Excel worksheets which you can download to your computer
For the student and the novice, the Real Statistics website is an excellent tutorial for learning the basic concepts of statistics and how to do statistical analysis. For all users, it provides a step-by-step guide for how to do statistical analysis in the Excel environment and the tools necessary to carry out these analyses.
You can use this website to learn how to perform statistical analyses in Excel even without using the Real Statistics Resource Pack. But we recommend that you download the resource pack so that you can have access to its powerful capabilities.
Real Statistics Examples Workbooks
Consists of fourteen Excel files that contain all the examples shown on the website. These example files can be downloaded for free (click here). Each example focuses on a specific statistical concept and has been designed to demonstrate simple concepts before moving on to more complicated topics.Â
Over time, the webpages on the Real Statistics website are being updated so that you will be able to download an Excel file with any of the examples found on that particular webpage.
Real Statistics Community
Each webpage has a Comments section where you can make suggestions, identify errors, ask questions to others in the community, or request their advice.
You can also get updates about new releases and other information via the @Real1Statistics twitter feed.
How do I get started?
Step 1: If you elect to use the Real Statistics Resource Pack, click on the following icon. You will be given the opportunity to download and install for free the Real Statistics Resource Pack.Â
Once you have downloaded and installed the Real Statistics Resource Pack, you will be able to use the supplemental capabilities with your copy of Excel. This is described throughout the rest of the website and is summarized in Using Real Statistics Functions and Real Statistics Data Analysis Tools.
If you choose not to download the resource pack or examples now, you can do so later at any time.
Step 2: Browse through the website to learn how to perform a wide range of statistical analyses in Excel using standard built-in as well as supplemental Real Statistics capabilities. We suggest that you begin by clicking on the Website Introduction (and especially Organization of the Website). These will explain how to navigate the website to get the information you need to run any specific statistical test or learn about any particular topic.
Why do statistical analysis in Excel?
The reasons for choosing Excel are as follows:
- It is widely available and many people already know how to use it
- It is not necessary to incur the cost of yet another tool
- There is no need to learn new methods of manipulating data and drawing graphs
- It already contains some basic statistics functions and data analysis tools
- It is much easier to see what is going on since unlike the more popular statistical analysis tools very little is hidden from the user
- Excel provides the user with a lot of control and flexibility
This makes Excel an ideal tool for learning statistical concepts and performing some basic statistical analyses. Unfortunately, its built-in statistics capabilities are limited. As a result, many prefer to use statistical tools such as R, SPSS, or SAS for carrying out more advanced statistical analyses.
We created the Real Statistics Resource Pack to address these shortcomings. This software package contains various supplemental tools that enable you to carry out a wide range of advanced statistical analyses without leaving the Excel environment. You can download the Real Statistics Resource Pack free of charge from this website.
Dear Charles,
Thanks a lot for your contribution on such a powerful Excel package on statistical calculation. May I have a question to ask you? I notice the calculated variance on Anova analysis, but no standard deviation found. Usually, we need mean plus and minus standard deviation to represent a sampling group, and there is basic difference between variance and standard deviation. The standard deviation is the square root of variance. Could you reply this message and make necessary modification if necessary?
Have a nice day. Wish to receive your response.
Yidong
Dear Yidong,
Glad that you like the Excel statistical package.
If you look at Figure 2 of
https://www.real-statistics.com/one-way-analysis-of-variance-anova/confidence-interval-anova/
you see the standard error as well as confidence intervals for the means of each of the groups.
This seems to be what you are looking for.
Charles
hello,
i have tried to install the add-in for windows 365 running windows 11. when i initially install it, everything works great. however, when i exit the spread sheet and open a new one, i must uninstall and reinstall the add-in within the excel program so that i may compute my data.
Side note, i did not have this problem before updating from windows 8 to windows 11. any thoughts?
Hello Bryan,
I don’t know why this is happening. I upgraded from Windows 10 to 11 (using Office 365) without having any problem.
Have you reviewed the Troubleshooting section of the webpage from which you downloaded Real Statistics? Perhaps one of the security setting changes when you upgraded to Windows 11. Also, is Solver active?
Charles
Is there a limit on the number of rows this add in can process ?
For some data analysis tools, there are some limits, usually about 65,000 rows, but for most others, there are no limits.
Charles
Dear Charles,
Let me to tell you that the real-estatistics add-in tool is more than perfect. However, I’m interested to get the coefficient of the constant term in the case of fixed model panel data analysis and random model too. For some reason, the tool does give the coefficient of the constant term for the random model only. Your feedback is highly appreciated.
Dear Mahmoud,
In the fixed model, the intercept term drops out (i.e. it is zero). This is explained at the relevant webpages listed at
https://www.real-statistics.com/panel-data-models/
Charles
Dear Charles,
I have verified that the calculation of the adjusted standardized residual is not the same in the Linux version as in the Windows version. I think that in the Windows version the calculation is wrong. Can you confirm and, where appropriate, correct the error?
Thank you and greetings.
Hello Ana,
I don’t know what Linux version you are referring to. Also when you refer to the Windows version, are you referring to Excel or something else?
Can you give me a specific example where the Linux and Windows versions are different?
Charles
Charles, I wanted to say thank you for such a remarkable website and add-in. I and other Excel users have been asking for decades for Microsoft to enable spline and eigenfunctions and to no avail. Your add-in offers these in addition to many more linear algebraic functions and of course a plethora of additional statistical functionality. In the past, I had to build all of these things by hand and it was both tedious and time consuming. Your add-in saves me countless hours of labor and allows me to focus on the analysis instead of the mechanics of implementing that analysis. You have done a great service to the engineering community.
I just wanted to say thanks.
Hi Patriot,
Thank you for your kind remarks.
I have recently added various root-finding functions, as well as functions that find local and global maxima and minima.
Charles
Hello Charles. I am greatly impressed by the kind of academic work you do. I would wish to do a consultation from you as soon as you read this commend. Thank you.
Hello Samuel,
I will respond to you via email.
Charles
Hello, through googling, I found your website and your tool Real Statistics. I haven’t yet downloaded, but considering it. I use Excel quite a bit, and years ago have used SAS to build some binary outcome logistic regression models. I am currently helping a non-profit in building a scorecard, but they have very thin data to build models on (I may only be able to use less than 100 records to do so). Do you have, anywhere, an illustration of how Real Statistics in Excel handle thin data in building a logistic regression model? I’d like to see how easy it is to handle before going through the installation and learning how to use the tool to develop/train the scorecard. Thank You.
Hello,
If by thin data, you mean data with few records, then yes, the examples on the website tend to be of this type. See
https://www.real-statistics.com/logistic-regression/real-statistics-support-for-logistic-regression/
See also Figure 2 of the following webpage for an example with 49 records.
https://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/
Charles
Good day Charles,
I am having trouble downloading excel on my computer, is there a possible way i can use real stats in the online version of excel?
This program is required for university.
Luisa,
No, Real Statistics won’t work with the online version of Excel.
Charles
Hello Charles, Thank you for this great tool.
I want to second some posts in this page about LibreOffice, just metioning it as it is the most complete and compatible version.
https://help.libreoffice.org/6.2/en-US/text/sbasic/shared/vbasupport.html
As you said, the VBA support is not complete, but it is very comprehensive.
If you may consider to try, see the warning and eventually ask for someone to contribute.
You may disable functions that are not able to run in libreoffice and show a warning while the user could still continue to use the others.
Check https://ask.libreoffice.org/t/detect-if-vba-is-running-under-libreoffice-or-excel/3841
I know it requires time but It seems to be just one step away.
Kind regards,
I imagine that I could get many of the Real Statistics features to work in LibreOffice, but I am sure that it will take a lot of development time to make this happen. So far I have received a few requests for Libre support but not that many, and so I have my doubts as to whether it is worth the effort.
Charles
Hi Charles, are you ok?
I’m use the real statistics a much time and i have one question: there is a possible a have one version for Calc (Libreoffice) ? Because corrently, Calc (libreoffice) is more similar at Excel, and is a good alternative for excel.
Tanks for all, and I admire know work
Hi Menezes,
I am doing ok. I hope that you are also doing well.
Real Statistics is written using VBA for Excel. My understanding is that VBA support for Calc is a little different and is more limited. For this reason, Real Statistics can’t be used for Calc at least in its current form.
Charles
Dear Charles,
I just downloaded the Pack, this message keeps popping up
“424 runtime error object required”
Could you help me how to fix it?
Hi Kim,
You need to install the Pack using the instructions on the webpage from where you initiated the download.
You can’t simply double-click on the file that you downloaded.
Charles
Thank you for such a wonderful and comprehensive set of statistical tools.
It would be most helpful if there was a list of statistical tests by name – Levene, Box …etc in alphabetical order – stating on which menu item (Ctrl m) that test could be found.
The Tools menu on the website contains a summary of the data analysis tools and functions with links to further information.
Charles
Good afternoon
When I try to perform an analysis in excel using Real Statistics, the excel stays stuck or frozen on my pc, I can do other actions or work on my pc while that happens, but the excel keeps frozen or loading all the time. How can I solve it?
Thank you
Hi Nancy,
There are so many reasons why Excel could freeze in this way, some related to the Real Statistics add-in and some that are not related, that I don’t know where to start without getting more information about the problem. A couple of comments:
1. What version of Excel and Windows are you using? What version of Real Statistics are you using?
2. Have you looked at the Troubleshooting section on the webpage from where you downloaded the Real Statistics add-in? It may be helpful.
3. Did you double-click on the file containing Real Statistics? If so this may cause a lot of problems. THis is not the correct approach for installing Real Statistics.
Charles
Good afternoon
Thank you very much for the answer, I didnt double click on the file containing Real Statistics and I also tried what the troubleshooting section said, but it didnt work.
I changed the office version many times to see if that was the problem but it didnt work either, for the moment I left office 2016 and I have windows 10 on my pc.
Dear Charles,
OK, nice web page upgrade, etc. But I loved the pastoral scene on the old web page. I told my students it was the driveway to your home in Italy. (And that they should fly to Europe to visit “Uncle Charles” and pay their respects!)
Thank you for your fine work. BTW, I raise stupid but lovable White German Shepherds. Tell the Mrs. that they all need psychological help.
Dear Mark,
I too miss the pastoral scene, but I needed to make the change to a new template. Your students are welcome to come down our driveway and visit us here in Italy.
We have three Golden Retrievers, all of whom are therapy dogs. They usually are involved in treating people, but they may be helpful in helping other dogs. BTW, you can visit my wife’s website for more information about her pet therapy activities.
DogBliss
Charles
Hello Charles,
What statistic test you can recommend to me to prove that 5 lines are significantly different. The 5 lines were made considering the same independent values X for all. I was expecting to find different dependent values Y in the 5 lines because to make these lines one parameter was different to obtain different dependent values Y, but they are almost similar. I was trying to prove that the parameter used affects the value of Y, but it doesn’t because the 5 lines are almost the same.
Thank you
Hello Charles
To consider the 5 lines are significantly different, how much different do they need to be?
See my previous response.
Dear Charles
The Y values are the air flows using different solutions to make the bubbles run a constant volume in a universal calibrator (using an inverted burete), I measured the times and dividing the Volumen/Time = Q = Flow(lpm). what statistic test can I use to demonstrate that using different solutions impact the result of flow of the calibrator.
Flow (lpm) =Y
Flow (lpm)=X Grupo 1 Grupo 2 Grupo 3 Grupo 4
1 1.156 1.12 1.14 1.198
2 1.714 1.93 1.9 1.913
3 2.614 3.07 2.71 2.972
4 3.358 3.75 3.23 3.452
These values
Flow (pump) = X Flow (calibrator)= Y
lpm lpm
G-1 G-2 G-3 G-4 G-5
1 1.156 1.12 1.14 1.198 1.1696
2 1.714 1.93 1.91 1.913 1.9429
3 2.614 3.07 2.71 2.972
4 3.358 3.75 3.23 3.452
G-1,G-2,G-3, G-4, and G-5 are the result of the calibrator using different solutions to make bubbles in the calibrator, G-5 is the standard solution. In the axe X is the values of flow of the pump. If the values of Y using the different solutions are equal to the G-5 so the use of different solutions in the calibrator do not affect the result of it. With the values I have, What statistic test can help me to proof that the use of different solutions to make bubbles in the calibrator do not affect the result of the calibrator?
Carlos,
I need more information in order to answer your question.
Charles
Dear Charles,
To calibrate the flow of the air pumps we use a primary calibrator, in this case, we are using the bubble film flowmeter, one side of the burette settled on a bubble solution, and the other top side connected to the air pump. When the system is working film bubble runs from the bottom to the top. The film bubble runs a determined volume at the determined time. To obtain the measured air flow with this calibrator we divide the volume/time, we repeat the same process ten times and we use as flow the average of the ten flows measured. In normal conditions, we need to use the manufacture bubble solution, but sometimes when it is done or is not available in Peru technicians who do air sampling use available alternatives such Shampoo, liquid soap, liquid detergent, and those alternatives do the same job as the manufacture bubble solution. But those alternatives have different viscosity and densities that can make the flow measured with them be different from the flow obtained with the manufactured bubble solution. The investigation project tries to demonstrate if the use of different alternatives to the manufacture bubble solution affects the result of the measured flow by the primary calibrator.
In the project, we measured 4 flows (1, 2, 3, and 4 lpm) and we use 5 different solutions to make bubbles, Shampoo Johnson & Johnson, Shampo Head &Shoulder, liquid soup, liquid detergent, and manufacture bubble solution.
What test can you recommend to demonstrates that the use of alternative solutions affect the result of the primary calibrator.
G-1 G-2 G-3 G-4 G-5
(1 lpm) 1.156, 1.12, 1.14, 1.198, 1.1696
(2 lpm) 1.714, 1.93, 1.91, 1.913, 1.9429
(3 lpm) 2.614, 3.07, 2.71, 2.972
(4 lpm) 3.358, 3.75, 3.23, 3.452
G-5 is the manufacture bubble solution
Hello Carlos,
I don’t understand your table. Some rows contai5 5 entries and some 4 entries.
It looks like a one-way ANOVA problem, but I don’t really understand the scenario well enough to say for sure.
Charles
Dear Charles
I would like to ask you how I can perform a normality test for a single sample.
I have tried to find the command but I cannot do that test.
Thank you
EspecÃficamente la prueba de Kolmogorov-Smirnof
See https://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/kolmogorov-smirnov-test/
Usually, the Lilliefors version gives better results. See
https://www.real-statistics.com/tests-normality-and-symmetry/lilliefors-test-normality/
Charles
You need to have the Real Statistics software installed to use the various functions and the data analysis tool. See
https://www.real-statistics.com/tests-normality-and-symmetry/
Charles
Hi
could you help me?
I have an independent variable (two experimental and control groups) and five dependent variables, and when testing the normal distribution in order to use the Manova test, it was not valid for the five dependent variables separately, what can I do?
See https://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-assumptions/
Charles
Hi Charles,
I have an interesting question and hope you can help.
Here are the Monte Carlo Steps to mimic in closed form:
a.) Pick Alpha Value (i.e. 0.00135 = one sided -3 sigma 0-1 area value for 2700 DPPM or 99.73% Acceptance)
b.) Select ‘n’ random points X1, X2, ..Xn from a standard Normal Distribution (Box-Muller or Norm.Dist or Abramowitz & Stegun algorithm)
c.) Let I = alpha * n + 0.5
d.) Let I1 = Int(I) and let I2 = I1 + 1
e.) Left Tail Percentile Value = X(I1) + (I-I1)/(I2-I1) * [X(I1+1) – X(I1)] ‘Linear Interpolation
f.) Repeat process ‘R’ times – of creating and sampling ‘n’ Normal points (R must be large)
g.) Determine the 95% confidence points for the distribution of ‘R’ Left Tails for the value of ‘n’
h.) The Monte Carlo result showed Left Tail 95% limits for R to be (-3.2707, -2.7716) for R=10^6 and n=10^4. This means a single sample of ‘n’ points will likely produce a point estimate value between the two values in parentheses.
Now the question:
A few web articles claimed the above limits can be found using Non-Central t formulas, so how can such formulas be used to mimic the two 95% confidence limits above and plot the -3 Sigma Percentile PDF these tails come from?
This has amazing implications because I found it takes only 1K Monte Carlo pts for the Mean to have 95% Confidence limits less than +/- 1% of span( or 6), BUT . . . it takes 1.7K pts for Median, 6.5K for Mode, 7.5K for Z=1.96 Tails, 19K for Std Dev, 23K for PPk, 75K for the 2700 DPPM tails discussed above, 2*10^6 for DPPM.rss (Mean+/-3*StdDev), and 15*10^6 for for DPPM (# pts outside of -3 and 3 for a Std Normal) and I’d like to have closed form solutions for Tails like I have for Mean, Std Dev (Gamma) and Variance (Chisq). Median, PPk, DPPM are future endeavors assuming a solution exists.
I found no articles warning people how many Monte Carlo runs it takes for 1% accuracy for DPPM which is the ultimate reason for running Monte Carlo in Design and Manufacturing companies for tolerance analysis (for Tails against a Spec for a good DPPM like 1000 or lower). Many software providers claim 1000 runs is OK, when this is only good to find the Mean which is usually already known due to closeness to Y=F( X) using Nominal X’s. Any help would greatly be appreciated since I found your site to be the most “common sense” and “get it done” oriented. Thanks.
Hello Bruce,
Thank you for your kind words about the Real Statistics website.
From your comment, I understand that the simulation that you are proposing is based on the non-central t distribution.
The Real Statistics software supports the non-central t distribution. In particular, it provides the inverse function NT_INV(p, df, ncp).
For any given value of df and ncp, you can obtain a random value from the stated non-central t distribution by using the formula
=NT_INV(RAND(), df, ncp)
You can then generate as many random values as you like and estimate the desired parameter(s) and get confidence intervals. I don’t know a priori how big a sample is required to achieve the accuracy that you are looking for, but the good news about Monte Carlo simulations is that you can estimate the confidence interval and with a little experimentation you should be able to make an educated guess to the sample size required.
I don’t know of a closed-form solution, but perhaps the Real Statistics function =NT_DIST(x,df,ncp,cum) could be useful.
Charles
Hello Mr. Zaiontz.
There is any tutorial for the Tests of Conditional Predictive Ability in excel?
There is support for the Diebold-Mariano test
https://www.real-statistics.com/diebold-mariano-test/
Charles
Thanks you!! It will be very usefull for me.
Another question, there is any suppot for Giacomini-White Test?
Hello Bernardo,
Sorry, but presently the Giacomini-White test is not supported by the Real Statistics software. The Diebold-Mariano test is supported.
Charles
Hi charles 1st want to thanks you for Building such great platform for Statistics lovers.
I have a little quick Question Autually I am Stock Analyst and I am moving toward quantative learning using diffrent Statistical approches I want to get more accuracy on my stocks data Mg question is Can you please recommend me some best Appproches/Methods Regarding
Forecasting
Risk measurments
Voltality,Beta,heding,distibutions etc
please reply me your best thoughts on it what methods you think best according to requirments.
I will search out oj them on your site.
thanks a lot again
God bless you for what you are doing.
looking forward for the response
Kind regards;
Jude
Hi Jude,
Thank you for your kind words about Real Statistics.
The three topics that you identified are quite broad and so it is difficult to point you in the right direction
– Forecasting
– Risk measurements
– Volatility,Beta,heding,distributions etc
In any case, the Real Statistics website has a wealth of information about forecasting, especially using regression and time series analysis. See, for example:
https://www.real-statistics.com/regression-models/
https://www.real-statistics.com/time-series-analysis/
I would need clarification about what you mean by “risk measurements”, although these may not be covered on the Real Statistics website.
I don’t completely understand the third topic. I don’t know what “heding” is. Distributions are covered at
https://www.real-statistics.com/distributions/
I don’t know whether “beta” refers to the beta distribution, in which case it is covered at
https://www.real-statistics.com/beta-distribution/
Charles
Hi Charles. I just discovered your page. An excellent piece. I am learning how to teach statistics. Thank you very much for your effort and sharing.
Best regards,
Raihan
Kuala Lumpur, Malaysia
I have an excel sheet with subject information (age, gender, etc.) and some attributes (about 25) for each subject. I want to analyse the data and classify the subjects based on a variety of attributes. Which analysis is best suited to such data, and how should it be carried out? I read on here that ANOVA is good in such circumstances, and I ran it on Excel, but I’m not sure how to take conclusions from it. I need some help with this.
There are many ways to analyze your data. Which one is best depends on what you want to accomplish, shoe or explain. This might consist of hypotheses you want to test, predictions you want to make, etc.
Please explain further what you want to accomplish. Try to be as specific as possible.
Charles
Thank you for responding  Charles,
I have data for two subject groups and have extracted 22 features from them. I want to compare these features for the two groups and be able to offer a relationship between the groups and features to an ML algorithm for training a classification dataset.
I don’t know which classification approach (requiring an ML algorithm) that you are referring to. Can you give me a reference to the approach that you want to use?
Charles
I performed a gage r&r with the exact same examples provided and it’s giving me totally different total gage r&r results using anova, following the exact instructions provided. Is there something wrong with the calculations?
Hi Chris,
If you email me an Excel file with your analysis, I will try to figure out what is going wrong.
Charles
Hello,
I’m trying to get my head around my a priori power analysis for a series of multiple regression models I will be running, and i’m not sure if I’m getting it right. I want to achieve 0.95 power, with alpha set at 0.05 and an observed effect size of 0.5. What would my N be? All my calculations using GPower are coming out very low, and i’m not sure i’m doing it right-any guidance would be appreciated
Many thanks in advance.
Hello Lauren,
If you have 3 independent variables and you are using Cohen’s f-square effect size, then n >= 39.
What sort of answer are you getting?
Charles
Hi,
I have 8 independent variables, and am getting N>=54. I’ve never done a power analysis before, so I wanted to check in on the right track, as it seems low? I hope to recruit many more than that, so if that means I detect a smaller effect size, then that’s great.
Thanks so much for your help and great website.
If you only want to be able to detect an effect size of .5, then a sample of size 54 is sufficient. If you need to detect a smaller effect size, then you will need a larger sample. E.g. to detect an effect of size .2 (or higher) you need a sample of at least 122 elements.
Charles
Ugh.. cant download.
Downloading is a function of your browser and not Real Statistics. I suggest that you try again.
Charles
Really do like your website and the new look.
Jeff,
Thank you for your feedback and support.
Charles
Hello Charles,
First, thank you for your website!
I am working on the comparison of 2 methods with Passing & Bablok regression . I compare the results of the regression line and the CI (95%) by the software MEDCALC, R and Excel RealStat (with ressource pack and with Excel formulas). I have the same results for the regression line but not for the CI. The results according to MEDCALC and R are identical but not by XRealStat. Very close but not the same.
What are these differences due to? I don’t think there are any errors in my calculations (I hope!)
As the equation for the slope Sij is the same for all 4 methods, the good results of the CI are in my Excel table, right?
Could I apply corrections to Excel calculations to compensate for these differences?
Best regards
Hello Fabien,
Thanks for bringing this issue to my attention. How did you do your calculation of the CI? If Real Statistics has an error, I would like to correct it.
Charles
What a fantastic resource! This is excellent, and very easy to use. It is extremely generous of you to share your knowledge and skills for free. Thank you very much indeed.
Ian,
Thank you for your kind remarks.
Charles
Hello,
Please can a person use ppmc to correlate two variables(data) that are in percentage?
What is ppmc? Is it French for the least common multiplier?
Are you assuming that the percentages are whole numbers such as 56% and not 56.7% or pi%?
What makes you think that ppmc would be a good way to measure correlation?
Charles
PPMC means pearson product moment correlation. I will assume the percentage to be whole number
Sure. You can use PPMC for correlation between two sets of percentages, whether whole numbers or not.
Charles
I like using your add-in for MS Excel. However, MS Excel is not free compared to LibreOffice Calc. Do you have add-in for LibreOffice Calc also?
I use LibreOffice for my personal device at home and Excel at work along with company provided device.
Real Statistics uses VBA. My understanding is that LibreOffice Calc does not provide full support for VBA and so it is likely that it won’t work in this environment. You can try it out and see whether it works.
Charles
Hello Prof. Charles,
I have a question for Multivariate normal dist. and estimating the Beta1. I have a covariance(variance) matrix of x(endogenous), e(error term) ,z1(instrument 1),z2(instrument2). z1 valid instrument because it is not correlated with error term. z2 is correlated with error term.
The variables follow multivariate normal. and I have their cov-variance(matrix)and means. I have to simulate the model 1000 times with betazero=1 and beta1=7 and generate data. . I have to estimate the model with instrumental variable;
i-first estimate the regression with z1 instrument
ii-just z2 instrument
iii-both of them (z1+z2)
also I have to plot densities estimates of b1 corresponding to I,ii,iii in one figure. We are using Rstudio . And I have used the mrvnorm function using the matrix their mean and N=1000 and estimated x z1 z2 and e. and after that I have estimated the function by using b0 and b1. y= b0+b1*x+e and I got estimation of y’s. after that I have used the instrumental variables and I got 3 different b1(betaone). But I dont know how can I do plot densities. Because I just have 3 different numbers. i need kind of b1 function for each b1 but dont know how to do it? or am I doing a mistake?
yi =β0 +β1xi +εi, where the variables follow a multivariate normal
(x ,z1 ,z2, ε)~ N((0,0,0,0),( 1.0 0.5 0.7 0.7
0.5 1.0 0.4 0.0
0.7 0.4 1.0 0.3
0.7 0.0 0.3 1.0). Here is Var matrix and mean vector.
Best Regards
I don’t use R or RStudio and so I am not able to answer your question. In the Excel environment, you can generate random values from a multivariate binomial distribution as described at
https://www.real-statistics.com/multivariate-statistics/multivariate-normal-distribution/random-multivariate-normal-vectors/
Charles
Hello, do you have any suggestion on how to cite the tool in a method section of a scientific paper?
Something like “Data was analysed using the excel add-in Real Statistics Using Excel Resource pack.”?
Thank you
You can use this approach. See also
Citation
Charles
Dear Mr. Charles,
How do I compute for the true score (using Rasch model) of a subject (person) who answered a 5 point Likert survey?
I already run the Rasch Item Response Analysis using your software, and it already showed the ability of the subjects as well as the difficulty of the subjects. Now, I’m interested in knowing how to compute for the scores of individual subjects based on the model.
Hope you could help me with this one.
Thanks
Kim,
I haven’t forgotten your comment, but haven’t had the time to find the example that I prepared months ago that addresses this issue. I hope to respond to your comment shortly.
Charles
Hello Kim,
I have been thinking about your question for a couple of weeks now, but have realized that perhaps I haven’t fully understood what you are looking for. In particular, let me clarify what you mean by “how to compute for the scores of individual subjects based on the model”. By subject, I assume that you mean a subject that was used to create the model. If so, then the scores are already part of the output from the model. Which scores are you referring to?
Charles
Hi Charles – I just stumbled across your site. You have a lot of great resources here, thanks for sharing! Do you use an add in for monte carlo analysis or do you build custom in Excel?
The Real Statistics add-in provides a number of custom Monte Carlo simulation capabilities.
Charles
I am getting a runtime error: ‘unable to set the Horizontal Alignment Property of the range class’
The tool was working fine for several analyses until now. Any idea how to fix this?
Deepa R,
Which data analysis tool are you using when you get this error?
If you email me your data, I will try to replicate the error so that I can fix it.
Charles
what is your email address? I am using ‘logistic regression’ on the Real Statistics add-in
See Contact Us
Charles
Dear Charles,
In the Mann-Kendall Test you can include only one column with the data of the time series. Unfortunately, you are not allowed to include another column with the time associated to each data in the time series. Therefore, this technical approach assumes having a time series of data evenly spaced in time without gaps or missing data in between, which is very uncommon in environmental sciences. How can I tackle with such incomplete and also uneven time series in Real Statistics? Thanks in advance!
Jorge,
See https://www.researchgate.net/post/The_applicability_of_Mann-Kendall_trend_test_for_unequally_spaced_data_in_time
https://pure.unipa.it/it/publications/effects-of-data-gap-on-the-capability-to-detect-trend-in-hydrocli-2
Provided the “missing” data is random, I would think that the MK test is still valid, but see the above links.
Charles
Dear Charles,
I have just downloaded the Real Statistics Resource Pack. I am trying to run an ANOVA test in excel. Once I click ok this message keeps popping up:
“Compile error in hidden module: ‘Address’. This error commonly occurs when code is incompatible with the version, platform, or architecture of this application.”.
I am not sure what I am supposed to do.
Hello Linda,
See Compile Error
Charles
Dear Charles,
Just want to thank you very much for your efforts and patience developing the pack, and for kindly share your knowledge and insights with all of us.
Take care
Polo
My study focuses on the population structure of a plant species. Therefore, I have data on the diameter of all the individuals of this species.
And my framer suggested the Natural break method for data analysis but I don’t know how to do it.
Santianna,
See Jenks Natural Breaks
Charles
Dear Charles,
I have a study with a 2 x 2 factorial design, and the data won’t fit criteria for an ANOVA or other parametric analysis. Despite the 2 x 2 design, the real hypothesis is that one of the 4 groups in particular will be significantly different from the other three. Is it permissible to use Kruskal-Wallis for this analysis?
Thank you for your help. You run an amazing service.
Charles,
It depends on which ANOVA assumptions haven’t been met. If you make the 2×2 design into a one-ANOVA design (with 4 levels for the one factor) then Kruskal-Wallis or Welch’s ANOVA (if homogeneity of variances assumption is not met) might be right.
Charles
Thank you very much for your helpful advice above, but now I have a very different question.
I have a Mac with Catalina. I have been carefully following the directions for installing the Real Statistics Resource Pack on a Mac. The download works fine, but when I follow the rest of the instructions ending by clicking on the RSRP file, nothing happens. Do you have any tips? Is there any “trick” that is not in the printed instructions?
Thanks again!
Charles
Charles,
You shouldn’t click on the Real Statistics Resource Pack file. The installation instructions are given at
Real Statistics Download Mac
Charles
Charles,
You shouldn’t click on the Real Statistics Resource Pa k file. You need to follow the installation instructions at
Real Statistics Download Mac
Charles
Do you have a VPAT (Voluntary Product Accessibility Template) or ACR (Accessibility Conformance Report) for 508 compliance on this resource pack?
Sorry, but this is not part of the resource pack.
Charles
Ok, thank you. In order to use your resource pack for our research data, I am required to obtain a VPAT/ACR. I guess I need to find an alternative.
Dear Charles,
I have 27 sample trees. My data are non-normally distributed. So, I need to perform non parametric tests. I want to see the effect of two factors. Factor 1 is treated trees (n=13) and untreated trees (n=14). Factor 2 has 4 levels; North, East, West and South. Do you suggest, should I do Friedman Test or if thereany other test?
What hypothesis or hypotheses are you trying to test?
Charles
Dear charles
I have three groups of sample, size of sample is 5 to each groups. Please provide me easy steps to calculate the Anova.
Thank you
See https://www.real-statistics.com/one-way-analysis-of-variance-anova/
Charles
Dear Charles,
I’m trying to calculate a sample size needed for a study that measures the agreement between two raters. The raters score 3 variables consisting of two, three or four possible answers respectively. Zero hypothesis is: there is agreement (correlation coefficient of 0,81 or higher. Or there is no agreement (in our case 0,41 or lower) Which sample size calculator is suitable for this study design?
Eric,
I expect that the sample size depends on the metric that you use to measure agreement. Also what happens if the measure is between .41 and .81?
Charles
Dear Charles,
Thank you very much for your concise, accurate and extremely helpful blog. My question is whether I can cross-validate multiple logistic regression?
Yours faithfully,
Anatoly Lichtenstein
Anatoly,
Sure. You can use Cross-validation. See https://www.real-statistics.com/multiple-regression/cross-validation/ for more details.
Charles
In statistics specifically correlated groups T test and how they can be violated without serious errors being introduced into test.
What might be some exams of serious errors?
If you use the paired t-test, then you don’t need to worry about violations. See the following webpages:
https://www.real-statistics.com/students-t-distribution/paired-sample-t-test/
http://www.bwgriffin.com/gsu/courses/edur8131/notes/Notes5c_paired_ttest.pdf
Charles
Dear Charles,
I wonder if you can help my with my analysis. I have a field experiment trying different varieties of a plant species; I have 2 bocks and in each block 340 varieties, and I have observations for different characters for 10 plants of each variety in each block. I am interested in the variance not just between the blocks, the different varieties and characters but also within each of the varieties (as they have different levels of homogeneity). For such a case what type of analysis of variance would be more appropriate? Additionally I have missing data as in some accessions as plants died and there are therefore no observations for all the plants.
Many thanks,
Have a nice weekend,
Tony
Tony,
What hypothesis or hypotheses do you want to test?
Charles
I want to know if there are significant differences between the blocks and between the varieties for the different traits (eg: days to flowering, plant height)
In addition, it would be nice if it’s possible to partition know the variance within the accessions.
Hello Tony,
If you are comparing means, then RCBD could be the approach (depending on the details). See
https://www.real-statistics.com/design-of-experiments/completely-randomized-design/randomized-complete-block-design/
This webpage also links to handling missing data (using two different methods).
As far as variances are concerned, you can test for homogeneity of variances using Levene’s test, but I am not sure if this is what you are looking for.
Charles
Dear Charles,
All the the statistics tools I tried are OK with the exception of the Poisson regression (#valor! is shown). It did not work even with your examples workbook.
Thank you for your attention and congratulations.
Excel 2010 (XRealStats.xlam)
Reinaldo,
If you email me an Excel file with the Poisson regression data and results, I will try to figure out what went wrong.
Charles
I’m having problem when doing an ARIMA analysis. When I uncheck the box that says “Constant included in the model” the function keeps reporting me a value for the constant. How can I fix this?
Jorge,
When I do this I see a zero value for the constant.
If you send me an Excel file with your data and results, I will try to figure out why you are getting a non-zero constant value.
Charles
Hi Chalres,
You can download my Excel file in this link:
https://drive.google.com/file/d/1t1QIHNEDVxlG2x7K5FwdngiccvL8e4xA/view?usp=sharing
Charles, have d/l the add-in file and get it started. I have done the check you’ve mentioned {=Ver()} and it works -> 7.2 Excel 2010/2013/2016/2019/365 Windows; however the add-ins tab doesn’t show. However, every time I start Excel 2019, I have to click on the add-in to initiate it. Any suggestions?
See https://real-statistics.com/appendix/faqs/disappearing-addins-ribbon/
Charles
Just I have downloaded Real Statistical Resource Pack. But I cannot find cross correlation analysis tool in the “Time S” menu. what’s wrong with me?
What do you see when you enter the formula =VER() in any cell?
Charles
5.11 Excel 2007
I have solved this problem~~. I have downloaded wrong version package! thanks a lot
Cross Correlation is supported in a later release of Real Statistics. Because Microsoft no longer supports Excel 2007, I decided to freeze the Excel 2007 version of Real Statistics at an older release which doesn’t support Cross Correlation. If you have Excel 2010 or later you can use the latest version of Real Statistics.
Charles
Hi,
Thanks for the great add-in and website.
Why am I getting a popup message that says “Correlation – To indicate how groups of data are arranged, use “C” for columns, R for rows”? When I press OK or Esc, a single output is produced, then it pops up again, then OK or Esc, the next output and again this message, etc. How to avoid this?
Thanks
Hello Vahe,
The popup might be Excel’s Correlation data analysis tool. Did you open a large Excel file (e.g. one of the Real Statistics examples workbooks)?
Charles
No, not really a big one.
I did open about 30 spreadsheets in the process though, trying different correlations and having the outputs created on separate spreadsheets. Does that somehow affect the situation?
I don’t know for sure. Something similar happened to me a few months ago when I tried to run Real Statistics while a big workbook file was opened. It hasn’t happened since.
Charles