Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me as described below or by leaving a comment below or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click hereÂ
Dear Dr Zaiontz,
Many thanks for a ADF and Engle-Granger co-integration tests ! Is there any chance for Johansen Implementation as well ? One of the advantage of this test is that it allows to analyze more than 2 time series. I would greatly appreciate your feedback.
Cheers,
John
Hi John,
I will add this test to my list of possible future enhancements.
I can see the advantage of adding this test, but I need to prioritize it against other enhancements.
Thanks for your suggestion.
Charles
holle sir,
I am a Chinese student, which means my English might not very well. First, I am really glad to find this website and it is useful for my study. But, when I studied how to use Excel to do the Diebold-Mariano Test in the website, I think I found a small mistake. Maybe it was my fault, but you can have a cheak.
The mistake is in the part of Example 1. The example shows that the cell I4 contains the array formula
=SUMPRODUCT(H7:H$23-G$25,OFFSET(H$4:H$23,0,0,E$23-E6)-G$25)/E$23
But I think the array formula might wrong. It might be
=SUMPRODUCT(H5:H$23-G$25,OFFSET(H$4:H$23,0,0,E$23-E4)-G$25)/E$23
or the original array formula belongs to cell I6.
Thanks for reading my reply!
Hi Lucy,
Yes, you are correct. Thank you for catching this error. I have now corrected the error. I appreciate your help in improving the quality of the website.
Charles
Hello Sir,
I want to perform Mann-Kendall test and Sen’s Slope (Non- Parametric test) using Real statistics for trend analysis of weather parameters using last 30 years daily data. But i didn’t get this tool in data analysis library of Real statistics (Add-Ins). Only the procedure shown on site and an example workbook is there related to Mann-kendall and Sen’s slope. Please guide me that how i find that tool in data analysis library of Real Statistics.
Hello,
You can use the Real Statistics MKTEST and SEN_SLOPE functions as explained at
https://real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/
https://real-statistics.com/time-series-analysis/time-series-miscellaneous/sens-slope/
They have not yet been implemented as data analysis tools.
Charles
Thank You Sir.
If you have any idea to implement Mann-Kendall test and Sen’s slope using a model or other way, please suggest. I will wait for the above test soon added as data analysis tools in the Real statistics Library. Waiting for your valuable suggestions.
Ankur
Hello Ankur,
The two webpages show how to implement Mann-Kendall Test and Sen’s Slope in Excel. But I suggest that you use the MKTEST and SEN_SLOPE functions. You don’t need to wait for the data analysis tools.
Charles
Ok Sir,
Thank You.
Great stuff. I’m glad I found this. It has been very helpful. I wanted to ask you about an application of the Chi-square. I was looking at problem, kind of like the time a train leaves a station. I took your Chi-square example, and created a mirror image (thin tail on the left, peak on the right). My “model” assumes some early arrivals, and the bulk of the arrivals are close to departure time; then no arrivals at a specific departure time. I was curious if you thought this seems like a reasonable example of use of the Chi-square
Bruce,
I don’t quite understand the scenario you are describing. Also what is the hypothesis that you are trying to test (if any)? I typically used queueing models for these types of problems assuming exponential inter-arrival times. I have also built simulation models that could be used.
Charles
Hi, your site is a good resource. Excel functions will output additional regression information, such as R2 and Adj-R2. Do you know of a way, such as a function or macro, that will output Predicted-R2 to evaluate the selected linear model?
What is the Predicted R2?
Charles
May also be known as leave one out cross validation.
It removes a data point from the dataset.
Calculates the regression equation.
Evaluates how well the model predicts the missing observation.
And, repeats this for all data points in the dataset.
the predicted R squared is calculated by using the predicted residual sums of squares (PRESS):
predictive R-squared = [1 – (PRESS / sums of squares total)] * 100
Hi,
Thanks for your comment. I will add this to the next release of the Real Statistics Resource Pack.
Charles
Hello, Charles
Did you ever get around to adding “predicted R^2” to Real Statistics software?
Thank you for the clarification,
Bill
Hello Bill
Yes, this capability was added to the Real Statistics software in May 2019. See
https://real-statistics.com/multiple-regression/cross-validation/
Charles
if i want to take a 95% binomial distribution chart based on a count of 100 giving each number 0-100 a lo and high range to match against a 200 count chart how would it be done?
Hello Belinda,
I am sorry but I don’t know what you mean by “a lo and high range to match against a 200 count chart”.
Charles
Dear Charles,
I was trying to install your add-in, but every time I try to activate it error pops out saying, that existing file is opened somewhere else or the way to it does not exist. At first I made a mistake to click on the file, but I followed instructions to reinstall it, however the error still keeps on reappearing. Also solver is installed and I did try all the other trubleshooting advices with no improvement.
Could you please help me to solve this?
Thank you.
I suggest that you rename the file containing the Real Statistics software and then install the add-in with that name.
Charles
Hi Charles,
Regarding the Nemenyi Test (https://real-statistics.com/one-way-analysis-of-variance-anova/kruskal-wallis-test/nemenyi-test-after-kw/), when I input my data, I obtain two tables, the “Nemenyi Test” table, and the “Q Test” table. The “Nemenyi Test” table I obtain includes a column title “c,” which is blank and has a gray fill. Am I supposed to enter a value in here? The figure included on the webpage for this test does not include this “c” column. My table is also missing the “R mean” column.
Also, the “Q Test” table that I obtain does not have columns for “group 1” or “group 2,” and instead it provides a single p-value, even though I have four groups. Any clue why this test seems not to work for me?
Hello Marc,
If you see a column in gray, then you need to enter the contrast coefficients, i.e. positive and negative values that add up to one.
You are probably using an older version of this data analysis tool, since in the new version you no longer need to enter contrast coefficients. When you enter the formula =VER() what do you see? If it is not 6.0 or 5.10 then you are using an older version. I will be issuing a new version of the software by tomorrow, although it won’t change this data analysis tool.
Charles
Hello Sir,
Thank you for your data.
In Mann Kendall test, i am not getting how you have calculated the prel, ties and freq in figure-1.
This is described after Figure 3 on the following webpage:
Mann-Kendall Test
Charles
Charles, Nice to meet you.
I appreciate your site fully.
My question is that your ‘real statistics’ can handle large number of columns above 250. As you know, the late Excel version can handle columns above 20000. Is your real statistics updated according to the late excel version because I found pop up denoting error when trying to handle large number of columns using real statistics.
Real Statistics supports the latest version of Excel. If you send me an Excel file with your data and what you are trying to accomplish, I will check to see whether there is some restriction.
Charles
Hello Dr. Zaointz,
First of all thanks for your website. Your work is really helpful, especially in the case of my internship.
I have a question regarding the Mann-Kendall Test and more specifically on the Example you’re providing.
I tried to redo the Example on Excel to understand it, but I have some doubt about the calculation of what appears as P19. In the formula of the variance, it appears as the sum of ft(ft – 1)(2ft + 5). But on the Example, the value of the frequencies are calculated using the following formula: ft(ft +1)(2ft + 7).
So my question was: is there a specific reason behind this change in the formula and are both these formulas equivalent or not ?
Best regards,
Dimitri
Hello Dimitri,
The formulas are equivalent. The values in row 18 are one less than the ft values. Let’s call then values gt, and so ft = gt+1. Thus, ft(ft – 1)(2ft + 5) is equivalent to (gt+1)(gt+1–1)(2(gt+1)+5), which equal to gt(gt+1)(2gt+7).
Charles
Sir I accidentally disable the macros. How can I go back. Thank you
Hello Maribel,
It depends on what you mean by “disable the macros”. If you disabled the macros on a single Excel document, I believe that when you open this document (or it is opened for you) you receive a message that says the macros are disabled. You then see some sort of prompt (probably Enable Content) that lets you enable the macros.
If you instead made some settings using the Trust Center, you can simply change these settings.
Charles
Good evening Dr. Zaiontz,
I just had this strange prompt come up when I was trying to use you program in Excel 2016. Here is the prompt:
“Compile error in hidden module: Chisquare. This error commonly occurs when code is incompatible with the version, platform or architecture of this application”
Do you have any idea what my cause this to occur. Any suggestions would be greatly appreciated.
Respectfully,
Paul
Hi Paul,
The usual reason for this message is that Solver has not been installed. If you are using Windows, press Alt-TI and check whether Solver is on the list of addins with a check mark next to it.
Charles
Hi Charles, I am having similar problem with ChiSquare. I cannot enable Solver because it is erroring out. What is the first step to resolving, reinstall solver?
Hi Dan,
I suggest that you first diaable Real Statistics (by pressing Alt-TI and unchecking RealStats). Then enable Solver by pressing Alt-TI and checking Solver. See also
https://support.office.com/en-us/article/load-the-solver-add-in-in-excel-612926fc-d53b-46b4-872c-e24772f078ca
Once you get Solver to work (which is a pure Excel issue), you can press Alt-TI and check RealStats.
Charles
Hi Dr. Zaiontz,
I just wanted to let you know that I took your suggestions for re-establishing contact with the Real Statistics program. Everything is back online and the formulas are working like normal.
Thank you for helping me troubleshoot this issue.
-Paul
Hi Paul,
Thanks for letting me know. This is great to hear.
Charles
Dear Charles,
First,
Thanks for this great website.
Second,
I did the Cochran-Mantel-Haenszel (CMH) test following your instructions, but strangely I get P value = zero and I couldn’t understand and interpret the result. So Could please you help me with this,
Rasha
Hello Rasha,
Glad that you like the website.
A p-value = 0 means that the value is very small, and so it indicates that you have a significant result, which in turn means that the null hypothesis is rejected with an extremely high level of confidence.
Charles
Hello Dr. Zaiontz,
I am compiling a report for work and I wanted to reference the Statistics package you developed for Excel. Is there any particular citation you prefer people to use? Any suggestions would be greatly appreciated.
-Paul
Hello Paul,
Thank you for referencing the Real Statistics software package. The recommended citation can be found at
Citation
Charles
Thank you for the citation information, Dr. Zaiontz. I copied and pasted the quote into my document. I can’t tell you enough how much your program has helped me with my analysis. I have also told my co-workers about this program.
Keep up the great work!
Thank you very much Paul. I am very pleased that you are getting help from the Real Statistics program.
Charles
Hi, I have 4 raters in addition to myself. They will be coding video in which we need to categorize animal behavior. I will be the “key”, but moving forward, these other raters will be coding the videos, and I want to be sure they know what they’re doing. I want to provide them with some examples to test this, but I don’t know how many examples to provide. Also, we have about 30+ possible categories and lots of hours of video. How many trials should we include in the reliability test itself? Does Cohen’s kappa seem suitable for 4 raters?
Alexandra,
Cohen’s kappa can’t be used with more than two raters.
You need to choose a different test: ICC, Gwet’s AC2 and Krippendorff’s alpha are possible choices. With 30+ categories, you will not quite a few trials, but the number depends on what your goal is — although presumably you are looking for a sufficiently tight confidence interval for the reliability measurement.
Charles
Thank you! So, Cohen’s kappa wouldn’t even work if I alone were code a set of videos and then have each subsequent rater score videos at the times I specify and each compare against me using Cohen’s kappa scores independently? If true, I will look into Krippendorff’s – good to know! And it sounds like the more trials the better, but how would I choose how many? And would I have to have an example of each behavior? Some are very rare and we don’t even have recordings of yet… Thank you again!
Hello Alexandra,
You could use Cohen’s kappa if there are only two raters and the ratings are categorical (i.e. there is no order to the ratings).
Almost always, more trials is better, but since there is a cost for having more trials you usually need to compromise. In general, you need to set some criteria — often it is a measure of the precision of the result: e.g. I want to be 95% confident that the actual reliability measurement is within .1 of the value I obtain from the sample (i.e. the trials I actually conducted). You need to determine what criteria to use, otherwise, all I can say is the more trials the better.
If you have some behaviors that are rare you can lump them all together into one “Rare” or “other” category. If even this grouped category is very rare, then I would just leave these cases out of the analysis.
Charles
Dear Charles,
I need your advice on picking which method I should use to analyze interview data results for 6 respondents. The questions are listed below. I submitted an article to a journal and it was suggested that I re-analyze my interview data to include the inter-rater reliability (IRR) results. I am not familiar with this but as I am reading up about it. I am not sure if that method fits my data results. The reason why I say this is because some of the questions do not have just a Yes or No answer. The students elaborated in their responses and did provide very helpful feedback to the study.
However, the responses vary and by me creating themes (codes) that characterize the responses, a student’s response may be attached to more than one theme for a particular question. I think that is a “no no” for using this IRR method. I hope I am making sense ?? I have another research partner to evaluate the data with me so having 2 raters wouldn’t be a problem. I just don’t know how to implement this method (IRR).
On another note, I came across a study that used Taylor-Powell & Renner 2003 approach to analyzing interview results at http://learningstore.uwex.edu/Assets/pdfs/G3658-12.pdf
However, that method doesn’t state anything about how reliable it is statistically like the IRR method.
I need your advise as what to do ? Thanks in advance.
1) As a student, how does your experience of online educational programs compare with traditional in-class instruction? How do you like or dislike it?
2) In what ways do online education programs serve your educational needs?
3) Do you think your learning outcomes can be achieved through online education?
4) In your online class, how do you feel about the communication between yourself and the instructor? Also between you and other students in the class?
5) As a student, how do you view the feedback from the instructor? Is it in a timely manner? Constructive? Provide examples.
6) How do you resolve your technical issues? Was the university technical support helpful? Who helped you mostly to resolve technical issues?
7) How did you view the online environment? For example: the navigation of the course, layout, graphics, user friendliness..etc.
8) How does the amount of coursework in your online education program comparie with traditional in class instruction?
9) List any difficulties you experienced with the course interface (navigation of the course):
10) List any difficulties you had learning concepts that were presented to you in the online settings:
11) List any difficulties you had completing the discussions questions, individual assignment and exams.
12) List any other difficulties you had with the online course.
13) Feel free to add additional comments including recommendations for the course.
Hello Virginia,
You can perform reliability analysis even if the answers are not yes/no. See the following webpage for more information>
Reliability
I am not familiar with the Taylor-Powell & Renner 2003 approach and so can[t comment about it.
Charles
Hello everyone!
I have a distribution that is not normal of my data. I would like to make a comparison between the populations first, then after studying the effect of locality and sex and the interaction between these two factors. I know that I have to use a non-parametric test since the distribution is not normal !! But what is the test that will allow me to study the effect of each factor and the interaction of the two factors!
I need help with my thesis !!
thank you so much
Hello Soumia,
If the data were normally distributed would you have used Two Factor ANOVA? This test is pretty robust to violations of normality and so you might still be able to use it provided the data is not too non-normality distributed. A nonparametric approach is the following test
Scheirer-Ray-Hare Test.
You can also try to use Resampling. This is described on the Real Statistics website but only for the One Factor ANOVA case.
Charles
Hello Charles,
Thank you so much for your reply!! I already checked the normality of my distribution and I found that mt data were not normally distributed!! In this case I can not use a paramatric test: the factors ANOVA and I have to use a non parametric test which is in my case the Scheirer-Ray-Hare Test as you said in you reply.
I think this is the only solution to study the effect of locality and sex and the interaction between these two factors ( equivalent of the two-factor ANOVA if my data were normally)!!
The problem is that I don’t know how to perform this test!!
I found in some scientific articles that they use this test when they want to study the effect of two factors and the intercation when their data are not normally distributed!!
Bes regards!!
Soumia
Hi Charles. I’m finding different results when I run eVectors on my PC which has 64-bit Excel and my colleague’s PC which used 32-bit Excel. As far as we can tell, we’ve installed the relevant version of the add-in on the two computers. The input data and formula are the same but the output is quite different.
Do I need to use a different formula in 32 vs 64-bit Excel, or might I have done something else wrong?
Many thanks
Stuart
Stuart,
I don’t know of any reason why the output from a 32-bit Excel would be different from a 64-bit Excel.
Charles
Hi Charles:
I have some questions regarding Real Statistics output from follow up tests to ANOVA (Dunnett’s, Contrasts):
1. When I run Dunnett’s, I notice in the T-Test table below the Dunnett’s table, some p-values for some contrasts that are reported as “0” and others are “1”. This was also the case in your web-page example of the Dunnett’s output. I don’t ordinarily get these p-values with other programs. Are they “approximate?”
2. Also, what is the difference in the “Contrast” option vs. the Dunnett’s option. In the older version of Real Statistics, I used to always have to define the contrasts…even in the Dunnett’s table, but now that table is blank and a T-Test Table is below. I get different p-values with the “Contrasts” table for the same contrasts compared to the Dunnett’s output.
3. Is the Dunnett’s test/results the same as I would get in SAS? Meaning, it will return the same significance at a given alpha as SAS would?
Thanks!
Keith
Keith,
1. The p-values are calculated by the DPROB function, which is defined as
DPROB(x,k,df,iter,interp) = estimated p-value for the Dunnett’s test at x for the distribution with k groups, degrees of freedom df,iter = # of iterations in estimating the p-value from the Dunnett’s table of critical values (default = 40) and interp = TRUE (default) for recommended interpolation and FALSE (linear interpolation).
Note that if the p-value is less than .01, DPROB is rounded down to 0, while if the p-value is greater than .1 it is rounded up to 1.
2. In earlier versions of Real Statistics, only one pairwise comparison was displayed for post-hoc tests such as Tukey’s HSD, Dunnett’s, Games-Howell, etc. For each comparison, you needed to manually select the appropriate constrast coefficients. Recently, Real Statistics simplified this process by diaplaying all pairwise comparisons for these tests (including Dunnett’s). Since the Contrasts post-hoc test is not restricted to pairwise tests, the old approach is used for this post-hoc test. You won’t get the same p-values for the Contrasts and Dunnett’s tests.
3. I have not checked the Real Statistics results against those from SAS. I expect that they should match, although, as stated in point 1 above, p-values greater than .1 are displayed as 1 and p-values less than .01 are displayed as 0. This is because this is the level of granularity in the table of critical values that are used by Real Statistics.
Charles
Hello
How can i analysis ” Blocking in a factorial design”? where is it in the tool?
Mery,
See Randomized Complete Block Design
Charles
thanks
but what if i have a replications
Mary,
What are you referring to? ANOVA with replications?
Charles
Three-factor analysis… A, B, C
each factor has different levels… a, b, c
and each level has n replicates
How can I analysis these types of problems?
Mary,
See https://real-statistics.com/two-way-anova/anova-more-than-two-factors/
Charles
Dear Charles,
Thank you very much for your software. i have a problem conducting the analysis though. i am using mac 2011. when i perform the analysis, a message comes out saying ” compile error in hidden module: address”. it tried to do other analysis, the same message pops up. i cant open the help option also. please help.
Jeannie,
When you select AddIns from the Tools menu, do you see Solver on the list of addins with a check mark next to it?
Charles
Dear Charles,
Thank you very much for this software you made available for everyone in need. I am a dummy in computer and stat, but exploring the tool, it makes me a little bit confident with my stat. however, the first time i tried using the real stat, and plugged in my data for mediation analysis, i was stumped by ” compile error in hidden module” message. Making me unable to proceed and force quit the program. when i reopened it, i encountered the same problem. Please help…thank you very much…
Dear Charles,
Thank you very much for Real Stat, it looks like it could help me a lot in my research. However, when i statrted to use it for mediation analysis and tried to fill in the input range, a message pops out that says “compile error in hidden module” and there is a button that says ok…when i click it, the message does not disappear, until i have to force quit the excel program…please help…i did other analysis, the same message comes up…i cannot continue anymore… SOS
Jeannie
Hi
I’ve been requested to analyse the data (set the hypothesis and test them) of a project i must choose. could you recommend me a topic with available data set , i prefer a scientific one??
thank you in advance
Mariam,
I am sorry but I don’t have enough information about the courses you are taking or areas of interest to be able to make a suitable recommendation.
Charles
Hi, Charles,
I have sent to your email a data file that represents monthly auto sales. I thought I would describe the problem here too in case anyone else has a similar problem. I’m using version 5.9 on a PC.
When I try to fit an ARIMA(0,1,1) [that is, an ARIMA with 1 difference and 1 MA term] to this data set, I get a VALUE error that I have traced to the “ARIMA_COEFF” function. I have included the output in another tab in the worksheet.
I’ve tried fitting ARIMA(0,1,1) models to some other data sets and sometimes it works and sometimes I get the same VALUE error. It doesn’t seem to be related to the size of the data set or the magnitude of the numbers. I’ve also tried fitting the same model in R and the Gretl software and it works in both of those programs, so I’m not sure what the issue could be.
Thanks for any help you can provide. I really enjoy using this software!
Dear Dr. Zaiontz,
Thank you for providing such a useful tool. I am just learning to use it with the Mac version of Excel. However, when I bring out the main menu with Ctrl-M, the font is so tiny and almost unreadable. As the screenshot below.
https://www.dropbox.com/s/w2gv4pgizrnx6it/Screenshot%202018-11-08%2020.51.32.png?dl=0
Is there anyway to fix this?
Thank you very much.
I also find it to be too small, but it would take a lot of work to make each dialog box bigger, and so far I haven’t tried to tackle this. I do plan to make some improvements in the next release though.
Charles
Hi Charles,
I wanna say thank you so much for making such a wonderful website and add-in!
You have saved me during the first week of my new job!
Please keep this website and the add-in by all means! 🙂
Hi Charles,
Thank you for your excelent tool. It is so much fun to use it when exploring statistics (just for hobby).
Is there a possibility to also implement Cliff’s delta together with CI’s?
Erik,
I will add this to the list of potential enhancements. I will consider adding this to one of the next releases.
Charles
Thanks!
Hi, I’m wondering if you or any of your readers can help?
I have a spreadsheet to monitor allocation of staff overtime. It is set up with various details in columns, with column A being a running total of the occasions an individual is allocated overtime. On each occasion their column A total increases by 1 (generated by a formula)
I am using the following code to then automatically sort the columns so that the person with the most overtime goes to the bottom of the list, thereby giving a clear indication on whose turn it is…
Private Sub Worksheet_Change(ByVal Target As Range)
On Error Resume Next
Range(“A1”).Sort Key1:=Range(“A2”), _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom
End Sub
…this works well up to a point. If another employee is given overtime, and their total increases by 1 in column A, and is then the same total as the person at the bottom of the list, XL sort doesn’t put this person to the absolute bottom in the sort. It moves them down, but above anyone with the same total. How can I tweak the above to ensure the sort also sorts duplicates to the bottom?
Thanks in advance
Mark,
Sorry, but I don’t really understand the scenario. Please explain what is contained in the Target range and what is contained in column A and any other relevant columns.
Charles
Hi Charles;
To analyze multiple mediator and moderator variables, use of Hayes process macro. These macros are run on Spss. Is it possible in excel to do this kind of analysis?
[Hayes, A. F., & Preacher, K. J. (2014). Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, 67(3), 451-470]
Alireza,
Real Statistics has not yet covered the topic of Mediation Analysis. The components described in the paper that you reference are covered on the Real Statistics website and are included in the Real Statistics Resource Pack, but they have not yet been integrated as in SPSS.
This is an enhancement that I expect to add in the future.
Charles
Hi,
My students use your Add-In for some lab sessions on research design and data analysis that I teach. It is often a struggle initally getting users up and running, but eventually they do. I noticed that you updated the ANOVA repeated measures function in the PC version. Will you be updating the MAC version soon? Furthermore, I had a student with a MAC who accidentally downloaded the new PC version of the add-in and it actually works on her MAC. Have you somehow made the add-in universal?
Susan,
I am pleased to learn that you are using the addin for your lab sessions. More and more teachers are using Real Statistics in their classes.
What are the typical problems that the students are having getting up and running? Any suggestions for how to improve things?
I plan to provide a Mac version of the latest release, Rel 5.8. I hope to have it available by next week.
The main difference between the Mac and Windows version is the way ranges are input by the user. The Mac doesn’t support something called RefEdit, and so I have to use a substitute that does work on the Mac. For this reason it is surprising that one of your students was able to use the PC version on the Mac. Perhaps he/she manually inputs the range addresses. Even so, I have not been able to use the PC version on my Mac (using Excel 2016).
Charles
Hi Charles,
Thanks so much for creating this excellent tool. I’ve used it for about 5 semesters now. It can be a bit finicky, especially the MAC version, but most students eventually get it working.
Start up issues:
1. the biggest problem is students who do not have Data Analysis Toolpak or Solver installed in their Excel (this causes variable problems). This seems to be a problem with older macs.
2. Making sure they don’t download the .xlam more than one time (this seems to screw up Mac version), or not putting it in the C:\Users\xxx\AppData\Roaming\Microsoft\AddIns location on their PC (and then later moving it).
3. Forgetting to actually quit and restart Excel after they click the Data Analysis Toolpak and Solver before then clicking on the RealStats AddIn
4. When CTRL-M fails to bring up the RealStats window, which is caused by opening an excel file that had used Real-Stats on another computer. They quickly figure out to click off and then click on their RealStats in Add-Ins, but this can be annoying.
Regarding the new PC version working on a Mac -frankly it was a complete shock to me when two students, both with MACs, showed me that they had their RealStats working, but one had output that looked correct for mac and the other looked like the new PC version (we were doing repeated measures ANOVA). I checked their Add-Ins and sure enough one had the mac version and the other had the new PC version. The student then recalled that she just hit the first download button she saw and didn’t go to the mac version download.
Your update for the repeated measures ANOVA (not having to select a subjects column along with the data columns, automatic Tukey HSD output) is nice, I just need different instructions for Mac and PC now 🙂 I just didn’t catch the update before I posted the instructions this semester.
That’s all I can think of for now. Thanks again!
Charles,
Is it possible to use the Real Statistics EM algorithm functions to derive the parameters of two distributions from a bimodal distribution?
Roy,
The approaches described on the following webpage are probably useful for a specific bimodal distribution, but the functions are oriented to the multivariate normal distribution.
https://real-statistics.com/handling-missing-data/em-algorithm/
Charles
I just downloaded the package for the first time
I would like to generate a series of N random variables via Monte Carlo simulation…based on a multivariate normal assumption. I would be expecting to provide the means, variances and correlations for each variable as inputs
How would I do that?
Thank you very much
Andrew,
See the following webpage:
https://real-statistics.com/multivariate-statistics/multivariate-normal-distribution/random-multivariate-normal-vectors/
Charles
I have been working through your example on how to plot the 95 confidence ellipse and i cannot quite complete because I don’t understand how you have derived the contents of J11:K13. It is an array labeled Q, while i understand that it is simply a part of the formula for generating the ellipse data I would like to know the origin of these values and how to generate them. I am also missing what the items in G5:E5 are on the example.
Thank you for your work on this fabulous page! this particular example is exactly what i need to be able to work on my project.
Lesley,
Glad that the webpage is helpful to in your project.
1. J11 and K12 contain the cosine of the angle theta (in cell K6), J12 contains the sine of theta and K11 contains the negative of the sine of theta. If I remember correctly, Q is used to specify the rotation of the ellipse from the vertical/horizontal.
2. I assume that range G5:E5 is actually range G4:H5. This range contains the Real Statistics array formula =eVALUES(D4:E5). The first row of the output contains the eigenvalues of the covariance matrix in D4:E5. The second row contains values close to zero to confirm that the eigenvalues are correct.
Charles
Could you help me?
How do path analysis in excel?
Thank you for the very useful site.
Hasan,
I have not yet implemented Path Analysis in Real Statistics.
Here is an article that explains how to perform path analysis in Excel:
https://www.researchgate.net/publication/270887299_Path_Analysis_Step_by_Step_Using_Excel
Charles
Hello Dr,
Is there any specific quote to cite your add-on in a scientific publication ?
I used it and you deserve acknowledgment.
Mathieu,
See Citation
Charles
I am working on a text book for estuarine science and would like to acknowledge your site-what is the best way to do that? Do you allow use of your graphs or other type information to be included in a book? thank you
Bridget,
Thank you for acknowledging the website. See the following webpage for a suggested approach for citing the website.
Citation
You can use any graphs of output from the Real Statistics software that you generate yourself. You need to get my permission on a case by case basis for using any of the images that you find on the Real Statistics website of examples worksbooks.
Charles
Dear Charles,
looking at “Weibull with Censored Data”, it looks like cell D21 and D22 from your example have change compare to the previous version (Newton’s method), but I can’t figure out what has been modify.
Can you help me elucidate this mystery?
thx,
J
Jean,
Cell D21 contains the formula =1/D15+$D$10/$D$9-(D17+D19)/D20
Cell D22 contains the formula =-1/D15^2+((D19+D17)/D20)^2-(D18+D19*D11)/D20
Charles
Dear Charles,
do you plan to add also PLS to the multivariate data analysis in some future release?
Thanks once again for this great tool!
Kind regards,
Christian
Christian,
Yes, it is already on my list of potential future enhancements, but I don’t have an implementation date for it.
Charles
Dear Charles,
I would like to make a suggestion: the PCA algorithm (and possibly also other algorithms from multivariate data analysis – including PLS to be implemented) – works with previously standardized data. This makes, of course, sens, but in some instances it might be preferable to do have the results (e.g. the principal components) for the unstandardized data. Hence, it would be nice for the tools where data standardization is done automatically, to have a checkbox which permits turning standardization off, if wanted.
Thanks a lot,
Christian
Hello, thank you very much for sharing these things.
I have a question, please: to your understanding, does it even make sense to compute some Cronbach alpha for 5 ‘individuals’? I sistematically get negative results, and some of them pretty bad (around -20, for example), and I have some difficulties in interpreting these issues. Thank you very much, best regards,
Ioana
Ioana,
Yes, it can make some sense to calculate Cronbach’s alpha with only 5 subjects, especially when the results are more extreme; i.e. near 0 or near 1). While Cronbach’s alpha can be negative a result of -20 is very unusual, and is likely to mean that something is wrong with your data or the way you formatted the data.
Charle
Hi,
I ran an experiment to test for vigor in 2 plant varieties, one being B.napus and the other being B.Carinata . They’re from the same species but are have different traits. Within each variety, they are 3 and 9 different lines respectively. Each line had 3 replicates, and within each replicate there were 10 duplicates. The variables measured to test for vigor were plant height and leaf size.
What statistical tests should I run if I want to:
a) Compare the overall performance of one plant variety against the other
b) Within the varieties, find the best performing lines
c)After finding the best performing lines within each variety, see if there’s a significant difference in their vigor.
Benno,
How do you measure performance? Some combination of plant height and leaf size?
If you don’t use a combination of the two measurements, you probably use MANOVA.
Charles
Ok
Thank you Charless
How shall I get, data set that contain one dependent variable (outcome), three or more independent variables (continuous), add categorical fifth variable, sex to the data. from internet sources with complete reference to their origins, since i want to do correlation and Regression.
If u hv such data please let u help me.
Getachew,
Sorry, but I don’t have such a data set, nor know where to find it.
Charles
I have a data set that might work for your needs. You can email me at kdeitmeyer [at] atsu. edu
oops, spelling error in the above. it should be kdeitemeyer [at] atsu. edu
Dear Charles,
I’m trying to use your function WEIBULL_INV in my VBA code, however I’m facing a erro named Argument type mismatch. Can you tell me what type of variable you declared the arguments of the function so I can use the same type to eliminate the erro?
Best regards,
Leandro Dutra.
Leandro.
Function WEIBULL_INV(p As Double, b As Double, a As Double) As Double
‘ return the inverse of Weibull distribution at p; b is beta and a is alpha.
Charles
Good evening Dr. Zaiontz,
I just recently downloaded your Real Statistics package for Excel 2010 in order to perform statistical calculations for my research. I am impressed with the program’s ease of use and its accuracy. This has certainly made my work easier.
I did have a question regarding tests for data sets that are NOT normally distributed and express heterogeneity in their variance. For my particular research, I compared the lengths of Red Drum captured by 4 different trammel nets (1 control, 3 experimental). I have run the Shapiro-Wilk test on both the raw data and natural log transformed data and the program has given very low p-values for each treatment category (i.e. net type). Also, the Levene’s test for this data also indicated low p-values. So, from what I can interpret from your website, this set of data does not follow the guidelines for a one-way ANOVA test.
Moving forward, I have considered using either the Kruskal-Wallis Test or Welch’s ANOVA to compare the means of the different net types. My questions to you are the following:
1. Which of these tests has greater statistical power?
2. In regards to using the Kruskal-Wallis test, do I need to manually convert my raw data (untransformed or log transformed) into a Ranked data set or does your statistical program perform this automatically?
3. In regards to my null hypothesis (There is no significant difference in size of captured Red Drum between gear types), how would I state my results if any significance is found (e.g. “Fish captured in Gear #1 were significantly larger than those captured in other gear types)?
Any suggestions and guidance are greatly appreciated.
Thank you for your time and consideration.
-Paul
Paul,
1. Welch’s is better when homogeneity of variances is violated. It still requires normality, but these sorts of tests are reasonably robust to modest violations of normality
2. The software will convert the raw data to ranks when using the KW test
3. Yes, this sort of phrasing seems correct.
Charles
Thank you so much for assistance, Dr. Zaiontz!
I admit that I am not as well versed in statistics as I would like to be. However, with your assistance and website, I feel that I can make the correct assumptions from the program’s data output.
Thank you again for your time and assistance.
Respectfully
-Paul McLaughlin
Hello, I downloaded your package, but I was a bit in a hurry so I double-clicked the RealStats-2007.xlam file in the final destination folder before making Excel preps to use it! Shame on me… The Add-in worked fine after proper setup in Excel 2007 (thank you for a superb package), but when I try to close Excel I’m prompted for a password. How do I come out of this loop? Thanks in advance.
I have not seen this error before. You should not need a password ever. When the software is not installed correctly, you may be prompted for a password, but I have not heard anyone getting prompted for a password when closing Excel.
The only thing I can think of is to uninstall RealStats-2007 and start all over again with the installation.
Charles
I tried that – didn’t work. I’ve learned that the error is linked to Solver, but Microsoft doesn’t provide support for Excel 2007 anymore. Partial solution: speed-clicking “Esc”, and the password prompt disappears until next time RealStats have been accessed and Excel closed (tip from the developers of Excel standard version of Solver: Frontline Systems). Hence, I have to live with it.
Can you give me the details about STAR (Spatio-temporal
Autoregressive) model in excel?
UNI,
The Real Statistics website hasn’t addressed this topic yet. I will add it to the list of potential future topics.
Charles
Dear Dr. Zaiontz,
Thank you very much for your detailed and easy-to-follow website. It was a lifesaver as I was writing my master’s thesis and I have it bookmarked for future applications!
Kristie,
Good to hear. Glad I could help.
Charles
Dear mr. Zaiontz, thank you for your very useful application!
This not only helped in the work, but also helped to understand some methods as well as the sequence of computations.
One useful tool can really beautify your app. I really want to have the ability to calculate incidence rate ratio CI and p-value with different ways like it is implemented in http://web1.sph.emory.edu/cdckms/IRR-single%20table.htm
or
http://www.openepi.com/PersonTime2/PersonTime2.htm
I use these online pages, however, is a very long time, if computing a lot. It would be great to be able to do this in Excel.
P.S. I couldn’t donate because there is no Russia in the country list in PayPal
Dr. Zaiontz,
I would like to thank you for making all useful information accessible to the students.
Do you have any materials or example on K- fold cross validation method ?
I am trying to write a VBA code to perform multiple regression using K fold cross validation method.
Thank you for your help!
Fady,
See the Cross Validation part of the following webpage:
Estimating Lambda for Ridge Regression
Charles
The ridge regression function are really nice. Are there going to be similar functions for Lasso regression to figure out the Lambda? For example is there going to be a LassoLambda() function in the future version?
Hussain,
Thanks for your input
This is on the list of potential future enhancements
Charles