Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.
To make this website more valuable to the user community, we welcome your comments, suggestions, and feedback. We would especially like to hear about any errors on the website, in the examples, or in the software. You can contact me by email as described below or by leaving a comment here or on any of the webpages.
Charles Zaiontz
My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here
Don’t see the Panel Data option under Time Series, yet I have the latest version. Is there another add-in that I need?
Thanks! Thomas
What do you see when you enter the formula =VER() in any cell?
Charles
Dear Dr. Zaiontz
I am a statistical beginner and have a task in our company.
We have a simple dataset, for example:
114 instructors are rating a specific performance of a student. They can rate that performance from 1 (failed) to 5 (exemplary). If that specific performance cannot be rated, they give a NO (not observed). This NO could also be interpreted as a 0. I have to find a way to give these ratings a score of agreement or reliability. The wish would be to compare the score to a benchmark like Landis and Koch's.
The most critical part is the weighting. The ratings are not equally "spaced". For example, if one rater gives the performance a 1 and another a 2, that's critical, because 1 means failed and 2 means passed, whereas the difference between 3 and 4 or 4 and 5 is almost negligible.
Is there any function, kappa or something to achieve this?
Thank you so much
Best regards
Mike
Hi Mike,
There are many approaches to measuring agreement with unequally spaced ratings.
The simplest approach for 5 rating categories is to map them onto, say, 7 rating categories, 1 through 7. Then you could use 1 for fail and 4 for pass (skipping 2 and 3), and then use 5, 6, and 7 (with 7 representing exemplary).
In any case, Gwet’s AC2 measurement provides lots of flexibility that should accommodate your needs. See
https://real-statistics.com/reliability/interrater-reliability/gwets-ac2/
Charles
Hi Charles
Thank you for your answer. I'm trying to figure out how to implement the AC2 on my dataset. The thing is that our instructors are just rating one item from 0 to 5. They are not rating multiple items. Is it still possible to calculate a coefficient with this simple dataset? As an example: the first judge rates the item a 3, judge 2 gives a 4, and so on. But the ratings are heavily weighted.
Thanks and best regards
Mike
Mike,
How many judges are there? What do you mean by ratings are heavily weighted?
Charles
Hi Charles,
there are around 55 judges.
I have for example a dataset for a specific situation which looks like this:
Grading 1: 10 judges
Grading 2: 20 judges
Grading 3: 8 judges
Grading 4: 7 judges
Grading 5: 0 judges
Not observed: 10 judges
Not observed is a separate category, used when the judges decide they cannot give a grading to the specific situation. The not observed responses also have to be taken into account in the score calculation.
In the example, 10 judges gave a 1 and 20 gave a 2. This is crucial, because 1 means fail for the student and 2 means pass, yet the gradings are only one number apart. If the judges had instead given 4s and 5s (for example, 10 judges gave a 4 and 20 judges a 5), the impact would be much less severe, because 4 and 5 are both good gradings, even though they are also just one number apart. In reality, it matters much less when the judges disagree between a 4 and a 5 than when they disagree between a 1 and a 2.
I simply don't know how I can create a score with this simple dataset of at most 6 numbers.
Thank you so much
Mike
Hi Mike,
1. All the statistical tools that I am familiar with for measuring agreement between judges require multiple subjects. You only have one subject. Since the judges have different ratings for the one subject, one can conclude that there is no agreement. To measure the extent of agreement or lack of agreement, you could use statistics such as the variance or similar statistics.
2. Even if you use the (weighted) variance, you still need to determine what data set you should take the variance of. Since you have said that the ratings are not equally spaced, you need to map the ratings to a spacing that better reflects what you think the correct spacing is. For example, you could map grades 1, 2, 3, 4, 5 to ratings 1, 5, 7, 8, 9, although I am sure that you have a better idea of what these revised ratings should be (a rough Excel sketch follows below). You also need a mapping for not observed. Perhaps this could be the mean of the revised ratings, or you could simply not include these values in the data set.
3. These are some of my ideas, but I am not sure whether there are better approaches.
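As a rough Excel sketch of point 2 (the cell layout here is purely hypothetical): put the original grades 1-5 in A2:A6, the number of judges who gave each grade in B2:B6, and your revised ratings (e.g. 1, 5, 7, 8, 9) in C2:C6. Then
=SUMPRODUCT(B2:B6,C2:C6)/SUM(B2:B6)
gives the weighted mean of the revised ratings (put it in E1, say), and
=SUMPRODUCT(B2:B6,(C2:C6-E1)^2)/SUM(B2:B6)
gives the weighted variance. How to handle the not-observed judges (impute the mean or leave them out) is the judgment call described in point 2.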
Charles
Hi Dr. Zaiontz,
I had a question regarding the columns representing t, or time. In the example workbooks it is incremented by 0.2. Is there a standard unit you are using, or do you enter t as you wish? If I wanted to represent 5-minute intervals, is it simply 0, 5, 10, or is there a time conversion to be done? Thank you for your work on real stats!
Nathanael,
There is no right answer here. You can use any units for t that make sense for you. 5-minute intervals can be coded as 0, 5, 10, 15, etc. or 0/60, 5/60, 10/60, etc. You can also use 0, .1, .2, etc., although this would net out to 6-minute intervals.
Charles
Dear Dr. Zaiontz,
Thank you for always helping us and asking for nothing in return. Is there a donation link where I can send you some coffee money?
Having RealStats, even with Excel issues, is a lifesaver for me because I can get lightning-fast calculations done and arrive at the correct answer in 10% of the time it would take me to even get my data imported into R or SPSS.
Best,
Bryan
Thank you, Bryan. Glad I could help.
The Donation page is at https://real-statistics.com/donation/
Charles
Hi Charles,
Thanks for the great tools! I was wondering if these tools are usable in VBA. For example, looking at Correspondence Analysis, how would you call or implement in VBA the formula D_r = DIAGONAL(1/SQRT(myvector))? Or the MMult() / SVD decomp function?
Best,
Hi,
Yes, you can do this. See
https://real-statistics.com/real-statistics-environment/calling-real-statistics-functions-in-vba/
Charles
Thank you, Charles!
And am I correct in understanding that functions can accept arrays and vectors (not just ranges)? Specifically, I am not able to call the SQRT() function in VBA (I get a subscript error), but it is possible when calling it in a worksheet cell.
Best,
Hello Pranav,
In VBA, the built-in function Sqr(x) is used instead of SQRT(x). Note that Sqr only accepts a single number, so to take the square root of each element of an array or range from VBA you need to loop over the elements (or evaluate a worksheet-style formula).
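Here is a minimal VBA sketch of the element-by-element approach; the input range A1:A5 and the output column are just placeholders for your own layout.

Sub SqrtOfRange()
    ' Placeholder layout: input values in A1:A5, square roots written to B1:B5
    Dim cell As Range
    For Each cell In ActiveSheet.Range("A1:A5")
        If IsNumeric(cell.Value) Then
            cell.Offset(0, 1).Value = Sqr(cell.Value)   ' VBA's built-in square root
        End If
    Next cell
End Sub

Alternatively, Application.Evaluate("SQRT(A1:A5)") evaluates the expression as a worksheet array formula and returns a Variant array of square roots that you can assign to a VBA variable.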
Charles
Thank you for this wonderful tool! Is it possible to enable multi-core calculations? When I launch it to calculate a table, Excel says it is using X threads, but Task Manager shows only 1 core in use.
Hello James,
Glad that you like Real Statistics.
I understand that it might be possible to create a multi-core version, but I don’t know how to do this.
Charles
Hello Charles,
first of all: my congratulations on your excellent package and for making it available to the community.
I have noticed a bug in the function “NextPrime”. Using it on, e.g., 7, 23, 47, 113, 167, 283, 359, 523 gives as results the squares of 3, 5, 7, 11, 13, 17, 19, 23.
Kind regards,
Javier
Hello Javier,
Yes, you are correct. The PrimeList function seems to work correctly, but there is a logic mistake in the NextPrime function.
Thanks for identifying this error. I will flag the error on the website and correct this in the next software release.
I really appreciate your help in improving the accuracy of the Real Statistics software.
Charles
I can download the below, but can’t “enable” it (i.e. Excel just hangs up).
Click on Real Statistics Examples ANOVA 1 to download the Anova 1 examples workbook. 3/22
Can you advise? FYI, I’m running MS Office Home 2021.
Hello Dan,
I just downloaded and opened (enabled) this file successfully. It did take a little while to complete, although still less than a minute on my computer. Perhaps it is a little slow, but Excel didn’t hang up.
Charles
Hello! I am making a Kaplan-Meier survival curve, but I don't know what to do so that the censored data appear on the chart. Thank you.
Hello Charles,
I am doing a simple linear regression for my data; x is my dose levels, while y is the parameter I am measuring. But at the highest concentration the parameter value is 0. So should I include the highest concentration and its associated value of 0, or should I exclude them and then run the analysis?
Thank you in advance .
Jay,
Your regression is of the form y = bx + c where x is the dosage.
The parameters b and c are determined by the regression algorithm.
You should supply the (x,y) values for all the data that you have, including at the highest concentration level.
Charles
Hello Charles,
Thank you for your reply.
Suppose for example,
Dose Response
2 5
20 10
200 20
2000 40
20000 100
200000 0
Wouldn't including the last dose of 200000 and its associated value affect the correlation coefficient?
Hello Jay,
If you plan to use linear regression (or calculate the correlation coefficient), then the biggest problem is that the (x,y) values don’t fit a line very well. This is true even if you leave out the last point (200000,0).
A step in the right direction is to use the log of the doses as the x values. This results in the following points (using log base 10).
0.301029996 5
1.301029996 10
2.301029996 20
3.301029996 40
4.301029996 100
5.301029996 0
These are not quite linear, but they are reasonably close if you leave out the last point; still, it would be a shame to leave out any of the data.
Of course, this depends on what you plan to use the regression model for. If you aren’t interested in dosages higher than 20,000 perhaps there isn’t much harm.
One approach might be to fit the data to a curve that is not a line. You could try polynomial, exponential, log-normal or other curves until you find one that best fits your data.
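If you do try the log-dose route, a minimal Excel sketch (assuming, hypothetically, that the doses are in A2:A7 and the responses in B2:B7): put =LOG10(A2) in C2 and fill down to C7; then =SLOPE(B2:B6,C2:C6) and =INTERCEPT(B2:B6,C2:C6) fit the line response = slope * log10(dose) + intercept to the first five points, and =RSQ(B2:B6,C2:C6) shows how well that line fits. Using rows 2:7 instead of 2:6 in these formulas would include the last point.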
Charles
Hello, thank you for your attention. I have a question about inferential statistics.
The degrees of freedom (df) for the independent t-test are n1 + n2 - 2. When the significance level is .05 and the computed df falls between two values listed in the table, we have two candidate df values.
For example, when df = 58 is obtained and the table lists df of 40 and 60, why do we consider df = 40?
My question: why do we round df down to the smaller number?
Thank you for taking the time to read my email. Sorry, I am not very fluent in English. I am contacting you from Iran.
Hello Fatima,
Which test are you referring to? Two independent sample t-test? In this case, df wouldn’t be 40.
Charles
Dear Charles,
I suggest that in the donation section, you also add the option to make a transfer, providing an account with the BIC code (Bank Identifier Code) or SWIFT code (Society for Worldwide Interbank Financial Telecommunication)
This way, people who don’t have PayPal could make a donation to thank you for your great work.
sincerely
Thank you for your suggestion, Carlos.
I will look into doing this.
Charles
Hello Charles.
I wanna make a request, if it's not too much to ask.
Could you please add the O'Brien, Ramsey Conditional, and BF tests for homogeneity of variance to the software?
Thank you.
Hello Jay,
I will look into these and add them to my list of possible future enhancements.
Charles
Hi Jay,
BF test: this is Levene’s test using group medians instead of group means. The Real Statistics software already supports this test via the median option of Levene’s test.
Ramsey’s conditional: this is a test as to whether P(A|B) is high. I have not seen any papers as to what is considered high enough. Do you have any references that would be useful for implementing Ramsey’s test?
O’Brien: I still need to research this.
Charles
Jay,
I see that O'Brien's test is a medical test related to the shoulder. Is this the test you are referring to, or do you have some statistical test in mind? If so, do you have a reference for this test?
Charles
Hello Charles,
I hope you are doing well.
For quite some time I have been looking into recent papers regarding the power of different homogeneity of variance and normality tests, and I have come across a couple of good papers and books:
1) An introduction to Statistical Concepts by Debbie L. Hahs-Vaughn and Richard G. Lomax
2) Statistical tests for homogeneity of variance for clinical trials and recommendations (doi.org/10.1016/j.conctc.2023.101119)
3) Erjavec, N. (2011). Tests for Homogeneity of Variance. In: Lovric, M. (eds) International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_590
4) Ralph G. O’Brien (1979) A General ANOVA Method for Robust Tests of Additive Models for Variances, Journal of the American Statistical Association, 74:368, 877-880, DOI: 10.1080/01621459.1979.10481047, the original paper by O’Brien
5) Ramsey, P. H. (1994). Testing variances in psychological and educational research. Journal of Educational Statistics, 19, 23-42. doi:10.3102/10769986019001023
6) Wang Y, Rodríguez de Gil P, Chen YH, Kromrey JD, Kim ES, Pham T, Nguyen D, Romano JL. Comparing the Performance of Approaches for Testing the Homogeneity of Variance Assumption in One-Factor ANOVA Models. Educ Psychol Meas. 2017 Apr;77(2):305-329. doi: 10.1177/0013164416645162. Epub 2016 Apr 27. PMID: 29795915; PMCID: PMC5965542.
Hi Jay,
Thanks for providing these references. I will follow up on these tests shortly, and will add at least some of them.
Charles
Thank you Charles.
I’m eagerly looking forward to it.
Hello Jay,
I will be issuing a new release today that will contain O’Brien’s test. See
O’Brien’s Test
Charles
Thank you so much, Charles. Please accept my deepest gratitude.
I wish there were clear, detailed guidelines for when to use a specific homogeneity of variance test. Can you provide me with any references if you have any?
Hi Jay,
The notes about Ramsey’s Conditional Test on the following webpage
https://real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/obriens-test/
give some indication of which test to use. The last reference on that webpage also provides some guidelines.
Charles
Hello Charles
While going through the paper, I came across the term "cell size" – a recommendation for choosing the homogeneity of variance test based on the average cell size. Does that mean the average number of observations across all groups?
Thank you in advance.
I have a similar query about the shape of the data: does skewness or kurtosis here refer to the shape of an individual group's data or of the pooled data?
Jay,
I don’t know what the authors were referring to here. Probably the shapes of individual groups, probably assuming that they have similar shapes, but I can’t say for sure.
Charles
Hello Jay,
I assume that you are referring to the Wang et al. paper referenced at
https://real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/obriens-test/
Yes, I believe that average cell size is equivalent to the average number of observations across all groups, at least for one-way ANOVA. For two-way ANOVA with m row levels and n column levels, there are mn cells; e.g., with 2 row levels and 3 column levels there are 6 cells, and the average cell size is the total number of observations divided by 6.
Charles
Hello Charles, 👋
I am stuck with something. I have taken body weight of animals from 2 groups at intervals of 7 days. First group is control and second one is treatment.
1) I have individual data of their body weight
2) I have percentage change in body weight (some values are also negative)
3) Would a t-test be appropriate for testing whether there is a difference in % body weight change between the control and treatment groups at days 7/14/21/28? That would be a total of 4 t-tests.
4) Are normality tests appropriate for negative data, or do I have to transform the data?
Thank you.
Hello Jay,
1. If I understand correctly, you have one between-subjects factor (Treatment vs. Control) and one within-subjects factor (day 7, day 14, day 21, day 28).
Which test to use depends on what hypotheses you want to test. Assuming you want to perform a broad range of tests, you could use a repeated measures ANOVA with one between subjects factor and one within subjects factor.
2. If % body weight change is measured between day 7 and day 14, between day 14 and day 21, and between day 21 and day 28, then the within-subjects factor only has 3 levels instead of 4 (as described in #1 above).
3. Normality applies to data sets with negative values and so no transformation is needed.
Charles
Hello Charles, Thank you for your help.
But what about the percentage data, i.e., the percent body weight change data?
Hello Jay,
The approach that I suggested should work for percentage data and percent body weight change, as long as the assumptions are met.
Do you have reason to believe that the assumptions are not met?
Charles
Hello Charles,
Please correct me if I am wrong, because I was under the impression that percentage data can be analysed only with nonparametric tests such as Mann-Whitney (MW).
Jay,
I don’t know of any reason why you couldn’t use a t-test or ANOVA with percentage data, as long as the test assumptions are met. Now it may be more likely that the assumptions fail, in which case you can use MW, but first test the assumptions.
Charles
Hello Charles,
What if the HOV (homogeneity of variance) test comes out significant when performing two-way ANOVA?
Hello Jay,
If the HOV test is only slightly significant, two-way ANOVA might give reasonably good results. Unfortunately, there aren’t many good nonparametric test options. Two such options are described on the Real Statistics website:
Scheirer-Ray-Hare Test
Aligned Rank Transform (ART) ANOVA
Charles
Hello Charles,
I am new to statistics.
Suppose I have two groups of data, for example, male and female.
I wanna perform a t-test, so do I check for normality using different tests such as the Shapiro-Wilk, Kolmogorov-Smirnov, or D'Agostino test for each group separately? Or do I need to pool both groups' data and run the tests?
-> What if one of my groups fails the normality test? If I wanna perform a parametric test, I would then need to transform both of the groups. Am I doing this right?
Hi Disha
If the male and female groups are independent then you need to test each group separately for normality. If one group fails, you can use a nonparametric test such as Mann-Whitney.
If for every male the female group consists of the man's wife (or the man's daughter, etc.), then you don't have independent groups. In this case, you need to perform a paired t-test. This requires a test for normality on the differences between the scores of each pair.
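For example (the ranges are hypothetical), with the male scores in A2:A21 and the female scores in B2:B21, =T.TEST(A2:A21,B2:B21,2,3) gives the two-tailed p-value of the independent-samples t-test (type 3 = unequal variances, type 2 = equal variances), while =T.TEST(A2:A21,B2:B21,2,1) gives the paired-test p-value. In the paired case, it is the differences (e.g. =A2-B2 filled down) that you would test for normality, rather than each group separately.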
Charles
Good morning Mr. Charles Zaiontz.
In advance, I thank you very much for this contribution, which makes it possible to carry out statistical analyses in a simpler way. It helped me a lot to be able to do my PhD analysis in 2021.
I have a problem with the plugin. Since I installed the latest version, my Excel has become slow. Even with a new, empty document, the action of copying a cell takes up to 10 seconds to complete. My PC has an 8th-generation Core i5 with 12 GB of RAM, an SSD, and Excel 365. I have worked with databases of more than 100,000 records and it has never slowed down before.
I apologize for bothering you with this, but I would like to know if you have any way to fix this. Thank you for your cooperation.
Hello José,
Yes, this is concerning. Copying a cell should be close to instantaneous.
You seem to have a pretty powerful computer. I am not seeing this problem on my computer which has a similar configuration. My computer slows down when I have multiple Excel spreadsheets open, especially when one of the examples workbooks is open.
I suggest closing all open files, and restarting your computer to see whether the problem goes away. Perhaps you have already done this. If this doesn’t solve the problem, please let me know.
Charles
Hello Charles, I wanna check whether or not the treatment has different effects on the two sexes. I have a total of four groups, with n = 4, so I went with two-way ANOVA.
1) However, the assumption of equal variance is not met. Can I still run the two-way ANOVA, or should I log-transform my data?
2) What if, even after transforming, the data still doesn't meet the requirement of equal variance?
3) Which post hoc test should I use in either case?
Hello, Charles.
I'm comparing whether the treatment has a different effect on each sex; each group has 6 samples. So I went with two-way ANOVA. The normality assumption is met; however, the homoscedasticity plots and Levene's test are significant. Can I still run the two-way ANOVA, or would there be more chance of getting false results? Or should I just transform my data and then go with two-way ANOVA?
Which post hoc test should I use in either case?
Hello Jay,
1. Let me understand your design better. Are you saying that you have a Treatment group and a Control group? Are you saying that each group has 6 samples of each sex or 6 samples with a combination of male and female? (3 and 3?)
2. What hypothesis or hypotheses are you trying to test?
3. Generally, ANOVA is sensitive to violations of homogeneity of variance, but there are also only limited alternatives to two-way ANOVA. This is why I am asking the above questions, to see whether two-way ANOVA is really needed. Also, what is the p-value of Levene's test, and what data did you perform Levene's test on?
Charles
Hello Charles,
I apologize for the number of comments on the same query. When I try to post a comment it says the comment is under moderation, and then it vanishes.
1) The design
-Two groups: Control and Treatment
- 6 males and 6 females in each group (6 males in control, 6 females in control; 6 males in treatment, 6 females in treatment). Total number of samples = 24.
- 3 dependent variables (so I was performing the two-way ANOVA on 3 variables).
2) Hypotheses that I wanna check: (A) whether or not there is a difference between the dependent variables in control and treatment, and (B) whether the difference in the variables is significant between the two sexes (in both the control and treatment groups).
3) Results of the Levene’s Test
type p-value
means 0.007
medians 0.034
trimmed 0.007
* Levene’s Test was performed on Data of Dependent Variable 1 from all 24 samples.
Thank you in advance.
Jay,
1. Have you been trying to place a comment on the Contact Us webpage or some other webpage (if so, which one)? I will try to figure out why you are getting this message.
2. Two-way ANOVA does seem to be a reasonable approach, assuming that you have one dependent variable (and two factors: Treatment/Control and Male/Female).
3. You shouldn’t perform Levene’s test on all 24 samples together. See the following webpage regarding how to test the assumptions:
https://real-statistics.com/two-way-anova/testing-two-factor-anova-assumptions/
Charles
Hello Charles,
I hope you are doing well.
1) I posted the comments here: https://real-statistics.com/contact-us/
2) Actually I have 3 independent variables, and I was doing a two-way ANOVA on each independent variable separately.
3) For example, the data is reformatted as shown below. I am performing Levene's test on this data, which shows significance.
Male: Control Male: Treatment Female: Control Female: Treatment
48.8 7.4 86.3 361.2
19.3 6.5 78.3 86.4
12.3 4.1 105.7 43.6
25.4 2.7 115 78.5
14.6 33.3 34 58.9
16.5 14.6 49.5 91.9
Jay,
1) No problem
2) Not sure what you mean by separate 2-way ANOVA for each independent variable since each 2-way ANOVA is based on 2 independent variables
3) The problem may be the presence of outliers. E.g. 361.2 is a potential outlier.
Charles
Hello, Charles.
I am stuck with something. I want to find out whether or not the treatment has a different effect on male and female subjects, so I went for two-way ANOVA; the sample size is 6. However, Levene's test is significant. Will my two-way ANOVA result still be considered valid? And if I transform the data and Levene's test is then not significant, can I continue with the two-way ANOVA?
Whenever I start Excel, a message window always opens saying "This worksheet can't close because of referring to other worksheets", with a warning sound.
please, help me.
Hello Danny,
I don’t know why you are receiving this message. I have never seen this message before.
Perhaps someone else on the website has a better answer for you.
Charles
I am comparing two groups, control vs. treatment; both have the same sample size (n = 6). I wanna go for a t-test to compare the means.
The first question is: do I check for normality using Shapiro-Wilk (SW) for both groups (control and treatment) individually? Second, I will check for homogeneity of variance using the F test, Bartlett's, or Levene's.
What if one of the groups fails the SW test? then do I have to transform the data of both the groups and check again for normality?
The second question would be, even after performing the transformation, if the data still fails the SW test, then I’d go with nonparametric tests.
So I might have 3 different conditions
1) Both groups have p > 0.05 for SW and p > 0.05 for Bartlett's.
2) One of the groups fails SW.
3) Both the Shapiro-Wilk and Bartlett's tests show significance.
Which tests should I do for all these conditions?
Kindly help me out.
I would test each group separately for normality using the SW test. If this assumption is met, I would use the t-test. I would use Levene's test for homogeneity of variances. You don't really need to check for equal variances, though, since there is a version of the t-test that works even when the variances are unequal.
If the normality assumption is not met, then usually I would use a nonparametric test, probably Mann-Whitney. I would only use a transformation if there is one that makes sense based on the scenario.
Charles
"Based on the scenario" – could you please elaborate on that? Thank you.
Thank you, Sir. For this wonderful creation. This is amazing.
I was wondering if there is any option to get only the p-value result of, for example, the t-test and its nonparametric variants?
Again thank you so much. Hoping to hear from you.
Have a good day.
Hello Jay,
Excel’s T.TEST returns only the p-value for the t-test.
Real Statistics’ MWTEST returns only the p-value for the Mann-Whitney test.
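For example (the ranges are hypothetical), with the two samples in A2:A7 and B2:B7, =T.TEST(A2:A7,B2:B7,2,3) returns the two-tailed p-value of the t-test assuming unequal variances, and something like =MWTEST(A2:A7,B2:B7) should return the Mann-Whitney p-value once Real Statistics is loaded; check the Mann-Whitney webpage for MWTEST's exact argument list (e.g. whether a tails argument is needed).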
Charles
Re: the section detailing the calculations of the mean and variance of the rectified normal distribution – in the final terms, where the pdf is multiplied by the variance, it should be the pdf multiplied by the deviation. The step before each of these correctly shows the deviation, but the last step changes this to the variance.
Hello Herman,
Thank you for catching these errors. I have now corrected these mistakes. Please let me know if I have missed something.
I really appreciate your help in improving the quality of the Real Statistics website.
Charles
Hi Charles, happy to help.
I am struggling with a specific problem – calculating the variance (or even 2nd raw or central moment) of a censored NB distribution. It’s easy enough to do recursively bottom up, but sometimes the mean is too high and this isn’t efficient or accurate.
Let’s say I have a distribution with a mean of 5 and variance of 7, which is censored at 6. All original probabilities above 6 are now added to the original probability of 6.
I can calculate the mean using the NB loss integral which is well documented, but not the variance. Obviously the truncated NB model doesn’t apply here as it simply cuts off probabilities in excess of 6 and normalises the remaining probabilities.
If you can point me in the right direction I’d be very grateful, or if you have the inclination to tackle this with a specific page detailing the calcs that would be even better.
Hoping someone can help :).
Hello Herman,
I assume that NB refers to a negative binomial distribution. I am not familiar with a censored or truncated NB distribution. Sorry to say, but I don’t know how to answer your question. Perhaps someone else reading your comment can answer.
Charles
https://real-statistics.com/multiple-regression/ridge-and-lasso-regression/ridge-regression-example/ – at this link you calculated the lambda value as n-1 * 0.01. How did you make this calculation? Is this how a good lambda value is chosen?
Hello Kadir,
I can’t recall for sure, but this choice may have been arbitrary. In any case, see the following webpage regarding how to choose lambda.
https://real-statistics.com/multiple-regression/ridge-and-lasso-regression/estimating-ridge-regression-lambda/
Charles
Professor Charles,
I need to measure trends, differences between periods, and points of change.
The data are weekly sales of many sellers.
I assume the sales data are not normal, so I chose the Mann-Kendall test for trend, the Wilcoxon rank-sum test for evaluating differences between periods, and change point analysis.
a) Is this correct, or should I use some other tests that you could kindly suggest?
b) In addition, it would help me greatly if ChangePT_test() could accept row arrays, because there are hundreds of sellers to evaluate.
I tried to do it myself, applying the method you describe in https://real-statistics.com/time-series-analysis/time-series-miscellaneous/change-point-test/, but I didn't succeed; there are some unanswered questions in the comment section there.
Thank you in advance, and many compliments on your great work.
Hello Eugenio,
a) I don’t understand well enough what you are trying to accomplish to say whether this approach is correct.
b) ChangePT_TEST accepts data in the form of a column array, but if you have a row array you can use TRANSPOSE to make it into a column array. E.g., if your data is in range A1:W1, you can use the array formula =CHANGEPT_TEST(TRANSPOSE(A1:W1),TRUE).
Charles
Thanks a lot, Professor, for the answer to (b).
Going back to question (a), I'll try to explain it better.
The task assigned to me is to evaluate and compare the sales force (several hundred sellers) on their performance.
The data are weekly units sold, let's say only as total products.
The periods subject to analysis range from a few weeks (as few as 6) up to the weeks of a full year.
Until now, management has asked to analyze the data on the basis of guesswork, identifying arbitrary thresholds or what they think are valid percentage differences between periods, etc.
I would like to improve this by introducing some statistically based measures in terms of:
– the trend of the specific seller over the full year or the last x weeks
– the difference between one period of x weeks and another period of y weeks
– identifying if and when a discontinuity appears
So, as the first thing to do, I tested all these data series for normality: most of them do not reject the Jarque-Bera test and the data appear more or less aligned on a 45-degree line in a Q-Q plot, but many do not appear normal.
In addition, I have read somewhere that we cannot assume normality of sales data, and that sales are often assumed to be Poisson distributed (see https://stats.stackexchange.com/questions/49919/what-is-a-good-distribution-to-model-average-sales).
Furthermore, I have to examine data with small sample sizes (a few weeks, as I said above).
For all these reasons I think I need nonparametric tests.
Is this conclusion correct?
If it is, are Mann-Kendall, Wilcoxon rank-sum, and change point analysis a valid choice? Or would you have chosen some others?
thank you
Hello Eugenio,
Thanks for sending me the additional information, but I still don't have anything to say (except that there are better tests for normality than Jarque-Bera).
Charles
Hi Charles,
Can you please give me the probability density function of the Beta PERT model? I don't seem to find it on your blog under the PERT Beta section.
Thanking you,
Vinodh Ramji
Hi Vinodh,
See the following webpage:
https://real-statistics.com/binomial-and-related-distributions/pert-distribution/
Charles
Hi Charles, could you give a single example of the FRECHET_INV function? Let's say FRECHET_INV(RAND(),1,2,-3). It looks like the function is not yet working? Thanks
Hello Budana,
Thanks for bringing this error to my attention. I will fix the error in FRECHET_INV in the next release of the software, which should be available this week.
I appreciate your help in improving the quality of the Real Statistics software.
Charles
Hi,
I have to prepare the forecast where the number of delivered services depends on the number of customers acquired.
Additionally, there is annual and monthly (due to working days impact) seasonality.
Do you have any articles on how to prepare a forecast for such a case? How can I remove these two seasonalities and build a regression on the number of customers?
Can any Excel tool handle this in one go, or does the process need to be divided into removing the seasonality, fitting the trend, and adding the seasonality back?
Which seasonality should be removed first, annual or monthly?
Tomasz,
Regarding the issue of handling both annual and monthly seasonality, I don’t have any references on this, but I would guess that the order doesn’t matter. You can try both and see whether it makes a difference. My intuition would be to handle monthly first, but it probably doesn’t matter.
Charles
Dear Doctor, I am very grateful for the great effort you put into this add-in. Will we ever see a version of it for Google Sheets?
Leo,
Unfortunately, Google Sheets doesn’t support VBA, which would make support much easier. I will check into the effort to support Google Sheets.
Charles
Charles-
Thanks for all the statistics tools. Is there a statistics tool on RealStats that is similar to the Arrhenius equation to determine stability?
Ken,
No, the Real Statistics software does not have specific support for the Arrhenius equation. It is quite possible that Real Statistics tools can be used (e.g. regression), but I am not sufficiently familiar with the subject to say for sure. See
https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Kinetics/06%3A_Modeling_Reaction_Kinetics/6.02%3A_Temperature_Dependence_of_Reaction_Rates/6.2.03%3A_The_Arrhenius_Law/6.2.3.01%3A_Arrhenius_Equation
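For what it's worth, the Arrhenius equation k = A*exp(-Ea/(R*T)) linearizes to ln k = ln A - (Ea/R)*(1/T), so ordinary linear regression of ln k on 1/T can be used. A minimal Excel sketch (the ranges are hypothetical): with temperatures in kelvin in A2:A10 and rate constants in B2:B10, put =1/A2 in C2 and =LN(B2) in D2 and fill both down; then =SLOPE(D2:D10,C2:C10) estimates -Ea/R (so Ea is roughly -slope times 8.314, in J/mol), and =EXP(INTERCEPT(D2:D10,C2:C10)) estimates A. Whether this is the right model for your stability data is a subject-matter question I can't answer.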
Charles
Hi Charles,
Great add-in program. I am having difficulty getting Real Statistics to be available when I load Excel.
I have done the following the first time:
I used File > Options > Add-ins > OK > Go > Browse and selected the XRealStat.xlam file in the download folder. This added XRealStat to the add-ins and it worked fine.
Once I exited Excel and opened it again, RealStat was not available. When I check the add-ins window, XRealStat is checked, but the add-in menu is not available to use it. The only way I can make it available is to uncheck it first and then check it again in the add-ins window. I would think there should be a way to have XRealStat available each time I open Excel. Any help? Thanks.
Hello Faruk,
Usually, the program works the way that you have suggested. You don’t have to uncheck and check it in the add-in window. I don’t know why you are having this problem. What do you see when you enter the formula =VER() in any cell? After exiting Excel and opening Excel, what do you see when you press the key sequence Ctrl-Shift-m ?
Charles
Hi Charles,
Thank you for providing Real Statistics. It is an excellent add-in program.
I cannot seem to obtain prediction and confidence intervals for a multiple regression. I have the X values in A1:A5 and Y values in C1:C5 (including the titles). I obtain the multiple regression using Multiple Linear Regression option. How do I obtain the confidence/interval estimates? I tried different options, but could not obtain the intervals. I see that you have an example (Example 1) on
https://real-statistics.com/multiple-regression/confidence-and-prediction-intervals/
I could not follow what you did to obtain the intervals. Any help will be appreciated.
Hello Faruk,
I have just added a link to the following webpage which enables you to download the spreadsheet described on the webpage.
https://real-statistics.com/multiple-regression/confidence-and-prediction-intervals/
Hopefully, this helps you understand how to obtain the confidence interval estimates.
Charles
Hi Charles,
Many thanks for all the content on real-statistics!
I am confronted with the following problem and I thought maybe you might help.
As a clinical neuropsychologist (thus I am not a statistician nor a mathematician but I use stats in my work), I compare scores from one individual to scores from groups. I use a modified t test for that (n=1 vs. n=X). I can easily obtain the associated p value. XL does that job perfectly well for all the scores I have to compare.
Now, I would like to calculate confidence limits around that p (and also around the effect size). There is an adapted method developed by Crawford & Garthwaite for when one n = 1. In this method, one has to obtain both the upper and lower deltas (of course). That is, given some value that I can calculate, the degrees of freedom, and a quantile (0.025 for the upper delta and 0.975 for the lower delta), I need to find these delta values. The problem is, I have no idea how XL might do that job. To be perfectly clear, I quote the authors: "We want a non-central t-distribution on X-1 d.f. that has [some value calculated from provided formulas] as its 0.025 (or 0.975, respectively) quantile."
I asked the authors how XL could find this noncentrality parameter, but they don't know. As I have seen information about the non-central t distribution on your site, I wondered if you might have a solution for me.
Many thanks for your time and the attention paid to this message.
Best regards,
Patrick
Patrick,
The Real Statistics add-in to Excel does provide support for the non-central t distribution. Perhaps this could be useful to you.
Charles
Dear Charles,
I have seen and read the section on the non-central t distribution in Real Statistics. That's why I was contacting you. My problem is that I don't see how to use the functions you developed so that I can get the deltas. I was hoping you might tell me how to proceed.
Many thanks for your time.
Patrick
Patrick,
I am not sure, but perhaps the following webpage can be helpful
https://real-statistics.com/students-t-distribution/confidence-interval-effect-size-power/
Charles
Dear Charles,
Many thanks for the webpage! It has indeed been very useful! I can now obtain the values of the parameters I needed.
Patrick
Hi Charles
Again many thanks for this excellent resource.
it seems however that Microsoft has ‘upgraded’ again.
They are now explicit that all macros are assumed evil.
A couple of months back, the only way to activate Real Statistics was to "remove the mark of the web", which involves checking a box in Properties. However, after a break I cranked it up again, and following an upgrade a couple of days ago it now generates repeated (probably over 40) "Microsoft Visual Basic does not support" errors, then politely asks if I'd like to remove the package.
I needed it to generate an ANOVA I’ve done previously but misplaced.
As I have access to a Mac, I’ve downloaded the package to that.
The Mac didn’t care about the mark of the web.
However, on the Mac the unbalanced 2-way ANOVA I need is producing negative F values for the interaction term, which then results in an error for the associated p-value.
The raw data seems valid, and beyond that I don’t know enough to know why this is happening.
As in the past you have asked to email the file, I’ve taken the liberty of doing that again, from goldfish5552000@yahoo, which often ends up in spam.
Thanks in advance for any help you might be able to offer.
jamie
Hi Jamie,
I'll check my spam to make sure that I get the file that you emailed me.
Regarding the unbalanced 2-way ANOVA, since it is unbalanced you need to select the Regression option on the dialog box for the Two Factor Anova data analysis tool.
Charles
Hello Dr Zaiontz,
Thank you for providing such a useful toolpack! However, when running Monte Carlo simulations using POISSON_INV, it severely slows down the simulation. Any ideas regarding the problem?
Hello Camilla,
When you say that “running monte carlo simulations using poisson_inv, it severely slows down the simulation” do you mean that running a simulation using =POISSON_INV(RAND(),lambda) is very slow, or something else? Also how slow is it?
Charles
Hi Charles,
Yes, when using POISSON_INV(RAND(),lambda) for specific variables in the simulation, it takes about 5 hours to run a macro compared to 10 minutes without it. I'm also using other inverse distributions (=LOGNORM.INV, etc.) that are already available in Excel, but they don't seem to cause any speed issues.
All the best,
Camilla
Hello Camilla,
Admittedly, the POISSON_INV function was not designed to be used for huge simulations. I have the code for what is supposed to be a faster implementation. I will try it out, and if it is faster I will add it to the next release of the Real Statistics software.
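In the meantime, here is a rough VBA sketch of one standard way to invert the Poisson CDF; this is not the Real Statistics implementation, and I can't promise it will be faster in your workbook. It accumulates the probability mass function using the recurrence P(X=k) = P(X=k-1)*lambda/k until the cumulative probability reaches p.

Function PoissonInvSketch(p As Double, lambda As Double) As Long
    ' Walk the Poisson CDF term by term until it reaches p.
    ' Assumes 0 <= p < 1 and a moderate lambda; for very large lambda,
    ' Exp(-lambda) underflows and a different method is needed.
    Dim k As Long
    Dim pmf As Double, cdf As Double
    pmf = Exp(-lambda)              ' P(X = 0)
    cdf = pmf
    k = 0
    Do While cdf < p And k < 1000000
        k = k + 1
        pmf = pmf * lambda / k      ' P(X = k) = P(X = k-1) * lambda / k
        cdf = cdf + pmf
    Loop
    PoissonInvSketch = k
End Function

Placed in a standard VBA module, it can be called from a cell as =PoissonInvSketch(RAND(),4), or directly from your macro.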
Charles
Hello,
I downloaded this software, and after following all the troubleshooting steps to use it in my Excel 2007, Excel continuously shuts down after Alt+T, I when I try to add this package.
What could be the problem? Please advise.
Hello Abida,
Please make sure that you download (and install) the Real Statistics software for Excel 2007. The current version won’t work for Excel 2007. See https://real-statistics.com/free-download/real-statistics-resource-pack/real-statistics-resource-pack-excel-2007/
Charles
Dear prof. Zaiontz,
First of all, thank you very much for your incredible work! Your website and Excel add-in are extremely useful for any scientist or student!
I have a question, or bug report maybe, regarding the Real Statistics Resource Pack (latest version). When it’s enabled in Excel (latest Microsoft 365 version, Windows 10), any action done with a keyboard shortcut like Ctrl+C, Ctrl+B, Ctrl+Z, etc. is very slow, like 1-2 seconds. Basically, everything that has the Ctrl key involved is slow. Ctrl+T is slow, but Alt+N+T (same feature via menu shortcuts) is fast as usual. Is it expected behavior?
Thanks for bringing this to my attention, Valentin. I hadn't noticed this before, but I do see it when I use Ctrl+C and Ctrl+V. I don't know what is causing this, but I will try to investigate it further.
Charles
Dear prof
Thanks a lot for your website,
Is there any part for calculating the studentized residual for LOWESS?
I appreciate your help
Hello Fahime,
I guess you can calculate hii values for each local regression. This would mean that there are multiple hii values for the same point. I don’t really know how useful these values would be.
In any case, your question has inspired me to add LOWESS regression to Real Statistics. This will be added shortly. Thanks for the inspiration.
Charles
Fahime,
You can get more information about the hat matrix for LOWESS from Fox's book Applied Regression Analysis and Generalized Linear Models.
Charles
Dear Professor Charles Zaiontz,
My name is Thomas. I am writing to you from Brazil. I found your website while searching the internet for reliability tests. I would be extremely grateful if you could give me your opinion about which test would be most appropriate for me to use with my sample data.
I am a master’s student, my research area is performance analysis in soccer. I am testing a method of identifying a certain type of event in women’s soccer games. To demonstrate that the method is reliable, myself and one other researcher are performing a test and retest procedure, separated by 7 days, in which we identify these events in 2 games from the sample. The sample consists of the video recording of the games. The events are identified in binary form, that is, yes (event happened) and no (event did not happen). Therefore, there is no combination in which researcher 1 and researcher 2 did not identify the event. An example of the data arrangement follows below:
Events observer_1 observer_2
1 yes no
2 yes yes
3 no yes
4 yes yes
5 no no
6 yes no
Thus, I would like to know which test or tests, in your opinion, would be most appropriate to obtain inter and intra-observer reliability?
Thank you in advance for your attention and for the excellent content available on your website.
All the best.
Hello Thomas,
Good to communicate with you in Brazil. The other day, I met with a friend from Brazil named Tomaso.
The Real Statistics website describes a number of different reliability tests. See
https://www.real-statistics.com/reliability/
The objective of these is to estimate reliability without having to perform the test twice (as in test/retest).
If you have performed test/retest, you can calculate the correlation between the scores from the two administrations to determine reliability. This is similar to the split-half method, except that instead of two halves, you are repeating the whole test. In this case, you shouldn't use the Spearman-Brown correction, since you have not reduced the number of items by half.
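For example (the ranges are hypothetical), if the codings from the first administration (1 = yes, 0 = no) are in A2:A25 and the retest codings are in B2:B25, then =CORREL(A2:A25,B2:B25) gives the test/retest reliability estimate. For comparison, the split-half method would apply the Spearman-Brown correction 2r/(1+r) to the correlation r between the two halves; that correction is the step you can skip here.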
Charles
Professor Charles,
Thank you very much for the quick answer. I appreciate the information and your opinion. I will study the tests proposed on your website as well as this test/retest correlation option.
It is very nice to be able to connect even from so far away.
Once again I thank you for your attention and congratulate you for the content of the website.
Best regards.
Thomas
Thomas,
Glad that I could help.
Charles
Dear Charles,
I have been using Real Statistics for almost 2 years. This week it doesn't work because of the following message: "Microsoft has blocked macros from running because the source of this file is not trusted."
Could you help me, please?
Many thanks in advance.
Dr. Aline Klassen
Hello Aline,
Thank you for using Real Statistics for the past couple of years.
This issue has repeatedly come up over the past few years. I don’t know why, but for some people a change Microsoft made about trust impacted them many months ago, while for others this issue has only arisen now.
In any case, you can change the trust settings for the file that contains the Real Statistics add-in, Xrealstats.xlam, as described in the Troubleshooting section of https://www.real-statistics.com/free-download/real-statistics-resource-pack/
Charles
Charles,
What pseudorandom number generator (PRNG) is used for the random seed function Randx? For example, like Knuth’s subtractive number generator or the Mersenne Twister algorithm.
Thank you.
Hi Lisa,
RANDX calls the Excel VBA RND function and so uses the seed from that function.
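For reference, VBA's Rnd is a simple linear congruential generator with a 24-bit state (so a fairly short period), not the Mersenne Twister. A small VBA sketch of the documented way to get a repeatable sequence from Rnd, shown only to illustrate how its seeding works:

Sub RepeatableRnd()
    ' Calling Rnd with a negative argument, then Randomize with a fixed seed,
    ' restarts VBA's generator so that the same sequence is produced each run.
    Rnd -1
    Randomize 42
    Debug.Print Rnd; Rnd; Rnd
End Sub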
Charles
Can you give me the info I need to download the app? I can't download it on any of my computers. I am using WebRoot security.
I used this program a few years ago and it is awesome.
Hi Steve,
I am not familiar with WebRoot security. Downloading the app is done by your browser and not by Real Statistics, and so I don’t have any suggestions about how to address this.
Charles
Charles,
First, let me once again commend you and thank you for the great help you are providing to the community!
I have been using XRealStats package for some time. Now I am trying to learn about Kendall’s correlation usage with VBA. I wrote the following code:
===============
Sub kendallXreal()
With ThisWorkbook.Sheets("ScattterPlots")
    .Range("H3") = Application.WorksheetFunction.Kcorrel(.Range("A1:A20"), .Range("B1:B20"))
End With
End Sub
===============
Trying to run it I received the following message:
===============
Run-time error ‘438’:
Object doesn’t support this property or method
===============
Will you kindly help me with this error?
Thanks
Paulo,
Thank you for your kind words. For the problem that you are having, see the following webpage
https://www.real-statistics.com/excel-capabilities/calling-real-statistics-functions-in-vba/
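As a rough sketch of what is going on (the webpage above describes the approach that Real Statistics actually supports): KCORREL is a user-defined function supplied by the add-in, not a member of WorksheetFunction, which is why error 438 is raised. One generic way to call a UDF in a loaded add-in from VBA is Application.Run, for example:

Sub kendallXrealSketch()
    ' Hypothetical rework of the code above; assumes the Real Statistics add-in
    ' is loaded so that KCORREL can be resolved by name.
    With ThisWorkbook.Sheets("ScattterPlots")
        .Range("H3").Value = Application.Run("KCORREL", .Range("A1:A20"), .Range("B1:B20"))
    End With
End Sub

If the bare name is not found, qualifying it with the add-in's workbook name (e.g. Application.Run("Xrealstats.xlam!KCORREL", ...)) is the usual fallback.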
Charles
Dear Charles:
This is a very interesting website. Thank you very much. I have been hoping to do seasonal adjustment, for example X-13-ARIMA from the FED, and get the seasonal factors. Would you please provide such a function in the future?
Thanks for your suggestion. I will add this to the list of potential future enhancements.
Charles
Thank you very much! I have another question: currently the forecast result has to be calculated manually. Would you please offer a function to get the forecast conveniently, like ARIMA_Coeff?
Hi, this is a very interesting website. I have been hoping to develop facility with software that will allow me to run generalized linear models. Is there any reasonable way to do that in Excel, or is that a bridge too far?
Hello Desmond,
You could certainly develop software to run generalized linear models. I have elected to implement various types of these models in the Real Statistics software.
Charles
Hi! Real Statistics has been very helpful for me, but I've recently noticed that it takes a long time (2-3 minutes) to run simple ANOVAs (e.g., an n of 8 per condition). Restarting Excel doesn't help. Any ideas? Thank you!
Hello Selin,
I just tried running a simple ANOVA and it ran in less than 1 second.
When I have experienced unusually long run times, it is usually because I have other spreadsheets opened which slow things down.
This is especially true if you have one of the Real Statistics examples workbooks open. Better to copy the spreadsheet you need and close the workbook before you experiment.
Charles
Dear,
Thank you very much for the helpful tool!
When trying to run a logistic regression, I receive the following error message: "A run-time error has occurred. The operation will be aborted."
What could be the reason for this?
Thank you
If you email me an Excel file with your data, I will try to figure out why you received this message.
Charles
Hi!
Thank you for a great website and Excel tools; it's been a lifesaver plenty of times!
I just have a quick question regarding running a regression with the Real Statistics add-in vs. Excel's native Data Analysis function. We have a log-log model, and with Excel's regression function (Data Analysis) we get Multiple R = 0.3844, R Square = 0.1478, and Adjusted R Square = 0.1314, while with the exact same data set we get Multiple R = 0.6622, R Square = 0.4385, and Adjusted R Square = 0.4278 using the multiple linear regression tool in the Real Statistics add-in. Is there something wrong with our model, or what is the reason behind this?
We have just started working on the model and have not checked for the likes of heteroskedasticity and multicollinearity yet.
Hi Simon,
The results should be the same. If you email me an Excel file with your data, I will try to figure out why you are getting different results.
Charles
Hello Sir,
I appreciate the great effort of putting up such educational material on one website. I seek your help in conducting the Johansen cointegration test in Excel or Google Sheets.
Thanks,
Robin
Hi Robin,
Real Statistics doesn’t support this test. It supports the Engle-Granger test currently. See
https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/engle-granger-test/
Charles
Hello,
First, I love the site and the add-in for Excel.
Now to my real question. I have performed a design of experiments that was 4 variable and two factors. I want to produce a prediction equation; I thought I knew how to do it by hand, but now I am less sure. Could you direct me to any resources to help me get through this?
Thank you in advance
Matt T
Hello Matt,
Thank you for your kind remarks about the website and add-in.
By “4 variable and two factors”, do you mean two factors with replications, each of which has 4 levels?
Charles