Contact Us

Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.

In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me as described below or by leaving a comment below or on any of the webpages.

Charles Zaiontz

My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here

1,126 thoughts on “Contact Us”

ayeshabaig

January 4, 2018 at 5:53 pm

similarly if instead of six metals i want to find out only 1 metal in biscuits taken from nine towns of a city. then will it be appropriate to use single factor anova?
Reply
Kelly

December 23, 2017 at 8:12 am

Dear Charles,

How can I change some input value in your macro? I am trying to change value in {=DescStats(B3:B106,TRUE)} to {=DescStats(B3:B166,TRUE)}. Pressing “Enter” does not work here.

Best regards,
Kelly
Reply
- Kelly
  
  December 23, 2017 at 8:54 am
  
  Dear Charles,
  
  Oh, eventually I realized that this is “Array formulas” and you had once explained the concept.
  
  Mant thanks and Merry Christmas,
  
  Best regards,
  Kelly
  Reply
  - ayeshabaig
    
    January 4, 2018 at 5:49 pm
    
    i want to find out 6 heavy metals in bicuits taken from nine towns of lahore. from each town 6 samples of biscuits were collected. In each sample, i want to find out the concentrations of six heavy metals. should i use two way anova with replication or without replication?
    Reply
Emma

December 7, 2017 at 3:18 pm

Hi Charles,

I am trying to download the software, but every time I click the link it says “Page Cannot be Found”

Could you help me out? I am trying to compare some slopes by this afternoon!

Thanks.
Reply
- Charles
  
  December 7, 2017 at 4:58 pm
  
  Emma,
  Sorry about that. Earlier today I put out a new version of the software and apparently the wrong link was established. I believe this problem has now been corrected and you can now make the download.
  Charles
  Reply
Neil McQuarrie

December 6, 2017 at 1:00 am

Dear Charles,

Thank you very much for this excellent website.

I was wondering if you might consider allowing users to subscribe to a brief email notice when there is new content. If so, I’ll be the first to sign up!

Again, many thanks —

Neil McQuarrie
Reply
- Charles
  
  December 6, 2017 at 9:41 am
  
  Neil,
  I use twitter for this. Please follow at
  @Real1Statistics
  You can also use an RSS feed.
  Charles
  Reply
  - Pablo Caballero
    
    January 26, 2018 at 3:58 pm
    
    Now you have a new folower. Have a nice day
    Reply
Marina

December 1, 2017 at 2:01 pm

Hi Charles,
First of all, I would like to thank you for this excellent website. It provides plenty of useful information, especially for people who find difficulties in dealing with statistics.

I have a question, and I hope to be able to explain it in an understandable way. I’ve performed an experiment without replicates, which consists in treating a cell culture with two different substances (control and treatment), and measure the effect on their growth at different times (1 hour, 7 hours, 12 hours and 24 hours). The effect that I have measured is the percentage of cells in a certain phase of their growth cycle. I would like to know if there is a significant difference in this percentage between control and treated cells.
My main problem is that I don’t have any replicates. So far, I have tried with a two-factor ANOVA test without replication (considering “treatment” and “time” as the two factors), but I am not sure this is the right method for this situation. Are there any other statistical tests that I can use, considering that I don’t have replicates?

Thanks in advance,

Marina
Reply
- Charles
  
  December 2, 2017 at 9:51 am
  
  Marina,
  I am very pleased that you find the website helpful.
  If you want to test both the Treatment and the Time, then you should use Repeated Measures ANOVA with a fixed Treatment factor and a repeated measures Time factor. This specific situation is described on the following webpage:
  Repeated Measures ANOVA with one within subject factor and one between subjects factor
  You will need to read about Repeated Measures ANOVA in general before this webpage will make sense. See
  Repeated Measures ANOVA
  Note that if you only had the Time factor and not the Treatment factor you would use a simpler version of Repeated Measures ANOVA; this would be equivalent to two factor ANOVA without replication where one factor is Time and the other is Subjects.
  Charles
  Reply
kavitha

November 26, 2017 at 1:32 pm

Iam getting #NUM! and # div/0! on my data when i use multiple variables for linear regression, which doesn’t happen when iam doing with solitary variables please help
Reply
- Charles
  
  November 26, 2017 at 4:23 pm
  
  Kavitha,
  If you send me an Excel file with your data and the linear regression that you performed, I will try to figure out what has gone wrong.
  Charles
  Reply
  - Dr. Theodore Peters
    
    November 29, 2017 at 1:26 am
    
    Dr. Zaiontz:
    A very simple question that I have been unable to answer. I have downloaded Realstats for binomial logistic regression since I have a dichotomous dependent variable (enrolled/not enrolled). The space to enter the input range for variables does not separate to show the separate inputs for independent and dependent variable as was shown on input examples I found on the web. Is the column of dependent variables in the same block as the independent variables and if so, should it be the first column or the last column? Thanks, Dr. Peters.
    Reply
    - Charles
      
      November 29, 2017 at 9:52 am
      
      Dr. Peters,
      It is not separated and is the last column. See
      https://real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/
      Charles
      Reply
Andrew Lindsay

October 25, 2017 at 1:45 pm

Thank you Charles. Your site provides a wealth of information, especially for me, a novice and simply not all that good at mathematics.

I have a question, but I have yet to find information about a solution. It’s related to your binomial and beta distribution tutorials. I can use the BINOMDIST to find the sample size needed when I provide the required accuracy, confidence level, and number of errors. But I’d like to go the other way and find the number of allowed errors when I know the required accuracy, required confidence level, and number of trials. All my tests are pass/fail. There are no retries. Is there a similar statistical function or formula I can use to do this? I’ve searched, but I’m not familiar enough with statistical functions to know if I’m just missing it or not.

Thanks in advance,
Andrew
Reply
- Charles
  
  October 25, 2017 at 2:10 pm
  
  Andrew,
  Glad you are getting value from the website.
  There may be a simpler way, but you should be able to use Goal Seek or Solver to solve these sorts of problems. Numerous examples are provided on the website for using these tools for similar type problems.
  Charles
  Reply
  - Andrew Lindsay
    
    October 25, 2017 at 3:07 pm
    
    Thank you Charles. I’ll check out Goal Seek and Solver. I appreciate being pointed in a direction to look.
    Reply
  - Mudassar Mulla
    
    November 24, 2017 at 5:22 pm
    
    Hello Charles,
    pls help me to chose design for following problem
    I want to develop mixture/formulation. A combination of 2 polymers gives me this formulation. poly A- 6% & poly B-2%.
    
    however, for poly A, there is supply issue,so, I am looking for 2nd vendor.
    
    I tried poly A from 2nd vendor. But, to get same formulation, it requires 4% only (compared to 6% of first vendor).
    
    Now, I am wondering if it is possible to have one formulation which will accomodate poly A of both vendors?
    we can change level of poly B to get this new robust formulation.
    pls help. Thanks in advance.
    
    Mudassar
    Reply
    - Charles
      
      November 26, 2017 at 10:58 am
      
      Mudassar,
      Sorry, but I don’t understand the scenario that you are describing sufficiently well to be able to answer your question.
      Charles
      Reply
Paolo De Ninis

October 21, 2017 at 5:47 pm

Hi Charles,

yesterday I found out your excellent RealStat. It impressed me so much: I think it may be very helpful. Great job!

As soon as installed, it worked perfectly.

However, the spreadsheets I saved yesterday today were no longer working. So I have had to reinstall the package and unblock the file RealStat.xlam to make it working. I also added the Addin directory among the Trusted Locations. (Windows 10 64 bit)

Curiously, now it is doing all the calculations but, unfortunately, its very useful “help” in the function wizard doesn’t still work. So it is boring and slows me down a lot to have to look in the website for the documentation of each function parameter.

I tried to reinstall the addin more and more times, but I failed.

Please, can you tell me why it happens?

Paolo
Reply
- Charles
  
  October 21, 2017 at 6:05 pm
  
  Paolo,
  I am pleased that you like RealStats.
  I don’t know why you had to reinstall the software. The help wizard should work in the Excel 2013/2016 version of the software. Yiu can check which version you are using by entering the formula =VER(). I hope to make this capability available in the Excel 2010 version in the next release.
  Charles
  Reply
  - Paolo De Ninis
    
    October 21, 2017 at 9:11 pm
    
    Hi,
    
    the formula =VER() reports “5.2 Excel 2013/2016”, Excel version is 2016. I Have had to reinstall RealStats because something was blocking it: Control+M did not longer work.
    
    Yesterday, for shure, the help worked. Today, after reinstalling, no more. The Macros, on the contrary, work wonderfully.
    
    Paolo
    Reply
Keith Glass

October 13, 2017 at 4:20 pm

Hello Sir,

We are developing a method that requires method precision as a validation parameter. Normally we sample 6 to 10 times from a material and analyze each on to generate and average, SD, and RSD. The %RSD serves as the precision. However, this time the sample is a powder and we look to determine two contaminate powders in the sample. It is difficult to obtain a homogeneous sample at the desired 100 ppm target limit making method precision determination nearly impossible. I could of course spike the 2 components on top of the powder sample for each individual prep and find out the percent recovery, mean recovery, SD and %RSD. But could I then use this data to determine the method precision? In other words for to I relate spike and recovery data back to a target amount in ppm? relative difference to the theoretical for each determination then %RSD of that? Just trying to see if I can use recovery data to determine method precision when the samples are independent.
Reply
- Charles
  
  October 13, 2017 at 9:15 pm
  
  Sorry Keith, but I don’t understand these particular issues well enough to provide a response.
  Charles
  Reply
André Neves

September 22, 2017 at 10:24 pm

Please need a spreasheet about a two stages multilevel model.

I have to know step-bystep multilevel modelling.

Thank you,

André.
Reply
- Charles
  
  September 23, 2017 at 7:33 am
  
  André,
  Sorry, but Real Statistics doesn’t support multilevel modelling as yet. This capability will be added some time in the future.
  Charles
  Reply
Mike

September 7, 2017 at 1:06 pm

Hi Charles,
Thanks for such an excellent blog. Could you please indicate where to find the spreadsheet listed as Appendix 4 in your 2015 Psicologica paper (A robust function to return the cumulative density of non-central F distributions in Microsoft
Office Excel)?
Many thanks.
Reply
- Charles
  
  September 7, 2017 at 1:12 pm
  
  I didn’t write a 2015 Psicologica paper, and so don’t know what spreadsheet you are referring to. In any case, you might find the information that you are looking for on the following webpage:
  https://real-statistics.com/chi-square-and-f-distributions/noncentral-f-distribution/
  Charles
  Reply
  - Mike
    
    September 7, 2017 at 1:19 pm
    
    Hi Charles, I’ve just realised that, my apologies! Thanks for the prompt reply and for the great resources.
    Mike
    Reply
Eduardo

September 6, 2017 at 10:57 pm

Hi, could you please help me? I’m trying to use the COVP function for obtaining variance-covariance matrix, from two columns of data. Also, I’m selecting in Excel an empty 2×2 cels range an then pressing ctrl-shift-enter. I works great!! I work on Windows. The problem is that my girlfriend has downloaded the Mac version of Realstatistics, She is doing exactly the same that I do, but in her case she obtains only errors. Do you have an idea of what is happening? Thanks
Reply
- Charles
  
  September 7, 2017 at 7:49 am
  
  Eduardo,
  Do you know which version of RealStats she is using? You can find this out by entering the formula =VER(). Also what version of Excel for the Mac is she using?
  Charles
  Reply
Anat fraenkel

August 28, 2017 at 6:59 pm

The download isn’t working. Anything you can help with?
Reply
- Charles
  
  August 28, 2017 at 11:22 pm
  
  Anat,
  When you say it isn’t working, can you be more precise as to what sort of problem you are having?
  Charles
  Reply
Tien Li An

August 28, 2017 at 5:21 pm

Hi Charles:
Thank you for your blog, Iam learning a lot.
Right now I am evaluating a method for clinical skill learning (I’m Professor in Dentistry). The method is perfoemred by 25 students, and there are 8 different clinical procedures. And according to the performance of the student in each procedure, the student can be classified as sactisfactory (1) or unsatisfactory (0).
My doubt is the following:
In my case, do you thing that Kruder-Richardson is the best choice? Or may I use Fleiss Kappa interrater agreement?
The following is part of the table, the first column is the number of studentes (1 to 25), the following 8 columns are eight clinical procedures, and 1 mens satisfactory and 0 menas unsatisfactory. Hope that you can help me.

St DV 11 DL 21 MV 16 DV 15 DL 46 DL 41 ML 43 DV 32
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 0
4 1 1 1 1 1 1 1 1
5 1 1 1 1 1 0 1 1
…..
…..
…..
25 1 1 1 1 1 1 1 1

Thank you for your attention.
Best regards
Tien
Reply
- Charles
  
  August 29, 2017 at 7:09 am
  
  Tien,
  Glad to see that the Real Statistics website has been helpful to you.
  Based on additional information you provided me in an email, this doesn’t appear to be a fit for either Kruder-Richardson nor Fleiss Kappa.
  I have tried to give some ideas in my email response.
  Charles
  Reply
Kathleen

August 18, 2017 at 7:30 pm

Cntrl+M is not bringing up Real-Statistics today and I need to run some time series analysis. Was working fine yesterday. Please advise.
Reply
- Charles
  
  August 18, 2017 at 8:19 pm
  
  Kathleen,
  If it was working yesterday, it should work today, unless you made some change. I suggest the you close Excel and then reopen it. If that doesn’t work, restart your computer.
  Charles
  Reply
  - Kathleen
    
    August 22, 2017 at 3:59 pm
    
    Thanks Charles. I viewed some earlier posts and it had something to do with the unblocking option. I tried that and it fixed my issue.
    Reply
    - Charles
      
      August 22, 2017 at 6:01 pm
      
      Kathleen,
      Thanks good to hear.
      Charles
      Reply
Scott Sandman

August 15, 2017 at 4:45 pm

I see that I had not un-checked the “data labels” option when specifying the data ranges.
My bad.
Apologies for the disruption.
Reply
Scott Sandman

August 15, 2017 at 4:41 pm

Hello-
In Excel 2010, the Hotelling T2 output for the Hotel 2 example data does not match the results illustrated in the example.
The output from Real Stats is:
Hotelling T-square Test
Two-samples (equal covariance matrices)
T2 3.149934857
df1 3
df2 32
F 0.988214857
p-value 0.410715235

However, the results from the example Hotel 2 are:
Hotelling T-square Test
Two-samples (equal covariance matrices)
T2 4.116057236
df1 3
df2 34
F 1.295795796
p-value 0.291686665

Please advise. Thank you.
Sandy
Reply
Lengsea

August 9, 2017 at 4:30 am

Hi Charles,

I have downloaded the software and added in in my excel app. However, it disappears after I close my excel app, and I need to do add-in again every time I open my excel. Can you suggest what is the real problems here?

Thanks,
Reply
- Charles
  
  August 9, 2017 at 7:49 am
  
  Hi Lengsea,
  See https://real-statistics.com/appendix/faqs/disappearing-addins-ribbon/
  Charles
  Reply
shaheen

August 9, 2017 at 1:12 am

Many thanks dear Prof,
It is a great help for students,,

regards,
shaheen
Reply
Eduardo

August 2, 2017 at 5:03 am

Hi Charles,

I just wanted to drop a line to say thank you for your work! I used some of your code as well as learnt from your explanations there too.

Also I read that you may want to do some videos. If I may suggest, please make them from a problem solving perspective. There are already many videos on theory explanation but not so many where the theory is extracted while tackling and solving different problems.

Regards,
Eduardo
Reply
- Charles
  
  August 2, 2017 at 5:17 am
  
  Eduardo,
  Thank you very much for your support and your useful suggestion.
  Charles
  Reply
Riife

July 28, 2017 at 1:40 pm

Hello,
Would you kindly suggest an analytical tool for my agric experiment.
I have four replicates of each sample and data for plant height and number of leaves, spanning 7 weeks i.e data is recorded weekly.
i also have data for days to flowering, shoot weight, root weight and fruit yield (all for each sample)
the experiment is to test for the effect of two pathogens on the sample when treated with chemicals. a positive control, neutral control and negative control were also used
Reply
- Charles
  
  July 30, 2017 at 2:22 pm
  
  Riife,
  It is difficult for me to give a precise answer without more detail about the experiment and more importantly about the type of hypotheses you want to test.
  It certainly sounds like some sort of repeated measures Manova type of analysis (Manova since there are multiple dependent variables. Repeated measures if you want to compare weeks).
  Charles
  Reply
Aljo Jose

July 23, 2017 at 8:24 am

Dear Charles,
Thank you very much for this wonderful site. I am new to forecasting and working on machine learning problems now.

I’ve a use case to predict number of bookings for a given week.
Features are – Year, WeekNumber, CountryCode, AgentCode, No.of.bookings.
I understand that ARMA, ARIMA models looks only for one variable (here No.of bookings) and try to forecast the same. But in the given case, other attributes impact no. of bookings.

Which are the approaches I can take to model the forecast ?
Reply
- Charles
  
  July 24, 2017 at 11:40 am
  
  You are talking about panel data, which is an extension to time series data. The Real Statistics software currently doesn’t support this type of data. Eventually it will support this type data.
  Charles
  Reply
Vladimir

July 6, 2017 at 7:10 am

Dear Charles,

Thank you for this very useful resource.
Do you have an online-course of statistics from A to Z?
If no, did you ever thought about making it?
I guess lots of people would like to study statistics based on Excel with you.

Greetings from Russia, Moscow.
All the best,
Vladimir
Reply
- Charles
  
  July 6, 2017 at 7:23 am
  
  Thanks Vladimir for your suggestion. I have been thinking about creating a series of youtube sessions.
  Charles
  Reply
bedru

July 4, 2017 at 12:30 pm

Hy sir i appreciate your concern and i have one question
1. Iam conducting my thesis on Comparative study of Analgesic effectiveness of thoracic paravertebral block (TPVB ) vs. intercostal nerve block vs patient without these two for cholecystectomy under General anesthesia.
since i have three group my statistic tool is ANOVA
my question shall i use one way anova or two way anova ?
thanks sir
Reply
- Charles
  
  July 4, 2017 at 4:40 pm
  
  Sorry, but I don’t understand the scenario that you are describing.
  Charles
  Reply
PVL

June 27, 2017 at 11:09 am

Hi Charles,
Great initiative and the website explains the details in a flow that it is easily understandable.
I am currently trying to study Holt-Winters and apply it to one of my data sets,but i am facing a challenge. My data consists of data points with time stamp interval of 1 minute and complete data set is for 4 months. What should be my value of c? Any suggestions?
Reply
- Charles
  
  June 27, 2017 at 11:57 am
  
  Do you have any seasonality? If not, use Holt’s Trend instead of Holt-Winters.
  Charles
  Reply
  - PVL
    
    July 18, 2017 at 7:52 am
    
    Hi Charles,
    Thanks for your time and effort.
    My data is a level reading from a water tank and the data is for every minute. Considering there is seasonality what should i take the value of c ? would 1440 work?
    Reply
    - Charles
      
      July 18, 2017 at 8:58 am
      
      If the seasonality is daily, then 60 x 24 = 1440 seems appropriate.
      Charles
      Reply
Nasser AT

June 27, 2017 at 7:28 am

Hi M. Charles Zaiontz,
Two or three words : Thanks a lof !
Your web site and all the related stuff are just amazing. Merci pour cette générosité.
Reply
Dave

June 21, 2017 at 6:06 pm

Charles,

When I attempt to use the Shapley function with an array of data (as shown on the Shapley-Owen example page), it provides the decomposition value for the first column in the area but doesn’t present a value for other columns (as described below):

Real Statistics Functions: The Real Statistics Resource Pack contains the following array function. Here R1 is an n × k array containing the X sample data and R2 is an n × 1 array containing the Y sample data.

SHAPLEY(R1, R2): outputs an k × 1 column range containing the R_1^2, R_2^2, …, R_3^k values

For Example 1, the output from the formula SHAPLEY(A4:C8,D4:D8) is shown in range G13:G15 of Figure 1.

Any idea why the respective values for the remaining columns in the array are not showing up in the workbook (like your example)?

Thanks,
Reply
- Charles
  
  June 26, 2017 at 8:51 am
  
  Dave,
  There are three x values in Example 1, x1, x2, x3, one for each column. The output contains three rows. Each row corresponds to one of the three columns.
  If you had four X variables and each variable had 100 rows of data, the output would contain four rows, one for each of the four variables.
  Charles
  Reply
Ferenc Marincs

June 18, 2017 at 4:22 pm

Dear Charles,

I have concentration data for about 50 metabolites from plants under 3 invariables: temperature (5 and 15 degrees), time (1, 7 and 10 days) and light (white and far red). What would be the best statistics to analyse these data to know that which of the invariables or their interaction influence the metabolite concentration.
Many thanks for your help in advance.
Best regards
Ferenc Marincs
Reply
- Charles
  
  June 18, 2017 at 4:48 pm
  
  Ferenc,
  It depends on the details, but some version of ANOVA seems likely. Given that you have a Time factor, it is likely that you need repeated measures ANOVA (or MANOVA). See the following webpage Repeated Measures ANOVA.
  Charles
  Reply
Mathieu

June 10, 2017 at 6:09 pm

Hello Charles, My excel crashes when i performed a kruskal Wallis test. Now Every other test i did displays a #N/A. I tried to restore it but i have the same problem. Have you a solution or must i do the tests again ?
Reply
- Charles
  
  June 14, 2017 at 5:20 pm
  
  Mathieu,
  I can’t think of any reason why Excel should crash in this case, nor why you should get #N/A.
  If you send me an Excel file with your data and the Kruskal-Wallis test results (if available), then I will try to figure out what is going on. You can send the file to my email address as shown at Contact Us.
  Charles
  Reply
Amer Nassar

June 10, 2017 at 8:47 am

Dear Charles,
I’m sorry, if my question was not clear.
I would like to ask if I can pool the data set from two different surveys, taking into account that the first survey population is 328 staff members of the hospital as (independent variable: six-sigma methodology). The second surveys population is the inpatients in the same hospital, numbering 540 patients as (dependent variable: patient satisfaction). Both surveys consist of five dimensions (define, measure, analyze, improve, control). While the survey of workers with 68 questions and the survey of patients with 36 questions. Both surveys were designed according to Likert scale.
My question is: Can I collect the two data-files from both surveys into one dat- file and calculate the relationship (correlation) or regression between the independent variable (six-sigma methodology) and dependent variable (patient satisfaction)?
Reply
- Charles
  
  June 10, 2017 at 2:12 pm
  
  Amer,
  You question has been very clear, and my answer is the same. You can certainly collect the the data from the two data sets into one file, but I don’t see how you can calculate a correlation. If each staff member was responsible for one of the patients, then you would have the one-to-one relationship necessary to calculate a correlation, but this is not your situation.
  You can compare the survey results from the two groups (e.g. by using multiple t tests or, better yet, using Hotelling’s T-square test), but this is different from calculating a correlation.
  Charles
  Reply
  - Amer Nassar
    
    June 11, 2017 at 5:21 pm
    
    Dear Charles,
    Thanks you very much for your effort.
    Best regards,
    Amer
    Reply
Haneen

June 7, 2017 at 11:38 am

Hi Charles,

I have a question regarding the Holt- Winters method for forecasting using the realstats excel add-in. What do you exactly mean by “number of seasons” in the dialog box? My data is monthly (16 months), if I insert “4” as the number of seasons does that mean that it will regard every (16/4= 4) months as a season? Or does this method not work when data is monthly? Should I have my data prepared in a certain way before forecasting?

Thank you in advance.
Reply
- Charles
  
  June 7, 2017 at 3:18 pm
  
  Haneen,
  If your data is monthly, then enter 12.
  Charles
  Reply
  - Haneen
    
    June 8, 2017 at 6:31 am
    
    Hi,
    Thank you for your reply. Is there an alternative to entering 12? When I enter 12, the MAE and MSE increase greatly (compared to other forecasting methods).
    Reply
    - Charles
      
      June 8, 2017 at 9:06 am
      
      Haneen,
      I don’t know what webpage you are referring to.
      Charles
      Reply
      - Haneen
        
        June 11, 2017 at 7:07 am
        
        This one
        
        https://real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-winters-method/
      - Charles
        
        June 11, 2017 at 8:59 am
        
        Haneen,
        You can use any value you want, but if your seasonality is monthly, then the correct value is 12.
        Charle
Amer Nassar

June 2, 2017 at 11:42 pm

How to associate data from two different surveys coming from two different populations?
I want to associate between staff perception of hospital quality with patient satisfaction about hospital services quality. Two different surveys were used for the 2 different population. The only common thing is that they were collected from the same hospitals. Other meaning: How to find the relation (correlation or regression) between the independent variable (Six Sigma methodology: from the first survey) and between the dependent variable (patient satisfaction: from the second survey)? I have two study populations: the first is the in-patient client at the Hospital and the second population is health providers at the same
the first survey was to collect the data from workers (health providers: 328 persons) aiming to measure the level of using Six Sigma methodology at the hospitals. The survey consisted of (5) domains (DMAIC) that is (define, measure, analyze, improve, control) and total of (68) statements by Likert scale (5 degrees: strongly agree, agree, neutral, disagree, strongly disagree).
the second survey was to collect the data from in-patients client (540 patients) aiming to measure the level of patient satisfaction at the hospitals. The survey consisted of (5) domains (DMAIC) that is (define, measure, analyze, improve, control) and total of (36) statements by Likert scale (5 degrees: strongly agree, agree, neutral, disagree, strongly disagree).
Actually, I collected the data from the two populations, but I don’t know how to correlate between them as I mentioned above.
Reply
- Charles
  
  June 4, 2017 at 10:23 am
  
  Amer,
  Since the populations are different, I can’t think of a way to calculate a correlation.
  If they are measuring the same thing, then you can test whether the measurements differ.
  Charles
  Reply
  - Amer Nassar
    
    June 5, 2017 at 3:37 am
    
    Dear Charles,
    Thanks for your effort, I’ll like to ask, no way to calculate the correlation, even thought the data collected form the patients and health worker form the same hospital and during the same period of time and the basic 5 domains are similar in both surveys.
    Reply
    - Charles
      
      June 6, 2017 at 7:21 am
      
      Amer,
      You might be able to calculate the correlation in this case. With multiple variables, you essentially run linear regression and look at the Multiple Correlation value output from the Regression data analysis tool.
      Charles
      Reply
      - Amer Nassar
        
        June 6, 2017 at 11:24 pm
        
        Dear Charles,
        Thanks again, I’ld like if you could please tell me the criteria which l’ll use is correct.
        Sine I have 2 populations, one is the workers (328 persons) and other is the patients (540 patients), every population is in a separated date file, so I’ll do the following:
        First: combining the two data files into one data file.
        Second: As I’ve two kinds of observations, one form worker at the hospital and the other form the in-patient from the same hospital, also every observation has the same 5 domains, and every domain has several questions (NB: The workers’ questions differ from the patients’ questions), so to overcome this difference I’ll calculate the mean for each of the 5 domains for every observation and as well as the total score.
        Third: Run the regression to find the correlation between the data (means) of workers and that’s for patients, but I don’t know which type of regression must I use because the 2 samples’ (workers and patients) are not equal in size (the workers are 328, and the patients are 540).
        So please Dr. Charles, tell me if the criteria that I mentioned above is right or not and your advice to me.
        My best regards,
        Amer
      - Charles
        
        June 8, 2017 at 9:36 am
        
        Amer,
        It doesn’t seem like you can use regression or correlation in this case.
        Without using any statistical terms, what real-world problem are you trying to solve?
        Charles
Kelly

May 31, 2017 at 1:48 am

Dear Charles,

From where can I find the information about Granger-Causality Test, Impulse Responses, and Forecast Error Decompositions at all?

Best regards,
Kelly
Reply
- Charles
  
  May 31, 2017 at 9:23 am
  
  Kelly,
  These are topics that I haven’t yet covered on the Real Statistics website. I would imagine that most good books about time series analysis or econometrics cover them – e.g. Gujarati and Porter’s Basic Econometrics.
  Charles
  Reply
  - Kelly
    
    June 2, 2017 at 11:39 am
    
    Dear Charles,
    
    Thank you very much for the books.
    
    Best regards,
    Kelly
    Reply
Kelly

May 25, 2017 at 3:37 pm

Dear Charles,

In ADFTEST(A3:A22,TRUE,-1), the -1 means using the Schwert estimate for the maximum number of lags. In ADFTEST(A3:A22,TRUE,0), what is the meaning of 0 here?

Best regards,
Kelly
Reply
- Charles
  
  May 27, 2017 at 12:25 pm
  
  Kelly,
  Lag = 0. See the following webpage for a more complete explanation:
  https://real-statistics.com/time-series-analysis/autoregressive-processes/augmented-dickey-fuller-test/
  Charles
  Reply
  - Kelly
    
    May 28, 2017 at 2:13 pm
    
    Dear Charles,
    
    Yes, thank you very much for the link.
    
    Best regards,
    Kelly
    Reply
Heba

May 15, 2017 at 8:38 pm

Hello Sir,

Although I’m using SPSS for my statistics, I still find your website very useful.
Regarding “linear regression”, are the results still true if my variables are not normally distributed according to K-S and Shapiro-Wilk’s test? In other words, does linear regression require the assumption of normality even though the variables are linearly correlated?
After running the test, all other assumptions (except for heteroscedasticity) were met including independence of observations, normal probability plot of residuals and the model was a good fit with an R-squared value of 0.7346.
I have run “Robust Standard Errors” to overcome heteroscedasticity. Does this make sense? I mean would it correct for both normality and heteroscedasticity? If so how should I write/ report the results of a robust standard errors test?

Thank you
Reply
- Charles
  
  May 23, 2017 at 12:16 pm
  
  Heba,
  Normality is not a strict requirement for linear regression. It really depends on how you plan to use the model. See the following webpages:
  http://people.duke.edu/~rnau/testing.htm
  https://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression
  If you have heteroscedasticity, then you can use robust standard errors.
  Charles
  Reply
  - Heba
    
    June 6, 2017 at 9:52 am
    
    Thank you!
    In your opinion, which would you recommend; bootstrapping, weighted least squares or robust standard errors to overcome heteroscedasticity?
    Reply
    - Charles
      
      June 6, 2017 at 11:29 am
      
      Heba,
      The easiest is probably robust standard errors, and so I would use that approach.
      Charles
      Reply
MUKTA CHANDRA

May 15, 2017 at 3:22 pm

Respected Sir
I have downloaded real stat for Excel 2010. after installation it asks for password. Please help me how I may get password.
Reply
- Charles
  
  May 16, 2017 at 8:00 am
  
  Mukta,
  See https://real-statistics.com/appendix/faqs/password-prompt/
  Charles
  Reply
Miroslav

May 10, 2017 at 10:59 pm

Good evening,
at first I should write, that is great, that you are doing all these things, soft, web, discussion… You have my admiration fo this. But I am also asking for a help. I have two parametres on measured tests. One is position (-5,-4,…0,+1,+2,…,+5). For some positions I have results of test (we could say second parameter), for other position I would like to find result (presumption, estimation, premise) without test, only with knowledge of neibourghs, knowledge of others. For example -5~5.1, -4~4.2, -3~???, -2~5.5, -1~4.5, 0~4.2, +1~5.5, +2~5.5, +3~5.2, +4~3.9, +5~4.5. Is possible to compute position”-3″? Please, how to do it? Thank you very much for your help.
Miroslav
Reply
- Charles
  
  May 11, 2017 at 7:36 am
  
  Miroslav,
  If I am understanding the situation properly, you are looking to estimate missing data. There are many techniques for doing this. See the following webpage for details: https://real-statistics.com/handling-missing-data/
  Charles
  Reply
  - Miroslav
    
    May 14, 2017 at 9:10 pm
    
    Good evening,
    thank you very much for your answer, but I did mistake, it looks. Data, which are used are measured from tests on complicated sample (values corresponding to deformations on the positions -5,-4,…0,1,…5 on the car). For some positions I have results of test, for other position I would like to find result (presumption, estimation, premise) without test, only with knowledge of neibourghs, knowledge of others. So, if it is like this, it is not Normal distribution, is it? In that case, maybe better way is just “trend” in excel. But I am not sure, if it is enough 6th degree of polynomial function for this tests. Thank you very much,
    best regards
    Miroslav
    Reply
    - Charles
      
      May 15, 2017 at 1:17 pm
      
      Miroslav,
      If you believe that the data follows a particular 6th degree polynomial y = p(x), then simply substitute the x value x0 that you want and calculate p(x0) — i.e. the trend, as you have suggested. This is essentially how one of the simplest missing values approaches works.
      Charles
      Reply
      - Miroslav
        
        July 5, 2017 at 2:48 pm
        
        Good afternoon Charles,
        I am sorry for late reply. I have to thank you very much for your answers. This is reason, why I am writing: thak you.
        Have a nice day 🙂
        best regards
        Miroslav
Kelly

May 10, 2017 at 10:48 pm

Dear Charles,

I am very confused with the ADF and cointegration issue.

Am I corrected that if I have some time series variables, I have to first make sure they are stationary by applying ADF on the variables. Once they are stationary, I then can build a model using those variables as dependent and independent variables. Then, I have to perform the cointegration test on the model by applying ADF on the residuals.

Is my understanding right?

Many thanks and best regards,
Kelly
Reply
- Charles
  
  May 11, 2017 at 7:48 am
  
  Kelly,
  Conducting an ADF test doesn’t make a time series stationary, it merely tests whether the time series is stationary. If the time series is not stationary, you need to do something else to create a related stationary time series — e.g. differencing. In this case you can now build a model for the stationary time series which you can use, for example, to make predictions. The good news is that form these prediction you can also make predictions about the original non-stationary time series.
  Charles
  Reply
  - Kelly
    
    May 13, 2017 at 10:35 pm
    
    Dear Charles,
    
    Thank you for your response. If I have 4 original variables in which 2 are stationary and 2 are nonstationary, I then need to transform them into 4 stationary variables before building a time series model from these stationary variables that are transformed from the original variables.
    
    Since not all of the orginal variables are nonstationary at the beginning, do I still have to perform a cointegration test?
    
    Best regards,
    Kelly
    Reply
    - Charles
      
      May 14, 2017 at 6:22 pm
      
      Kelly, Perhaps. See the following from Wikipedia:
      
      The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).[3] The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.
      
      For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared relationship (suggesting high explanatory power on Fiji’s consumption from Afghanistan’s GNP). This is called spurious regression. To be more mathematically precise, two integrated I(1) series which are statistically independent may nonetheless show a significant correlation; this phenomenon is called spurious correlation.
      
      Charles
      Reply
      - Kelly
        
        May 15, 2017 at 7:15 am
        
        Dear Charles,
        
        Thank you very much for your explanation.
        
        Best regards,
        Kelly
Molly Harris

May 9, 2017 at 4:27 pm

Hi Charles

Very helpful website but I have a query which I hope you can help me with. I am investigating the effects of loneliness on maladaptive behaviours and after computing a Pearson’s correlation, a non-significant relationship was found. I wanted to analyse my data further and see if there were any gender differences or differences regarding age. Is is appropriate to just compare the Pearson’s scores I have ran for the groups even though the sample sizes are different?(I have 34 males and 12 females). Or do I need to use a Fisher’s conversion to appropriately compare them.

Thank you in advance
Molly
Reply
- Charles
  
  May 9, 2017 at 5:00 pm
  
  Molly,
  I don’t see how you can calculate a Pearson’s correlation when the sample sizes are different. Perhaps you need to use a t test in this case. These subjects are covered on the Real Statistics website.
  Charles
  Reply
Jonathan

May 4, 2017 at 2:50 am

Charles,

We have been using the JENKS formulas in our supply chain to help rank items.

We come up with a weighted ranking value for each of our SKUs. Then, we take the JENKS formula against this value to determine rank distribution. This worked perfectly last time returning 10 ranks all populated with data.

However, this year when we ran it, sometimes we are only getting 7 to 9 ranks. When we use JENKS, we are specifying the number of ranks we are wanting returned.

Do you have any thoughts as to why this would happen?

Thanks for any input.
Reply
- Charles
  
  May 4, 2017 at 5:51 pm
  
  Jonathan,
  If you send me an Excel file with your data and the results of the JENKS function, I will try to figure out what is happening.
  Charles
  Reply
Des

April 28, 2017 at 5:30 pm

Hi Charles, great to land on your website! I have a small data set (n<25). I run multiple regression for different combinations of 4 variables and single factor. With variable A, it shows multiple R<15% and Rsquare50%, Rsquare>30%, coefficient =0.004. I’m considering to choose A & C. what do you think? how would you interpret these? thanks so much!
Reply
- Charles
  
  April 29, 2017 at 7:05 am
  
  Des,
  I would need additional information to answer your question. See the regression portion of the Real Statistics website
  https://real-statistics.com/multiple-regression/
  Charles
  Reply
Sab

April 28, 2017 at 1:35 pm

Hi Charles
I’ve got some A&E data that need analyzing and would appreciate some advice. The data I have are categorical; address area, age group, trust attended, referral mode, discharge destination. I have done chi-analysis on the data but I want to do analysis to predict the discharge destination of an individual from a certain area, of a certain age group, attended a certain trust by a certain mode. A probability tree would be huge, is there anything else that can be done?

Thanks
Sab
Reply
- Charles
  
  April 28, 2017 at 5:00 pm
  
  Sab,
  Perhaps binary logistic regression or multinomial logistic regression.
  Charles
  Reply
Luis A. Afonso

April 20, 2017 at 10:35 am

Charles says:
December 6, 2015 at 9:13 am

It turns out that Statistics is not just “science” but also “art”.
Charles
_________
Exactly on the target, Charles!
In fact Statistics in practice lives “UII”, under insufficient information. It´s the heel, and the pleasure, of our lives,
Art disguised as a Mathematical Branch let me say.
Inductive instead of deductive.sts of hypothesis via necessary conditions, not at all sufficient ones.
Reply
Steve

April 14, 2017 at 1:37 pm

Good evening Dr. Charles.
I’m an undergraduate student from a university in Indonesia. If you don’t mind, I would like to ask a question about normal distribution.

If I have data of a pre-test, an immediate post-test, and a delayed post-test, do all my tests have to be normally distributed? Or is it just my pre-test which needs to be normally distributed?

I would be really glad if you would answer my simple question.
Thank you in advance.
Steve
Reply
- Charles
  
  April 14, 2017 at 10:52 pm
  
  Steve,
  Assuming that you are referring to a repeated measures Anova, then you need make sure that the pre-test, post-test and delayed post-test are all normally distributed. Of course, Anova is pretty robust to violations of this assumption, especially if each group is reasonably symmetric.
  Charles
  Reply
  - Steve
    
    April 15, 2017 at 1:08 am
    
    Thank you very much for your reply.
    
    Sorry for not mentioning it before.
    I’m actually running a univariate ANCOVA test, with the pre-test as a covariate. Then, I guess I also need normal distribution for all my tests?
    
    If it is robust to violation of the assumption of normality, does it mean I can use the ANCOVA test even though the Kolmogorov and Shapiro tests are significant at .000? And the skewness and kurtosisness of the immediate post-test are significant?
    
    Thank you!
    Reply
    - Charles
      
      April 18, 2017 at 7:00 pm
      
      Normality is also a requirement for ANCOVA.
      Charles
      Reply
      - Steve
        
        April 25, 2017 at 6:02 am
        
        Okay, thank you!
cedric

April 8, 2017 at 2:41 pm

I have a SB coefficient of 0.502 and a guttman split-half coefficient of 0.498. Does it means my data has a low reliability?
Reply
- Charles
  
  April 8, 2017 at 3:53 pm
  
  Cedric,
  There isn’t universal agreement about what is high and low, but most people would say that this is not sufficiently high reliability.
  Charles
  Reply
Mayank

March 31, 2017 at 10:39 am

Is there any video tutorial available on performing ADF test on excel using realstat pack ?
Reply
- Charles
  
  March 31, 2017 at 1:06 pm
  
  Mayank,
  I don’t know of such a video. Do you have some questions about how to perform the ADF test using the RealStats software?
  Charles
  Reply
sujit kumar biswas

March 29, 2017 at 4:56 am

Dear Sir
I have started to reading this article. I think it may be has instead of have. I really don’t know whether it’s have or has or having.
”Four fields are available for testing with each field have fairly uniform characteristics”

Sincerely
Sujit
Reply
- Charles
  
  March 29, 2017 at 10:03 am
  
  Sujit,
  It should be “having”. Thanks for identifying this error. I have now corrected the webpage.
  I appreciate your help in improving the website.
  Charles
  Reply
Sergio

March 27, 2017 at 1:19 am

Hi Charles,

Please give me your thoughts on the following.
After studying further onto the previous matter, I came to the conclusion that a one-way ANOVA using contrasts for planned comparisons would be appropriate. That way I can identify which groups are significantly different from the control group.
My problem is how can I identify the proper materials, since I’m interested in the materials that are Not significantly different from the original.
The original material has a mean of 7.5 but the minimum observed is 7.2 and the maximum 7.7.
If I use a standard alpha of 0.05 or 0.01, even with the Dunn/Sidak or Bonferroni corrections some materials with a mean of 7.6 are being considered significantly different when it is known that they shouldn’t.
If I do a one-sample T-test on the control material, and use the p-value obtained as my alpha I get much more reliable results that correlate with the observations of materials that are comparable to the control and the ones that are not.
With this method my alpha is extremely small (3.73446E-51), but it still identifies the materials that are actually different, and more importantly it does not make everything significantly different as was the case when using the standard p-values.
Please give me your thoughts on if this would be an acceptable practice.
Thanks,
Sergio
Reply
- Charles
  
  March 29, 2017 at 8:51 am
  
  Sergio,
  You may need to use Dunnett’s Test. See my response to your email for more details.
  Charles
  Reply
Bo

March 23, 2017 at 1:56 am

Dear Sir,
I love this website so much. I do not have a statistics background but I am trying to learn. I am currently doing a research that needs statistics knowledge. Could you give me some advice on my research below?

I am trying to use multiple regression method to explore impacts on housing price in a city. The city has a lake and has a smelly river (these two waterbodies are not adjacent to each other). The research purpose is to gauge housing value increase given that the smelly river been converted to a lake by building a barrage.

I have got housing transaction data that occurred in the area within 1000m buffer from the river and the lake, from 2014 to 2017 of around 750 records. In the database columns refer to different attributes of transactions. The attributes may include ‘property sale price’, ‘name of suburb’, ‘sale date’, ‘land area’, ‘bedroom number’, etc.

By converting the ‘name of suburb’, I can get another two attributes: 1.the proximate distance of the property to CBD ‘Dist_ CBD’; and 2. whether or not the property is located besides the lake or the smelly river ‘lake_nolake’.

Now in the regression model, Y is per sqm housing transaction price. Xs are,
‘Dist_ CBD’ (I measured by meter);
‘lake_nolake’ (if the property is lake-side, I code it 1, if the property is river-side, I code it 0)
‘sale date’ (I converted the date to number);
‘land area’ (I keep the original unit of sqm);
‘bedroom number’ (I keep the original unit of number);

Then I ran the regression model.
Regression Analysis

OVERALL FIT
Multiple R 0.434978 AIC 7900.829
R Square 0.189206 AICc 7900.882
Adjusted R Square 0.187063 SBC 7914.729
Standard Error 180.5385
Observations 760

ANOVA Alpha 0.05
df SS MS F p-value sig
Regression 2 5757830 2878915 88.32612 3.33E-35 yes
Residual 757 24673773 32594.15
Total 759 30431603

coeff std err t stat p-value lower upper vif
Intercept 497.0258 11.62045 42.77164 3.7E-204 474.2137 519.838
Dist_CBD -20.5369 1.712126 -11.995 1.8E-30 -23.898 -17.1758 1.237728
Lake_NoLake 3.5051 31.53518 0.111149 0.911528 -58.4017 65.4119 1.237728

Then the question would be:
How to interpret the ‘lake-nolake’ coefficient of ‘3.5051’? Can I say the per sqm price of lake-side properties tends to be higher than that of river-side properties by $3.5051?

Thank you sir.
Reply
- Charles
  
  March 23, 2017 at 4:59 pm
  
  Bo,
  Based on the regression model that you have created, you are correct in saying that the model predicts that the per sqm price of lake-side properties is $3.5051 higher than river-side properties.
  Charles
  Reply
  - Bo
    
    March 23, 2017 at 11:08 pm
    
    Thank you very much for your prompt reply.
    The R square of the model is only 0.189 which would mean the model can only explain small part of the situation.
    May I ask how to convince the local residence the reliability of the $3.5051?
    Reply
    - Charles
      
      March 24, 2017 at 7:38 am
      
      Bo,
      I don’t understand your question.
      Charles
      Reply
      - Bo
        
        March 28, 2017 at 11:48 pm
        
        Sorry for my unclear question. I would like to show you the conflict results between multiple regression and single regression using same database and wish to seek your advice.
        
        In my comment on March 23, 2017, we can say that the per sqm price of lake-side properties tend to be $3.5051 higher than river-side properties.
        
        However, if I run a single regression using the same data (y=per sqm housing transaction price, x=lake_nolake), the regression result is:
        y=162.27x+383.5 R square= 0.0351 which is low. This single regression result may tell that the per sqm price of lake-side properties tend to be $162 lower than river-side properties.
        
        Thank you for your help.
Doug

March 22, 2017 at 12:25 am

I believe there is an error on your page explaining how to execute Holt-Winters method in Excel. For “Example 1” you say to put “=E$22*(C9/D9)+(1-E$22)*F5 in cell F8” however I believe it should be “=E$22*(C8/D8)+(1-E$22)*F4 in cell F8”.

Thank you for this site, the information is extremely helpful!!

Breadcrumbs for ref: /time-series-analysis/basic-time-series-forecasting/holt-winters-method/
Reply
- Charles
  
  March 22, 2017 at 9:01 am
  
  Doug,
  Thanks for catching this error. I have now made the change on the referenced webpage.
  I really appreciate your help in making the website better.
  Charles
  Reply

1,126 thoughts on “Contact Us”

Leave a Comment Cancel reply