Contact Us

Welcome to the Real Statistics Using Excel website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.

To make this website more valuable to the user community, we welcome your comments, suggestions, and feedback. We would especially like to hear about any errors in the website, examples, or software. You can contact me at the addresses below or by leaving a comment on this page or on any of the webpages.

Charles Zaiontz

My email: czaiontz@gmail.com, info@real-statistics.com
My profile: click here 

1,067 thoughts on “Contact Us”

  1. Hi Charles,

    Thanks again for your great site.

    I have a quick question: I have a data set of 8 data points and I want to test the probability of the 8th data point occurring given the previous 7. I fit an OLS line to the 7 points, calculate the residuals, and calculate their SE = 45.26. I then project that linear model to point 8 and calculate its residual based on that linear fit: resid = 888. I then calculate the t-stat as 19.6. I want the magnitude of the difference, so 2-tailed, and get p = 1.13e-06 for df = 6 (as I want to test the probability given the distribution of the 7).

    Is this how you would do it, and do you have any comment on how I did the calculation?

    Thanks,
    Joe

    • I should clarify: I'm just not certain whether df = 6 or 7 in this case, as I haven't included the 8th data point in the linear model estimation, which only used 7 points. The residuals on the 7 points are all less than 178 (4 are less than 100), but the residual on the 8th is 888, which is much greater. I simply want to calculate its probability given the previous 7. I used your t-test for it but cannot enter one data point for the 'second' dataset and have to enter 2 (repeating the 8th point). Is this the right way to do it?

      Also if you have any other suggestions of how to estimate the probability of the appearance of that 8th data point given the other 7?

      Thanks,
      Joe

      • Hi Charles,

        Thanks for the response. What I’m looking for is the probability of a data point as large as the 8th given the previous 7 that I have.

        Yes, I think what I really need to use are prediction intervals. I can calculate those for the 7 points and then see what the probability is for each of them to be as large as the 8th data point by reducing alpha until the interval reaches the 8th data point. The overall probability of any data point being as large as the 8th is then the probability that at least one of them is that large, using the binomial distribution. That makes sense to me anyway! (See the sketch below.)

        Regards,
        Joe
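        A minimal sketch of the prediction-interval calculation in Python (the y values are hypothetical; note that a two-parameter linear fit on n = 7 points leaves n - 2 = 5 residual df):

          # Sketch: two-tailed p-value for an 8th observation, given an OLS
          # line fit to the first 7 points (hypothetical data).
          import numpy as np
          from scipy import stats

          x = np.arange(1, 8)
          y = np.array([12., 55., 101., 150., 198., 241., 288.])
          b1, b0 = np.polyfit(x, y, 1)        # OLS slope and intercept
          resid = y - (b0 + b1 * x)
          n, df = len(x), len(x) - 2          # df = 5 for a 2-parameter fit
          s = np.sqrt(resid @ resid / df)     # residual standard error

          x8, y8 = 8, 1300.                   # the new (8th) point, hypothetical
          se_pred = s * np.sqrt(1 + 1/n + (x8 - x.mean())**2 / ((x - x.mean())**2).sum())
          t = (y8 - (b0 + b1 * x8)) / se_pred # standardized prediction error
          p = 2 * stats.t.sf(abs(t), df)      # two-tailed p-value
          print(t, p)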

  2. Charles –

    I expressed this to you personally via email, but I wanted to share the feedback with the broader public.

    The resources, tools, and explanations you provide on this website are invaluable. As someone with a deep curiosity and an immense passion to learn, but without formal training in statistics, you and your Real-Statistics website have opened my eyes to a whole new world and propelled me to previously unimaginable levels of statistical granularity and insight.

    THANK YOU for all that you have done and continue to do. You have empowered me to pursue interests I once thought impossible. THANK YOU.

    Andrew

    • Thank you very much, Andrew.
      As I have expressed to you via email, it has been my pleasure to give something back after all that others have done for me over the years. I am pleased and honored by your kind words and support.
      Charles

  3. Dear Charles

    Thank you for the introduction and for sharing.
    For time series forecasting with the Holt-Winters model, I am not sure about the level. As I understand it, three factors are included: level, trend, and seasonality. For the first value of the level, I saw that in Excel it is calculated as actual - seasonality.
    In the Excel sample, the 10 Q4 actual is 25, the seasonality is 10.75, and the trend is 1.875, so the level is 25 - 10.75 = 14.25. But why not consider the trend, i.e., why isn't the level 25 - 10.75 - 1.875 = 12.375? (See the sketch below.)

    Could you share the idea?
    Thank you very much
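    A minimal sketch of the initialization convention in question (the numbers come from the comment above; whether this matches the Real Statistics example exactly is an assumption):

      # Sketch: one common additive Holt-Winters convention initializes the
      # level as the deseasonalized actual; the trend is not subtracted from
      # the level, but enters the *next* period's forecast.
      actual, seasonal, trend = 25.0, 10.75, 1.875
      level0 = actual - seasonal          # 14.25
      forecast_next = level0 + trend      # plus the next period's seasonal index
      print(level0, forecast_next)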

  4. Dear Charles,

    This add-in is a powerful tool, so cool! But I got a problem…

    I tried tens of times on the Holt forecast but always run into an error. The Winters forecast works. I also tried your example sheet, where the Holt forecast also ran into an error. My Excel version is 365 on the Windows platform.

    The data set is {8000, 13000, 23000, 34000, 10000, 18000, 23000, 38000, 12000, 13000, 32000, 41000}; # of seasons 4, # of forecast seasons 4, alpha 0.1, beta 0.2, initialize trend: yes.

    • I found something interesting: if I output the result to any cell in the first column or to a new sheet, it crashes. But if I output the result to any other column, it works.

    • Hello Rodrigo,
      When does it crash? During installation or when you open Excel?
      What sort of error message, if any, do you receive?
      Have you looked at the Troubleshooting suggestions (on the webpage where you downloaded the Real Statistics software)?
      Charles

  5. Dear Dr. Zaiontz,

    I am unable to download the software from the links on your webpage – it opens another window with the same page.

    Is there something I am not doing right?

    Thanks,

    Rajendra

  6. Dear Sir,
    Can I do the Cochran-Armitage test for trend with the Real Statistics Resource Pack? If so, could you please show me how to do it?

    Thank you very much.
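    For reference, a minimal sketch of the test itself outside Excel (the counts are hypothetical; this says nothing about whether the Resource Pack supports it):

      # Sketch: Cochran-Armitage trend test for a 2 x k table, scores 0..k-1.
      import numpy as np
      from scipy import stats

      cases    = np.array([10, 15, 30])    # hypothetical counts per level
      controls = np.array([40, 35, 20])
      scores   = np.array([0, 1, 2])
      n = cases + controls
      N, p = n.sum(), cases.sum() / n.sum()
      num = (scores * (cases - n * p)).sum()
      var = p * (1 - p) * ((scores**2 * n).sum() - (scores * n).sum()**2 / N)
      z = num / np.sqrt(var)
      print(z, 2 * stats.norm.sf(abs(z)))  # z statistic and two-sided p-value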

  7. Hello Dr Zaiontz
    A simple note of appreciation for your spectacular work here. Easy to use, chock-full of useful tools, with illustrative explanations and writing. Truly amazing; thanks again.
    Mike Overturf

  8. Hello
    I have Windows 11 and use Excel 2019. I have downloaded the software from your website for Windows and Excel 2019. When I try to run the add-in in Excel, it fails with "Run-time error 5", so I cannot use it.
    Best regards
    Vico

      • Hello Charles.
        Yes, the model I am evaluating is: yi = C + ei
        I performed a nonparametric runs test as a check for serial correlation and found none. I then considered evaluating positive serial correlation with the Durbin-Watson test applied to the ei. I have never done so without a regressor, but it seemed a curious exercise because I have never seen DW tables with no regressors.

        I have auto-regressed the errors using AR(1) and found that rho is insignificant via a t-test. However, the Durbin-Watson value that software reports based on the AR(1) fit is computed from the errors with the effect of rho removed, even when rho is insignificant; it does not reflect the original errors, where only the constant is subtracted from the individual observations. As such, the reported Durbin-Watson value approaches 2 (i.e., 1.90 to 1.97).

        Thank you and best regards,
        Andrew
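        A minimal sketch of the DW statistic for the intercept-only case (hypothetical data):

          # Sketch: Durbin-Watson statistic for y_i = C + e_i; the fitted
          # constant is the sample mean, the residuals are the deviations.
          import numpy as np

          y = np.random.default_rng(0).normal(10, 2, 50)  # hypothetical series
          e = y - y.mean()                                # residuals
          dw = np.sum(np.diff(e)**2) / np.sum(e**2)       # ~2 if no autocorrelation
          print(dw)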

  9. Hi Dr.

    Is there any function to figure out the best-fit model parameters for the ARIMA/SARIMA functions?

    My question is: instead of trying (p, d, q) values and looking at the tests, does the system have any specific function for making a best guess at (p, d, and q)? (See the sketch below.)

    Much appreciated.
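    A minimal sketch of one common approach, an AIC grid search outside Excel (hypothetical series; this does not assert that Real Statistics has or lacks such a function):

      # Sketch: naive (p, d, q) grid search by AIC using statsmodels.
      import itertools
      import numpy as np
      from statsmodels.tsa.arima.model import ARIMA

      y = np.cumsum(np.random.default_rng(1).normal(size=100))  # hypothetical data
      best = None
      for order in itertools.product(range(3), range(2), range(3)):
          try:
              aic = ARIMA(y, order=order).fit().aic
          except Exception:
              continue                    # skip candidates that fail to fit
          if best is None or aic < best[0]:
              best = (aic, order)
      print(best)                         # lowest-AIC (p, d, q)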

  10. Hi Charles,
    When we run a post hoc test after an ANOVA or a non-parametric analysis, is it possible within Real-Statistics to automatically assign letters to different groups?
    Thanks for your splendid work.

    • Hi Luis,
      Currently, Real Statistics assigns the values Group 1, Group 2, etc. when no group headings are specified. Is there a problem with this?
      You could add the labels A, B, etc. yourself.
      Charles

      • Hi Charles,
        I am not referring to the group names, but to the assignment of different letters to groups that, according to the test, are different, as shown below:
        Group 1: a
        Group 2: ab
        Group 3: abc
        Group 4: bc
        Group 5: bcd
        and so on.
        Right now what we get is the information on all pairs of groups whose means are or are not different (with a cutoff p-value), as shown below:
        Group 1 vs Group 2, p = 0.06
        Group 1 vs Group 3, p = 0.1
        Group 1 vs Group 4, p = 0.02
        Group 1 vs Group 5, p = 0.0001
        and so on.
        Using letters is a friendlier way of presenting the results. (A sketch of how such letters can be derived follows.)
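        A minimal sketch of deriving such letters from pairwise p-values (values below are hypothetical except those quoted in this comment):

          # Sketch: compact letter display. Start with one letter covering all
          # groups; for each significantly different pair, split every letter
          # set containing both; drop sets absorbed by larger ones.
          pvals = {("G1","G2"): 0.06, ("G1","G3"): 0.10, ("G1","G4"): 0.02,
                   ("G1","G5"): 0.0001, ("G2","G3"): 0.30, ("G2","G4"): 0.08,
                   ("G2","G5"): 0.04, ("G3","G4"): 0.50, ("G3","G5"): 0.20,
                   ("G4","G5"): 0.60}
          groups, alpha = ["G1","G2","G3","G4","G5"], 0.05
          letters = [set(groups)]
          for (a, b), p in pvals.items():
              if p >= alpha:
                  continue                          # not significantly different
              for s in [s for s in letters if a in s and b in s]:
                  letters.remove(s)
                  letters += [s - {a}, s - {b}]     # separate a and b
              letters = [s for s in letters if not any(s < t for t in letters)]
          for g in groups:
              print(g, "".join(chr(97 + i) for i, s in enumerate(letters) if g in s))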

  11. Hi Charles, thank you for your previous help. I have another question:

    I have a problem in which I have categorical marks of students on a questionnaire, but the questions are scored differently: Q1 from {0,..,3}, Q2 from {0,…,4}, and so on. How can I address this issue using the polytomous Rasch model?

    • Ciro,
      I don’t know how to deal with this situation.
      Perhaps you can multiply the scores for Q1 by 4 and the scores for Q2 by 3. This will make the scores on both questions go from 0 to 12.
      Charles

      • Of course!

        I will find the least common multiple of all the questions' maximum marks, and new categories will appear. Every question will have many categories that never appear, but that doesn't matter (see the sketch below).

        Thanks.
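        A minimal sketch of that rescaling (hypothetical raw scores):

          # Sketch: rescale items with different maxima to a common 0..L scale,
          # where L is the least common multiple of the maxima.
          from math import lcm

          maxima = {"Q1": 3, "Q2": 4}
          L = lcm(*maxima.values())                         # 12
          factors = {q: L // m for q, m in maxima.items()}  # Q1 -> 4, Q2 -> 3
          raw = {"Q1": 2, "Q2": 3}                          # hypothetical scores
          print({q: raw[q] * factors[q] for q in raw})      # {'Q1': 8, 'Q2': 9}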

  12. Hi Charles

    This website has helped me a lot. I just had to follow the implementation steps of "Building a Polytomous Model" and in a few hours I had it done. But I have a problem: the first iteration gives me the same answer in each table, including the residuals, but after that the residual table starts oscillating, and the total residual starts growing until it blows up. I might have a bug in my code. I wonder if it would be possible to have the tables for the second iteration of this example; this could help me find where the bug is. Thanks 🙂

  13. Dear Mr. Charles Zaiontz,

    First of all, thank you so much for all the information and the amazing tools. I've really learned a lot following your clear explanations, and the specific examples provided on the website saved me from the abstract depth of the original notions.

    I have a quick question that is confusing me as I write my paper. The mean difference between the two samples (one with 17 data points, the other with 23) is large, but the result of the Monte Carlo simulation (resampling) turns out to be insignificant. I think this might be related to high variability within each sample's data. What do you think? If this is the case, how should I argue that there is high variability within the dataset; are there any formulas I can use?

    One more thing: the resampling in the Real Statistics Using Excel software is designed for two independent samples. I am wondering whether it could be improved so that it can deal with three or more independent samples (see the sketch below).

    I look forward to your reply.
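    A minimal sketch of a label-permutation test that handles k >= 2 independent samples (hypothetical data; the variance of the group means is one reasonable statistic among several):

      # Sketch: permutation test for k independent samples.
      import numpy as np

      def perm_test(samples, n_perm=10000, seed=0):
          rng = np.random.default_rng(seed)
          data = np.concatenate(samples)
          cuts = np.cumsum([len(s) for s in samples])[:-1]
          stat = lambda d: np.var([g.mean() for g in np.split(d, cuts)])
          obs = stat(data)
          hits = sum(stat(rng.permutation(data)) >= obs for _ in range(n_perm))
          return (hits + 1) / (n_perm + 1)    # permutation p-value

      rng = np.random.default_rng(1)
      a = rng.normal(0, 5, 17)                # hypothetical samples
      b = rng.normal(2, 5, 23)
      print(perm_test([a, b]))                # extends unchanged to 3+ samples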

  14. Dear Mr. Charles Zaiontz,
    I am not an expert in statistics.
    I need some help with my data.
    The reviewer suggested using two-way ANOVA for my data, but the data are not normally distributed.
    In this case, the reviewer suggested using the permutation variant of ANOVA.
    Now I am not sure what exactly I need to do, because it is not possible to do non-parametric two-way ANOVA in GraphPad Prism, which is what is available in my lab.
    I would like to know which other test I can perform to convince the reviewer to accept my data,
    and whether it would be possible to use the Real Statistics Using Excel software (see the sketch below).
    I would really appreciate it if you could suggest something.
    Please reply,
    Kind regards,
    Ravi
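    A minimal sketch of one permutation variant of two-way ANOVA (Manly-style: permute the response and recompute each effect's F; other permutation schemes exist, and the data below are hypothetical):

      # Sketch: permutation two-way ANOVA via statsmodels.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.formula.api import ols

      rng = np.random.default_rng(0)
      df = pd.DataFrame({
          "y": rng.normal(size=24),           # hypothetical response
          "A": np.repeat(["a1", "a2"], 12),
          "B": np.tile(np.repeat(["b1", "b2", "b3"], 4), 2),
      })

      def fvals(d):
          tab = sm.stats.anova_lm(ols("y ~ A * B", data=d).fit())
          return tab["F"].to_numpy()[:-1]     # F for A, B, and A:B

      obs = fvals(df)
      n_perm, ge = 500, np.zeros(3)
      for _ in range(n_perm):
          ge += fvals(df.assign(y=rng.permutation(df["y"].to_numpy()))) >= obs
      print((ge + 1) / (n_perm + 1))          # permutation p-values for A, B, A:B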

      • Dear Mr. Charles Zaiontz,
        Thank you very much for your quick response.
        However, I still have some difficulties and I am not getting the result the reviewers asked for.
        I am sending you an Excel sheet along with an explanation of my problem. I hope that is okay and that you can help me.
        Thank you,
        With regards,
        Ravi

  15. Hi Charles! First of all, thank you for this amazing tool. I am trying to run a binary logistic regression, and this seems to be the most popular tool for doing so, but when I open my Real Stats dialog box it doesn't appear as an option. Is there any reason that might be? Thank you!

    • Hi Amanda,
      It is the Logistic and Probit Regression option on the Reg tab.
      I want to make you aware of a bug that is being fixed. I suggest you wait for Rel 7.9, which should be available tomorrow and will fix this bug.
      Charles

  16. Dear Mr Zaiontz,
    thank you very much for the helpful and instructive resources you provide.
    I have a few questions which I could not find an answer to, regarding the correction of p-values for multiple testing after ANOVA followed by a post hoc test.
    Suppose I performed ANOVA on a dataset of 1000 parameters analyzed in three groups,
    and I received 1000 p-values indicating whether any of these parameters differ significantly between any of my three groups.
    Do I correct these for multiple testing at this point and use only the data which tested significant after correction for my chosen post hoc test?
    If so:
    suppose I have 100 parameters with q-values below my threshold, and I then test these data by multiple comparisons of my three groups by way of three t-tests.
    Do I use the complete list of 3 x 100 p-values to do a single correction for multiple testing, or are the results of the separate t-tests corrected individually?
    Is the correction for multiple testing necessary/recommended after both tests? (A sketch of the correction itself follows below.)
    Thank you and best regards, Christian
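    Whichever family of p-values is chosen, a minimal sketch of the Benjamini-Hochberg adjustment itself (hypothetical p-values):

      # Sketch: Benjamini-Hochberg adjusted p-values (q-values) for one family.
      import numpy as np

      def bh_adjust(p):
          p = np.asarray(p, dtype=float)
          n = len(p)
          order = np.argsort(p)
          adj = p[order] * n / np.arange(1, n + 1)
          adj = np.minimum.accumulate(adj[::-1])[::-1]  # enforce monotonicity
          out = np.empty(n)
          out[order] = np.minimum(adj, 1)
          return out

      print(bh_adjust([0.001, 0.008, 0.039, 0.041, 0.60]))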

  17. Dear Mr. Charles Zaiontz,

    In the article 'WLS regression and heteroskedasticity', the first sentence reads:
    'Suppose the variances of the residuals of a OLS regression are known, i.e. var(σi^2) = εi.'
    It seems to me it should look like this:
    'Suppose the variances of the residuals of an OLS regression are known, i.e. var(εi) = σi^2.'
    (given that the residuals are denoted εi).
    Just the opposite, isn't it?

    Thanks,
    EMC

    • Hello Eduard,
      Yes, you are correct. I have now made the change on the webpage.
      Thank you for identifying this mistake. I appreciate your help in improving the accuracy and usability of the website.
      Charles
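      A minimal sketch of WLS with known residual variances var(εi) = σi^2, i.e. weights wi = 1/σi^2 (hypothetical data):

        # Sketch: weighted least squares, solving (X'WX) b = X'Wy.
        import numpy as np

        rng = np.random.default_rng(0)
        x = np.linspace(1, 10, 30)
        s2 = 0.5 * x**2                             # hypothetical known variances
        y = 3 + 2 * x + rng.normal(0, np.sqrt(s2))
        w = 1 / s2                                  # weights
        X = np.column_stack([np.ones_like(x), x])
        b = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
        print(b)                                    # approximately [3, 2]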

  18. Hello Charles,

    I downloaded the tool pack add-in correctly and was able to use it once. Now I can't access it at all. I went back through the original steps to access the add-in and it's still showing. I unchecked and then re-checked the XRealstats box, but I still cannot access the tool. I need to take my final exam for Stats 2 in a few days, so I desperately need this.

    Can you help?

    • Stephanie,
      You should still be able to use the software unless something has changed. However, see the Troubleshooting section on the following webpage: https://www.real-statistics.com/free-download/real-statistics-resource-pack/
      If none of this works, I suggest that you do the following:
      1. Delete the file that contains the add-in that you downloaded previously.
      2. Open Excel, press Alt-TI, and uncheck the XRealstats.xlam entry.
      3. Close Excel and then open it again. Press Alt-TI and check the XRealstats.xlam entry. You should receive an error message suggesting that you remove this entry from the list of add-ins. Agree to do this.
      4. Download a new copy of the add-in and install it as described on the webpage. Don't try to open the file containing the add-in. To be absolutely sure that you won't have a problem, rename the add-in to ZRealstats.xlam before installing it (so that Excel won't confuse it with the previous version).
      If you rename the add-in file, you can skip step #3.
      Charles

  19. Hello,
    I have tested some of your functions for truncated normal distributions. I used the following parameters: μ = 11.1, σ = 4.9, a = 0, b = 20. TNORM_DIST run with TRUE produces the cumulative distribution, as expected, and run with FALSE it produces a Gaussian-like shape when plotted, also as expected.

    However, I do not think TNORM_INV is working correctly. In the table below I apply this function to the p-values in the second column. I expected it to reproduce the x values, but as you can see it produces some other numbers. Can you please check what is going on?

    x    TNORM_DIST(TRUE) = p    TNORM_DIST(FALSE)    TNORM_INV(p)
    0 0.000000 0.006562 -2.265306
    1 0.008278 0.010204 -2.061224
    2 0.020867 0.015220 -1.857143
    3 0.039233 0.021776 -1.653061
    4 0.064938 0.029884 -1.448980
    5 0.099453 0.039339 -1.244898
    6 0.143912 0.049672 -1.040816
    7 0.198852 0.060162 -0.836735
    8 0.263984 0.069894 -0.632653
    9 0.338059 0.077887 -0.428571
    10 0.418880 0.083254 -0.224490
    11 0.503477 0.085361 -0.020408
    12 0.588425 0.083951 0.183673
    13 0.670259 0.079196 0.387755
    14 0.745888 0.071662 0.591837
    15 0.812939 0.062200 0.795918
    16 0.869971 0.051785 1.000000
    17 0.916506 0.041355 1.204082
    18 0.952935 0.031678 1.408163
    19 0.980291 0.023276 1.612245
    20 1.000000 0.016405 1.816327

    • Hello Robert,
      Thank you for bringing this problem to my attention. There is an error in the TNORM_INV function. I have made the correction on the website and flagged that there is a problem. I will repair the error in the software shortly. In the meantime, you can use the following formula in place of TNORM_INV(p,m,s), where a and b are the truncation limits:
      =NORM.INV(p*(NORM.DIST(b,m,s,TRUE)-NORM.DIST(a,m,s,TRUE))+NORM.DIST(a,m,s,TRUE),m,s)
      I appreciate your help in improving the Real Statistics website and software.
      Charles
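      A quick cross-check of that workaround outside Excel (values from the table above):

        # Sketch: truncated normal inverse via the untruncated normal CDF.
        from scipy.stats import norm

        mu, s, a, b = 11.1, 4.9, 0.0, 20.0
        Fa, Fb = norm.cdf(a, mu, s), norm.cdf(b, mu, s)
        tnorm_inv = lambda p: norm.ppf(p * (Fb - Fa) + Fa, mu, s)
        print(tnorm_inv(0.503477))   # ~11, matching the x column above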

  20. Hi Charles,

    Thank you very much for creating and sharing such a great add-on. Could I have some advice on it? I seem to have encountered an issue with an older version of the Tukey-Kramer test (Excel 2003 version). After entering the data, there is a column labeled 'c' that is greyed out, which is required for the calculations to proceed. Image in link (https://photos.app.goo.gl/CWNYGxXzi7V19d8Y7). What is the 'c' value and how do I calculate it?

    • Hi James,
      The c column contains the contrasts. For the Tukey-Kramer test, you need to place the number +1 in this column for one of the groups and -1 for another one of the groups. This will result in a comparison between these two groups.
      You can do the same for other group comparisons.
      Charles

  21. Hi Charles,

    I hope you and your family are well and safe. I have sent you an email with a spreadsheet showing what seems to be incorrect results when using the ROOTS function to calculate the roots of a cubic function, whereas the CUBICROOTS function seems to give different (and apparently correct) results. Please let me know if there is anything wrong on my end or if there is indeed an issue with the ROOTS function. Thank you so much for your help!

    • Hello Alan,
      Thanks for bringing this issue to my attention.
      I confirm that ROOTS doesn’t yield the correct answer. I don’t know yet whether this is caused by poor initial values for r and s. I will be checking into it further. I may need to implement a different technique.
      Charles

      • Hi Charles,

        Thank you for the prompt reply. I found that changing the number of iterations to between 100 and 200 (instead of the default 50) seems to yield the right result. Please let me know if you make any changes. In the meantime I have changed my formulas to use CUBICROOTS, which does not appear to have any issues.

        Best,
        Alan

        • Hi Alan,
          That is good to hear. I feared that I would need to change the r and s values; changing the default number of iterations is easier. I will look into this for the next release. If it doesn't impact easier cases, then I will change the default to 200.
          Charles
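          For anyone wanting an independent cross-check of cubic roots (hypothetical coefficients, highest power first):

            # Sketch: companion-matrix roots; x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3).
            import numpy as np
            print(np.roots([1, -6, 11, -6]))   # -> [3. 2. 1.]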

  22. Hello!

    I have downloaded the Real Statistics file per the instructions several times, but it does not show up when I open a blank Excel document and follow the directions to install it in the proper folder. I tried all 3 of the suggested troubleshooting tips and also rebooted my laptop, but it doesn't want to work. I have Excel 2016. My email address is smcglothlin22@gmail.com. If you could assist, I would really appreciate it. I am a senior at Athens State University trying to pass Stats 2 and this would help me tremendously!

    • Hello Stephanie,
      What do you see when you enter the formula =VER() in any cell?
      What do you see when you press the key sequence Ctrl-m (i.e. hold the Control key down and press the m key)?
      Charles

  23. Dear Dr. Charles,
    I am a PhD student and I need to use Lin's concordance correlation coefficient to check the validity of a device.
    I tried to use the LINCCC formula in Excel with the Real Statistics package, but I have a problem: when I use it, only the correlation coefficient appears, without the lower and upper limits of the confidence interval.
    How can I see the lower and upper confidence limits with this formula? (A bootstrap sketch follows below.)
    Thank you for your time,
    Best regards,
    Aitor.
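    A minimal sketch of Lin's CCC with a bootstrap confidence interval, independent of the LINCCC implementation (hypothetical paired readings):

      # Sketch: Lin's concordance correlation coefficient + percentile bootstrap CI.
      import numpy as np

      def ccc(x, y):
          sxy = np.cov(x, y, bias=True)[0, 1]
          return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean())**2)

      rng = np.random.default_rng(0)
      x = rng.normal(10, 2, 40)                 # hypothetical device A readings
      y = x + rng.normal(0, 0.5, 40)            # hypothetical device B readings
      boot = []
      for _ in range(2000):
          i = rng.integers(0, len(x), len(x))   # resample pairs with replacement
          boot.append(ccc(x[i], y[i])) 
      lo, hi = np.percentile(boot, [2.5, 97.5])
      print(ccc(x, y), (lo, hi))                # point estimate and 95% CI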

  24. Hi Dr Charles,

    Your resource pack has been a great help in my learning journey! Thank you for sharing with the community such a great tool!

    Recently, I started getting an error message that says "Compile error in hidden module: Analysis" (GOF has this issue too). Would you be able to advise what I could do? I tried reinstalling the resource pack, but it did not work.

    Thank you.

    Regards,
    May

  25. Hi Charles,

    Thank you for this resource! I used it to run a single factor ANOVA + Tukey, and one of my p-values appears to be negative. I’m wondering if that’s an error, and if not, what does it mean?

    Thank you!

    • Hi Meng Yang,
      None of the p-values should be negative. The only exception is a very small value near zero, such as -2.34E-15. This is -.00000000000000234 and is essentially zero.
      If you are getting another type of negative value, you can email me your results and I will try to figure out what has gone wrong.
      Charles

  26. I conducted a survey and carried out a one-way ANOVA. The partial eta squared for the analysis comes out to less than .01. Is there any way to increase it?
    The sample groups for the analysis are unbalanced in count. Can this also be a reason for the small effect size? Is there a method to counterbalance this?

    • Hello Divya,
      An effect size of .01 is quite small. This should accurately reflect the results based on your data.
      Do you have any reason to believe that the result is wrong? Just because you are not happy with the result doesn't mean that it is wrong.
      How unbalanced are your group sizes?
      Charles

  27. Hi,

    I had a group of people rate their satisfaction as poor, average, or excellent, both when a service is not provided and when it is provided. I am wondering what test I can use to compare the ordinal data (satisfaction) between the two conditions for this paired sample. Could I use the McNemar-Bowker test for this?

  28. Hello Charles
    I am having problems finding the XRealStats function in Excel. Ctrl-m is not working. It worked once and then stopped working. It is still available in Add-ins. Please provide a solution.

  29. Dear Prof. Charles,

    Thank you very much for these resources and for sharing your knowledge; they have been of great usefulness for my studies.

    I've been trying to implement in Excel the HAC standard error estimation proposed by Hodrick (1992), "Dividend Yields and Expected Stock Returns: Alternative Procedures for Inference and Measurement", Review of Financial Studies, vol. 5, no. 3, 357-386. The method is supposedly more efficient than Newey-West and similar ones in dealing with autocorrelation in overlapping-period time series, and has gained some recognition in the finance field.

    However, I have not succeeded so far, so I decided to reach out to consult you about this. Do you know this method for HAC standard error estimation? Would you have any thoughts on how to implement it, using some of the Real Statistics tools or other Excel resources? Would there be any possibility of including this in the resource pack in future updates?

    My apologies for disturbing you with this; any thought or idea from someone with your know-how would be much appreciated.

    Best regards, Andre.

  30. Charles,

    Suppose we were forecasting revenue in November 2020. Classical decomposition time-series forecasts presumably would have forecast that the Covid trend would continue. Instead, it would have made sense to use the classical method to capture seasonality and then apply the seasonal values to alternate economic scenarios.

    Question 1: Is there a relatively easy way to do this with Excel’s forecasting functions?

    Question 2: Would we be better off calculating the seasonal factors using our own formulas and then applying the seasonal variations to each of the alternate scenarios? (See the sketch below.)

    Question 3: Would it have made sense to use historical performance in past Januaries to forecast January sales under alternate scenarios, and then do the same for the other months of the year?

    Thanks!

    Charley
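    On question 2, a minimal sketch of extracting classical multiplicative seasonal indices that can then be applied to any scenario trend (hypothetical monthly revenue; the first ratio corresponds to month 7 of the series):

      # Sketch: seasonal indices via a centered 2x12 moving average.
      import numpy as np

      rng = np.random.default_rng(0)
      months = 48
      trend = np.linspace(100, 160, months)
      season = np.tile(1 + 0.1 * np.sin(2 * np.pi * np.arange(12) / 12), 4)
      y = trend * season * rng.normal(1, 0.02, months)     # hypothetical revenue

      ma = np.convolve(y, np.ones(12) / 12, mode="valid")  # 12-month MA
      ma = (ma[:-1] + ma[1:]) / 2                          # centered (2x12) MA
      ratios = y[6:-6] / ma                                # detrended ratios
      idx = np.array([ratios[i::12].mean() for i in range(12)])
      idx *= 12 / idx.sum()                                # normalize to mean 1
      print(idx)                                           # seasonal factors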

  31. Hi Charles,

    Thank you so much for kindly providing all these useful tools. I have a quick question: I was recently testing the spline function from the Excel add-in and compared it against the SciPy cubic spline function. Despite using the same inputs, the outputs differed a fair bit. May I check whether you have some idea why that might be the case? Is the logic behind the add-in different from that of the function in the SciPy library?

    Thank you in advance!

    • Joel,
      I am not familiar with the version in the Python library. If the Python version is also a cubic spline function, then it is surprising to me that the results would be so dissimilar. I suggest that you create a plot/chart to see which seems to fit the data better and more smoothly.
      Charles

      • Hi Charles,

        Thank you so much for your prompt reply. I was actually able to reconcile the difference. For the benefit of other readers who might hit this issue: it turns out that the default setting of the Python function makes the first and second segments at each curve end the same polynomial. You just have to change the bc_type argument of the CubicSpline method to 'natural' so that the second derivative at the curve ends is zero, as in the sketch below.
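        An illustration of that fix (hypothetical knots):

          # Sketch: natural boundary conditions in SciPy's CubicSpline.
          import numpy as np
          from scipy.interpolate import CubicSpline

          x = np.array([0., 1., 2., 3.])
          y = np.array([1., 2., 0., 2.])
          cs = CubicSpline(x, y, bc_type="natural")  # y'' = 0 at both ends
          print(cs(1.5))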

  32. Charles, I'm doing an independent-samples t-test and am getting different output when using stacked format than when formatting the data the usual way. I have the ages of 1785 participants in a treatment program (531 females and 1254 males). When I use the regular format for the input data, Real Statistics uses all observations for each sex.

    When I use stacked format, however, there are problems. The output's reformatted data shows females in the left column (P4:P534) and males in the right column (Q4:Q1257), plus the label for each sex in row 3. The Summary table correctly counts 531 females, with the formula bar showing =COUNT(P4:P1257). However, the Summary table incorrectly shows that there are also 531 males, with the formula bar showing =COUNT(Q4:Q534). It looks to me like Real Statistics is switching the number of rows it should count for males and females.

    I looked in the sample workbook, and the independent-samples t-test example using stacked format didn't appear to have the same problem there. Thank you for your assistance.

    • You can ignore my previous message. I tried to do the t-test both ways in the same spreadsheet so I could email you the file if you requested it, and Excel crashed. When I re-opened it and re-ran the t-test both ways, the stacked format worked just fine. I have no idea what happened before, but it appears to be OK now.

  33. Hello sir

    I want to use your statistical tools to arrive at the best possible method for my study by comparing different forecasting techniques in a supply chain scenario. I have read that MASE is a measure of forecasting accuracy. How do I convert MAE into MASE? Can you explain with an example (see the sketch below)? Also, is MASE applicable to ARIMA?
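    A minimal sketch of the conversion (hypothetical numbers): MASE scales the forecast MAE by the in-sample MAE of the naive lag-1 forecast, so it applies to forecasts from any method, ARIMA included.

      # Sketch: MASE = forecast MAE / in-sample MAE of the naive forecast.
      import numpy as np

      train = np.array([112., 118., 132., 129., 121., 135., 148., 148.])
      actual = np.array([136., 119.])           # hypothetical hold-out values
      forecast = np.array([140., 124.])         # hypothetical forecasts
      mae = np.mean(np.abs(actual - forecast))
      scale = np.mean(np.abs(np.diff(train)))   # naive (lag-1) in-sample MAE
      print(mae / scale)                        # MASE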

  34. Hi Charles,
    I am not sure whether what I am experiencing is normal or something is not correct.

    My operating environment is Windows 10 + Office 365.
    I recently added Real Statistics following the procedure mentioned here.
    I was able to use the Real Stat add-in then.

    But when I exit Excel and restart it, the add-in is not active.
    It seems the workaround is to uncheck the add-in and then check it again, which works.

    Why does the add-in, once added, not show up when Excel is restarted?
    Is this normal? Or is something else needed to fix this issue?

    Regards, Saradhi.

  35. Very helpful site. It helped me calculate Newey-West standard errors in Excel for analyzing the term structure of bond yields.

  36. Hi Charles

    Suppose you have a dependent variable T and a set of 10 independent variables A, B, C, … (all continuous). Trying to answer the question "Which of the regressors (A, B, C, …) have significant effects on T?", I build the following 10 models by linear regression:
    Model I: T = a0 + a1*A
    Model II: T = b0 + b1*B
    Model III: T = c0 + c1*C
    …

    After that, I check homoscedasticity and normality of the errors (Breusch-Pagan and Shapiro-Wilk tests, done with your great Resource Pack!) and apply Ramsey's RESET test for model specification errors (adding squares and/or cubes of T-hat). Finally, I score the models taking into account the p-values of the estimated coefficients a1, b1, c1, … under the Benjamini-Hochberg approach (with FDR = 0.05).
    a) Is the Benjamini-Hochberg approach valid under those circumstances?
    b) If the goal of the modeling is just the identification of relevant factors (I'm not looking for models with high predictive power), is the above procedure valid? (A sketch of the screening step follows below.)
    Any help with these questions will be greatly appreciated
    Thanks
    Gabriel
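    A minimal sketch of that screening step (simulated data, so the regressors and p-values are hypothetical):

      # Sketch: one simple regression per regressor, then Benjamini-Hochberg
      # (step-up) screening of the slope p-values at FDR = 0.05.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      T = rng.normal(size=60)
      X = {name: w * T + rng.normal(size=60)    # hypothetical regressors
           for name, w in zip("ABCDEFGHIJ", np.linspace(0, 1.2, 10))}
      pvals = {name: stats.linregress(x, T).pvalue for name, x in X.items()}

      m, fdr = len(pvals), 0.05
      ranked = sorted(pvals.items(), key=lambda kv: kv[1])
      k = max([i for i, (_, p) in enumerate(ranked, 1) if p <= i * fdr / m],
              default=0)
      print([name for name, _ in ranked[:k]])   # regressors kept at FDR 0.05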

  37. Hi Mr. Zaiontz
    Good night from the other side of the equator.

    Anyone who dives into your website can see your hard work: all that time, effort, passion, and consistency poured into the result. It is an amazing sanctuary for seekers of numbers to learn from a true master and grow into better data analysts. So, Mr. Zaiontz, for a newborn calf unafraid of the tiger (a newbie) setting sail on the waves of the stock market to seek a dime: which statistical function should I learn first?

    Thanks in advance, Mr. Zaiontz. Let's hope and pray I don't sink too soon in the market.

      • I want to calculate the risk of buying a stock using previous market data and to predict the best uptrends and downtrends, so that I can make an optimal yield estimate across the several stocks I invest in.

        Thank you for responding.

        • This is a very broad area that people have written books about. I can't possibly address this issue in a few sentences. This is also not my area of expertise.
          Charles

  38. Hi Charles,
    Thank you for all of this really useful information, which I am using for my Data Analyst apprenticeship at the moment.
    I have been using the logistic regression calculation in the Excel add-in, but the results do not reflect any of the examples I have seen. My classification table has zeros in the top two Suc-Pred cells, and in Fail-Pred I have 5666 and 13767. I have tried different variables and still get zeros returned, so I am a little bit lost as to what I am doing wrong. I would be really grateful if you were able to share some insight into this for me.
    Many Thanks

  39. Hi Charles
    I'm modeling the behavior of a system by multiple linear regression, where I suppose Y = b0 + b1*A + b2*T + b3*T^2, with Y, A, and T continuous variables. The regression using untransformed variables gives high VIF values associated with T and T-squared. As a means of reducing the collinearity, I tried standardizing A and T and repeated the regression assuming Y = b'0 + b'1*A' + b'2*T' + b'3*T'^2 (where ' indicates a standardized variable). As a consequence of the standardization, the collinearity practically vanishes. Following your post about standardized coefficients in linear regression (excellent post!), I tried to write the corresponding regression equation based on unstandardized coefficients, but I can't find a way to do that for equations involving quadratic terms like Y = b'0 + b'1*A' + b'2*T' + b'3*T'^2. How can I do that? (See the sketch below.)
    Thanks!
    Gabriel
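    A minimal sketch of the back-transformation (hypothetical numbers). With A' = (A - mA)/sA and T' = (T - mT)/sT, expanding Y = c0 + c1*A' + c2*T' + c3*T'^2 in A and T gives:

      # Sketch: unstandardizing coefficients when a quadratic term is present.
      mA, sA, mT, sT = 4.0, 1.5, 20.0, 5.0   # hypothetical means / std devs
      c0, c1, c2, c3 = 2.0, 0.8, -0.3, 0.1   # hypothetical standardized coefs

      b3 = c3 / sT**2                        # coefficient of T^2
      b2 = c2 / sT - 2 * c3 * mT / sT**2     # coefficient of T
      b1 = c1 / sA                           # coefficient of A
      b0 = c0 - c1 * mA / sA - c2 * mT / sT + c3 * (mT / sT)**2
      print(b0, b1, b2, b3)                  # Y = b0 + b1*A + b2*T + b3*T^2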

  40. Hi Charles,
    Thanks for your response to my previous question. I have another question regarding binned data. I have 26 data points: 1 at 20.1, 1 at 20.2, 3 at 20.3, 14 at 20.4, and 7 at 20.5.
    I tried to fit a distribution to the data, but the p-values were < .05. Can I use a goodness-of-fit statistic instead to select a distribution for the data? (See the sketch below.)
    Thanks again
    Stan
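    A minimal sketch of a chi-square goodness-of-fit check of these counts against a fitted normal (note the small expected counts, which make the chi-square approximation rough):

      # Sketch: chi-square GOF of a normal model to binned data.
      import numpy as np
      from scipy import stats

      vals = np.array([20.1, 20.2, 20.3, 20.4, 20.5])
      counts = np.array([1, 1, 3, 14, 7])
      data = np.repeat(vals, counts)
      mu, s = data.mean(), data.std(ddof=1)
      edges = np.array([-np.inf, 20.15, 20.25, 20.35, 20.45, np.inf])
      expected = counts.sum() * np.diff(stats.norm.cdf(edges, mu, s))
      chi2 = ((counts - expected)**2 / expected).sum()
      df = len(counts) - 1 - 2               # bins - 1 - fitted parameters
      print(chi2, stats.chi2.sf(chi2, df))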

  41. Dear Charles
    I have been given the means and standard deviations from a stability study. I don't have the original data, but I do know that the data were normal. Is it possible to extrapolate the mean and standard deviation to a future time point?
    Thanks
    Stan

  42. Dear Charles,
    I am fascinated by your great work and by your contribution to the teaching of these analysis methods; you have my total admiration and respect.
    I am a civil engineering graduate and my thesis is based on an analysis of variance and multiple linear regression. From my reading I concluded that it is a factorial analysis of variance, but reading and consulting more of your work has left me unsure whether it is factorial or multifactorial. Let me explain:
    I have as the dependent variable a very important soil property, the modulus of elasticity, which was obtained through testing (to be exact, the triaxial soil test), and as independent variables I have 11 geo-mechanical soil properties (for example, friction angle, cohesion, shear strength, etc.), which were likewise obtained through various tests. What I intend in my research is to use ANOVA to identify the statistically significant independent variables, then use the ones that were not discarded in a multiple linear regression, and so obtain a correlation equation that lets me estimate the dependent variable (modulus of elasticity) from the statistically significant independent variables.
    I have already carried out the procedure with the help of RStudio and obtained good results, but now I am trying to do the same in Excel to understand more about how the analysis of variance was performed, how the significant variables were obtained, and how the others were discarded.
    If you have time to answer my message I would be eternally grateful, as it is very important for my thesis defense.
    Alternatively, if you could recommend a bibliography where I can learn about the types of ANOVA, that would also help me a lot.
    Kind regards, Bryan Logacho Tapia
    Quito, Ecuador

  43. Dear Charles,
    I want to know whether Principal Component Analysis (PCA) is based on the correlation matrix or on the covariance matrix.

    Thank you.
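    A minimal sketch contrasting the two choices (hypothetical data on very different scales):

      # Sketch: PCA eigenvalues from the covariance vs. correlation matrix.
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(50, 3)) * [1, 10, 100]   # mixed-scale variables
      print(np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1])       # scale-driven
      print(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1])  # scale-free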

