Author

Charles Zaiontz

Dr. Charles Zaiontz has a PhD in mathematics from Purdue University and has taught as an Assistant Professor at the University of South Florida as well as at Cattolica University (Milan and Piacenza) and St. Xavier College (Milan).

Most recently he was Chief Operating Officer and Head of Research at CREATE-NET, a telecommunications research institute in Trento, Italy. He also worked for many years at Bolt Beranek and Newman (BBN), one of the most prestigious research institutes in the US, which is widely credited with implementing the Arpanet and playing a leading role in creating the Internet.

Dr. Zaiontz has held a number of executive management and sales management positions, including President of Genuity Europe, where he was responsible for the European operations of one of the largest global Internet providers, a spinoff from Verizon, with operations in 10 European countries and 1,000 employees.

He grew up in New York City and lived in Indiana, Florida, Oregon, and finally Boston before moving to Europe 36 years ago; since then he has lived in London, England, and in northern Italy.

He is married to Prof. Caterina Zaiontz, a clinical psychologist and pet therapist who is an Italian national. In fact, it was his wife who was the inspiration for this website on statistics. A few years ago she was working on a research project and used SPSS to perform the statistical analysis. Dr. Zaiontz decided that he could perform the same analyses using Excel. Accomplishing this, however, required him to create a number of Excel programs using VBA, which eventually became the Real Statistics Resource Pack used throughout this website.

487 thoughts on “Author”

  1. Dear Charles,

    Let me first thank you for your great website and the very useful software, which I have long appreciated very much.

    Now I have been experiencing a difficulty in conducting a logistic regression analysis with the Real Statistics software, which is possibly a bug in the program.

    When I choose an appropriate input range with summary data and click the “OK” button, a message saying “Input range must have at least as many data rows as columns” appears.

    This is understandable if the data is raw data. However, if it is summary data, the columns can surely exceed the rows, for example when the model is an interaction model and contains many product (interaction) terms.

    In particular, my model was, using the Real Statistics function,
    =LogitSelect(R1, “1, 2, 3, 4, 5, 6, 7, 1*7, 2*7, 3*7, 4*7, 5*7, 6*7”, TRUE)
    Is there something wrong with this interaction model?

    I think this message should appear only when the data is raw data, and should not appear when it is summary data.
    But there may be misunderstandings on my part.

    In any case, I would appreciate your kind advice.

    Best regards,
    Masa

    • Hello Masa,
      I have just issued a new release of the Real Statistics software, Rel 7.3.1, that eliminates the error message. You should now be able to use the logistic regression data analysis tool. I am not sure whether the model will converge to a solution, so I would appreciate your letting me know whether you did get a solution.
      Charles

  2. Hello Dr. Zaiontz,

    I followed your Mann-Kendall Test instruction page (https://www.real-statistics.com/time-series-analysis/time-series-miscellaneous/mann-kendall-test/) but I am having trouble figuring out the equation you used to calculate the ties corrections. Do you have any references I can look up? I have looked in many places but have not found how to calculate the ties corrections. I am curious to know whether the correction equation you used is specific to that data set, and how I can apply it to my own data set.
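For readers with the same question: the ties correction usually cited (e.g. Kendall, 1975) is not specific to any data set; each group of t tied values subtracts t(t-1)(2t+5) from the numerator of the variance of the S statistic. A minimal Python sketch of that correction (my own illustration, not the Real Statistics code):

```python
# Variance of the Mann-Kendall S statistic with the standard ties correction:
# Var(S) = [n(n-1)(2n+5) - sum over tie groups of t(t-1)(2t+5)] / 18
from collections import Counter

def mk_variance(y):
    n = len(y)
    # each group of t identical values contributes t(t-1)(2t+5) to the correction
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(y).values() if t > 1)
    return (n * (n - 1) * (2 * n + 5) - ties) / 18
```

With no ties the correction term is zero and the formula reduces to n(n-1)(2n+5)/18.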

    Thank you,

      • Hi Sir,

        While searching for “cox ph approach in excel” on the internet, I came across your example, which computes survival probabilities using the Cox PH partial likelihood method in Excel. I found it intuitive and really very helpful. I also validated it using R code (the Breslow approach under surv) and it matched.

        I have a similar dataset, with only one categorical causal variable, “Product”. I created a dummy variable which takes 1 when the product is of a particular type and 0 otherwise. Then I tried to use a similar approach in Excel. However, I didn’t get a correct match when checking the result in R.

        I have shared the dataset with you over email. Could you please advise me on how to handle this example?

        Thank you

  3. Instead of pressing Ctrl-m and entering my data range in the pop-up window for the time series analyses, I put “=SEN_SLOPE(my data range)” in an Excel cell below my data array and “=MK_TEST(my data range)” in another cell. I have 9,000 time series and would like to get the Sen’s slope value and its significance (p-value) by dragging these two function columns. This method gave certain values, but they do not seem to be the correct Sen’s slope values or their p-values. How can I get the correct trend and significance values for multiple cases in Excel using your software?
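For what it is worth, the same two quantities can also be computed outside Excel. A hedged sketch of the standard definitions (median of pairwise slopes, and the normal approximation for the Mann-Kendall test without a ties correction); this is not the Real Statistics implementation:

```python
from itertools import combinations
from statistics import median, NormalDist

def sens_slope(y):
    """Sen's slope: the median of all pairwise slopes (y[j]-y[i])/(j-i), i < j."""
    return median((y[j] - y[i]) / (j - i)
                  for i, j in combinations(range(len(y)), 2))

def mann_kendall_p(y):
    """Two-sided p-value for the Mann-Kendall trend test (normal
    approximation with continuity correction; no ties correction)."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s == 0:
        return 1.0
    z = (s - 1 if s > 0 else s + 1) / var_s ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Looping these two functions over the 9,000 series would replace dragging the formula columns.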

      • I emailed you my data file, referring to your contact info on this website. Once again, I would like to calculate the Sen’s slope and its p-value from the MK test for more than 9,000 cases of time series data. Thus, if available, I want to drag one Excel cell with the functions (SEN_SLOPE & MK_TEST) from your software to apply to all of them. I will wait for your quick reply to my email.

        Many thanks
        Gwangyong Choi

  4. Dear Professor Zaiontz, the person writing to you cannot understand a single word of statistics.
    It so happens, however, that thanks to your web page, your software package, and your splendid work, I am carrying out the data analysis for my master’s thesis on my own. I assure you that for me this is a great achievement, and I thank you infinitely for making it possible.
    Some time ago I found the Real Statistics page by chance and ran a few analyses to understand how the software works. Today I picked up my thesis again after four months of inactivity and, to my great surprise, read your biography, which I had previously ignored. I too am from Trento; for family reasons I live on the other side of the world, and it gives me great pleasure to discover that I chose your work to complete my own.
    In this message you will not find statistics topics, but simply this small thank-you and my compliments.

    Mauro Brunelli

    • Ciao Mauro,
      I am very pleased with your comment. I am very happy that I was able to help you.
      Charles

  5. Dear Charles,
    I found a little mistake in Figure 2 – the REGWQ test. As there is no Response section, I didn’t know where else to put this. α(p) is not adjusted for the second stage, meaning cells V8 and W8. They should be 0.040204.
    Jürgen

  6. Hello, have a great day! I just want to ask about forecasting methods. What would be the best method to use in a research paper if you have gathered annual data? Is the Holt-Winters method not applicable to it? Why? Hoping for your response. Thank you! 😇

  7. Dear Charles,
    Firstly, thank you for your posts – they provide useful insight to statistics for an amateur such as myself.
    I have a question of what test would be best used for my problem:
    I have 46 patients and have identified different patient factors (e.g. age, gender, bone density, tissue density, etc.), and each patient is administered ultrasound at increasing powers repeatedly, with tissue temperatures recorded at each application of ultrasound until the therapeutic effect is achieved.
    Unfortunately the ultrasound powers are not exactly the same for each patient, and some patients require more episodes of treatment to achieve the therapeutic effect.
    Is there a way to assess which of the patient factors (e.g. age, gender, tissue density) has an effect on the power required to reach the resulting temperature on each application of ultrasound?
    I thought of a repeated measures ANOVA, but I seek your advice to be confident I’m on the right track.
    Regards,
    David

    • Hello David,
      If I understand correctly, age, gender, bone density, tissue density, etc. are the independent variables that you are interested in. The power required to reach the desired temperature appears to be the dependent variable. This looks like an application of regression. If you are also interested in the number of treatments required then you can use Poisson regression for this.
      Charles

  8. Just discovered this amazing tool…it’s just awesome what you have created here and all for free….it is soooo helpful!

  9. Hi Charles:
    Your resource is great, but I am not sure how to carry out an equivalence test.
    We are testing whether two dental procedures are equivalent (implants).

    Thanks,

    Jaime Núñez

  10. The tolerance calculations were very helpful. How would you perform the calculation if your data isn’t normally distributed?

  11. Hi Charles,

    I found your tutorial on how to apply cubic splines using Excel very useful, as it is advantageous to use Excel versus something like MATLAB to perform these operations, especially due to accessibility and price.

    https://www.real-statistics.com/other-mathematical-topics/spline-fitting-interpolation/

    Would you happen to be able to publish an addendum to this tutorial that covers examples and applications of the smoothing cubic spline function that utilizes a weighting parameter? I believe Ridge regression is commonly used as an analogy here.

    Thank you!

  12. Dear Mr Zaiontz,

    Many thanks for providing such a great statistical tool! It is a pleasure working with it!

    I am trying to run a weighted linear regression: the explanatory variables are the Dow Jones returns and 2 dummy variables (in order to capture pre- and post-event returns), and the dependent variable is the sugar returns. The weights have been assigned using the reciprocal of the conditional variances that I estimated using a GARCH(1,1).

    Unfortunately, I constantly get the error message “division by 0” when I am trying to run the regression.

    Could you please give some advice on what is possibly going wrong?

    Thank you very much in advance! Hope to hear from you soon!

    Kind regards,
    Julia

  13. Hi Charles,
    I have data for trials conducted to evaluate five potato varieties across three sites over two seasons. I did the homogeneity test for seasons, found no significant differences, and so decided to do a pooled analysis. Is it right for me to do the pooled analysis for the two factors (variety and site) if there are no significant differences between the two seasons?
    Your assistance is very much needed.
    Can you send me your email address? My email address is jonahanton986@gmail.com
    Regards,
    Jonah

  14. Thanks for your informative explanations

    I have a question

    Which statistical test should I use when the independence assumption is violated?

    Many thanks

    • It depends on what hypothesis you are trying to test, but generally it is difficult to conduct a valid test if the independence assumption is violated.
      Charles

      • My objective is to evaluate the significance of differences in a robustness measure, which requires a statistical test.

        The robustness measure is computed as follows.

        I am working with different linear regression models and many datasets.

        First, I standardised all the variables (independent/dependent) to zero mean and unit variance.

        Suppose I am working with a linear regression model. I performed a 30-fold split of the dataset, so I have coefficients for each fold. I calculated the variance of each variable's coefficient across the 30 models. Finally, I summed all the variances.

        For example, I have 30 coefficients for a variable (X1); I calculate the variance of those 30 coefficients, do the same for all the remaining variables, and finally sum all the variances into one total value.

        I did this process with different models and datasets, so I end up with a matrix containing the sum-of-variances values (its rows refer to the linear models used and its columns to the datasets used).
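The measure described above could be sketched like this (illustrative names, not from any particular library; a k x p matrix of per-fold coefficients is assumed):

```python
import numpy as np

def robustness(coef_matrix):
    """Sum over variables of the variance of each coefficient across folds.
    coef_matrix: k x p array, row i = standardized coefficients from fold i."""
    return float(np.var(np.asarray(coef_matrix), axis=0, ddof=1).sum())
```

A lower value means the coefficients are more stable across the folds.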

        I need to use a statistical test to evaluate the significance of differences in robustness (sum of variance value).

        Any suggested statistical test?

        Your guidance is really appreciated!

  15. Dr. Z,

    Thank you so much for all your work in creating the valuable resource that is this website.

    I am trying to convert 3 data points, namely the mode, 5th percentile and 95th percentile, into a Beta distribution. What is the most efficient way in Excel to obtain the Alpha and Beta from those 3 data points? Can it be done without an iterative process?

    If you prefer, this question can be moved to one of the pages dealing with Beta distributions.

    Many thanks,

    DB

    • I suggest that you use Solver as follows:
      1. Insert the values for the mode, 5th percentile and 95th percentile in cells A1, A2 and A3.
      2. Insert the initial guesses (say 2 and 2) in cells A4 and A5
      3. Insert the formulas for the mode, 5th percentile and 95th percentile based on the alpha and beta values in cells B1, B2 and B3. Namely, insert the following formulas in these cells: =(A4-1)/(A4+A5-2), =BETA.INV(0.05,A4,A5) and =BETA.INV(0.95,A4,A5)
      4. Insert an error measurement in cell A6, namely the formula =SUMXMY2(A1:A3,B1:B3). This is the sum of the squared errors, the value we want to minimize.
      5. Now select Solver from the Data ribbon. In the dialog box that appears, insert A6 in the Set Objective field, choose Min and insert the range A4:A5 in the By Changing Variable Cells field. After clicking on the Solve button, estimates for alpha and beta in cells A4 and A5 should be obtained.
      Note: The formula in cell B1 for the mode is only applicable when alpha and beta are larger than one. The necessary modifications are not that difficult. Things are easier if you use the mean instead of the mode since the formula in cell B1 becomes =A4/(A4+A5) in all cases.
      Charles
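For anyone doing the same fit outside Excel, here is a hedged sketch in Python (it assumes SciPy is available and is only an illustration of the least-squares idea above). It matches the percentile values via the inverse CDF (ppf), so all three targets are on the x-scale:

```python
from scipy.stats import beta
from scipy.optimize import minimize

def fit_beta(mode, p05, p95):
    """Least-squares fit of (alpha, beta) to a mode and 5th/95th percentiles."""
    def sse(params):
        a, b = params
        if a <= 1 or b <= 1:          # mode formula requires alpha, beta > 1
            return 1e9
        m = (a - 1) / (a + b - 2)     # mode of Beta(a, b)
        return ((m - mode) ** 2
                + (beta.ppf(0.05, a, b) - p05) ** 2   # 5th percentile
                + (beta.ppf(0.95, a, b) - p95) ** 2)  # 95th percentile
    return minimize(sse, [2.0, 2.0], method="Nelder-Mead").x
```

For raw data outside the [0, 1] interval, the values would first have to be rescaled to that interval (or the location and scale fitted as extra parameters).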

      • Thank you so much. I’ve used Solver for regressions before but never knew about the SUMXMY2 function which does away with helper columns.

        I’m having some difficulty with the solution, and I think the issue lies in the difference between my raw data and the 0 to 1 scale. Maybe we need to solve for [A] and [B] as well?

        What is the correct solution where the raw data is as follows:
        mode = 1.00
        5th percentile = 0.96
        95th percentile = 1.08

        Thanks again!

  16. My question is: if I am doing a forecast for daily data and I have actual data for 4 previous years, say 2019, 2018, 2017 and 2016, what is the year that I can start getting forecast values for, so that I can evaluate the model with the error measurements?
    Thanks.

    • I don’t know of a definite answer to your question; this is a judgement call. You can base the model on years 2016, 2017 and 2018 and check its accuracy based on 2019.
      Charles

  17. I just downloaded this add-in to Excel. I can’t thank you enough for this tool. This is a phenomenal resource and you, sir, are the dude. The Dude abides.

  18. Dear Dr. Charles,
    I have done all the mathematical equations needed to forecast using the SARIMA model, and everything worked well for me. But I need to ask how I can calculate the mean absolute percentage error (MAPE) for this method, as it gives me the forecast for the next period, for which I don’t know the real “actual” data; yet to calculate the MAPE and compare this method to other methods, I need forecasts for those periods.
    Can you help me please?

      • Thank you, but I already know the equations to calculate the MAPE. The problem is with the error: there is no forecast data for the periods that have actual data; the SARIMA method just gives me a forecast for the next periods, which don’t have actual data.

        • Mohammad,
          If you don’t have actual data, you won’t be able to calculate errors. Sometimes the model is based on part of the data, and then the rest of the data is used to determine the quality of the model; in this case there is some actual data held out, so errors can be calculated using the forecasted values vs. the actual data.
          Charles
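The holdout idea above amounts to: fit the SARIMA model on an initial segment of the series, forecast the held-out segment, and compare against the actuals. A minimal MAPE sketch (illustrative; it assumes the actual values are nonzero):

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    return 100 * sum(abs((a - f) / a)
                     for a, f in zip(actual, forecast)) / len(actual)
```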

  19. Dr. Zaiontz,

    I can’t thank you enough for the work you’ve put into this site. I can never know the effort you made over the years, or the learning and dedication it took to become an expert in a topic many, including myself, find extremely challenging. But to do all of that, and then to have such passion for sharing what you know and guiding people along their own journeys with statistics that you made (and maintain!) a resource like this site, shows that you truly care about what you do, and I love that! It’s infectious! Thanks for getting me through some of the toughest classes in my undergrad and for giving me a passion for stats!

  20. I want to thank all the people behind this website for straightforwardly explaining statistics and providing easy-to-follow examples using Excel.

      • Hello Charles, I just wanted to say thank you for the tremendous website. The amount of analysis and work you have put into this site are amazing. In 1994 when I started an ISP (with 9600 baud modems) the one thing I hoped for most for the burgeoning Internet was that people would begin to communicate and share all manner of information, and they would do it readily and freely. That we could all learn from each other. For a while that idea held promise, but unfortunately not for long.
        Your work and your website are truly examples of that original idea from so long ago. Your willingness to help is what could still form the backbone of the Internet. I was dubious and skeptical at first, but I was surprised and extremely gratified to find your site. I use your site regularly and you give me hope for the Internet. Please don’t stop doing what you are doing. Thank You Very Much, Rich Gibbons

        • Hello Richard,
          Thank you very much for your very kind remarks.
          I understand very well where you are coming from. I worked with many of the people who were involved in the Internet from the very early days, and they had very high hopes for this new frontier, some of which were realized and some of which unfortunately were not.
          Charles

        • Rich,
          Though I haven’t started an ISP, I also use this site almost daily. Dr. Zaiontz’s posts, resources, and explanations helped break down the walls I had built around myself that said, “You’re not a numbers person”, “You’re just not good at math”, and “It’s too hard; just quit”, to the point where I went from not having taken a math course since 10th grade in high school to deciding to go for an M.S. in Data Analytics! I’m glad to hear others are getting as much out of it as I do, and I hope Dr. Zaiontz reads this and knows he has changed the course of my life because of his work here (and in all his other contributions, obviously!).
          – Blake

  21. Hello Charles,
    Thanks for creating and posting all of this information! It was very useful and I’ve recommended the site to my students! People appreciate your work!
    Paul

  22. Dr. Zaiontz,

    Please help. My dissertation is at a stand-still. I am scheduled to graduate in March 2020.

    I intended to use chi-square (Fisher’s Exact) but was unable to obtain a high enough survey response rate, which yielded a 17% margin of error/confidence interval at a 95% CL. My committee insists I either resurvey or choose a different method due to the CI being so “high”.

    I have: 9 IVs, 1 DV. Total population: 500. Survey sent to 119 based on simple random sampling (SRS). 30 participants completed the survey (30 observations). Survey completion rate of 25.2%.

    Am I able to conduct multiple regression instead with what I have? Do I meet the conditions/assumptions?

    And if so, does multiple regression require I choose a CI, as well as a CL?
    Thanks!!!!

    • Hello,
      I really don’t have enough information to be able to give any advice.
      Can you explain further what you were testing using Fisher’s exact test and what sort of results you got?
      Charles

      • Hi. Yes.

        Testing to see if age, race, gender, experience, political affiliation, and a few other variables have a statistically significant relationship to academic union support.

        I had several expected values less than 5, so I used Fisher’s exact test. 30 total observations.

        For example:
        Gender and Union Support: Fisher’s p-value .230; fail to reject the null hypothesis.

        Is this what you need?

        • Hi,
          Thanks for the clarification. You can use regression with age, race, gender, experience, political affiliation, etc. as independent variables and academic union support as the dependent variable. If this variable takes just two values, you should explore binary logistic regression. The results of this approach would tell you which of the factors are significant in predicting union support (with p-values and confidence intervals for each). These topics are explained on the Real Statistics website.
          Charles

  23. I can’t thank you enough for creating this set of tools. Do people really believe impoverished students can afford SPSS? You’re a life-saver. Now if I can figure out how to use the discriminant analysis tool…

  24. Hi Charles
    I have only three columns in Excel: frequency, mileage [km], and censored or failure.
    mileage [1600, 75, 3500, 5000]
    failure or censored [F, F, F, C]
    frequency [1, 1, 1, 54]

    How can I perform a Weibull analysis in Excel? I would appreciate it if you could post it to my email.

  25. Can I use the Mann-Kendall test and Sen’s slope estimator to identify long-term (40-70 years) streamflow change trends and variability? Could you refer me to some useful links and references on them, please?

  26. Hi Charles.
    Your article “Mann-Kendall Test” is great. How could you work out the Kendall’s tau values for that same data set? Or do you have an article that explains, in the same step-by-step way, how to work out Kendall’s tau values?

  27. Dear Charles Zaiontz,

    I need to calculate the sampling variance of Cohen’s d in case of a one sample t test and I found your post “Confidence Interval for one sample Cohen’s d” (link: https://real-statistics.com/students-t-distribution/one-sample-t-test/confidence-interval-one-sample-cohens-d/). In this post you refer to Hedges and Olkin (1985). My question is, did you find the formula of the sampling variance in the book of Hedges and Olkin (1985)? If you did, on what page can I find that formula?

    Thank you in advance.

    Jasmine

      • Dear Charles,

        thank you very much for your reply. I found some useful information in that article. There is just one question that I would really like to ask:
        To calculate the sampling variance of Cohen’s d in the case of a one-sample t test (where I have one group with one measurement on a variable of interest), I can use the formula (1/n)+d^2/(2*n), right? Where n represents the sample size and d the Cohen’s d. However, according to Borenstein (2009) this formula can be used to calculate the variance for paired groups (with two measurements on one group). In that case, the original formula is ((1/n_i)+d_i^2/(2*n_i))*2*(1-r), where r represents the correlation between the two measurements on the variable of interest.
        However, if I just want to calculate the variance of Cohen’s d in the case of a one-sample t test, then I should assume that the correlation r in that formula is equal to .5 (i.e., 2*(1-.5)=1), right? In that case, I get the first formula. This means that I assume that the correlation between the group and the population(?) on the variable of interest is equal to .5. So my question is: is it justified to make such an assumption? And if so, is it perhaps too strong an assumption? And can I really use that formula to calculate the sampling variance of Cohen’s d in the case of a one-sample t test, or are there alternatives?
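For concreteness, the one-sample formula under discussion, Var(d) = 1/n + d^2/(2n), is trivial to compute; a sketch (this only restates the formula quoted above, it is not additional material from Hedges & Olkin):

```python
def var_cohens_d_one_sample(d, n):
    """Approximate sampling variance of a one-sample Cohen's d:
    Var(d) = 1/n + d^2/(2n)."""
    return 1 / n + d ** 2 / (2 * n)
```

For example, d = 0.5 with n = 25 gives 1/25 + 0.25/50 = 0.045.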

        Thank you in advance.

        Jasmine

  28. Dear Charles,

    Thank you very much for this website and for the Real Statistics Package for Excel. It is amazing and very useful. Excel is a powerful tool, but with this add-in it is even more useful and user-friendly for non-mathematics people.
    I appreciate your work very much. Very helpful.

    Thank you.

    Laco

  29. Dr Zaiontz:
    I wanted to see if there is an appropriate citation for N=90 and 5 independent variables for the DW statistic. My DW is about 2.2, and I would like to cite a source that would support no autocorrelation, i.e., the values of the residuals being independent.
    Thank you, Sir

  30. Good day sir,
    How can I use a box plot in R if my table is a 3 x 3 contingency table? Can you give me example data for a 3 x 3 contingency table, with the R code for a box plot?

  31. Excellent way to help all of us looking for easier stats. Simple examples and an add-in flawlessly working.
    This is just to let you know I am very thankful!

  32. Hi,

    I discovered this product on YouTube, found it amazing, and now want a piece of it. I am trying to download it, but when I click the download button, nothing happens. I am looking to use logistic regression, which I use very often. I also do not know what package to install. Plus, I have Excel 2016; is that okay?
    Could you help, please?

  33. Hello sir,

    I hope that you are well. Please, I want to know if you can create a simulation for me in Excel for a fee. If you can, please contact me at my e-mail.

  34. Dear Sir
    Thanks a lot for giving research scholars like me this wonderful software.

    Please help me with performing iteratively reweighted least squares regression using this software.

  35. Thank you very much for your great effort, Dr. Charles Zaiontz.
    This website is helping me with my thesis.
    I never imagined before that Excel could do these statistical tests.
    It is great as a statistics learning tool for me, and it also simplified my problem because the SPSS program is too heavy for my old laptop.

    Regards,
    Paramita

    • Hello Charles,
      I am using Real Statistics for Excel 2013 on Windows and would appreciate it if you could help me with the following. I am performing a MANOVA on a data set that is extremely similar to the one you used in the example with four types of soil, measuring yield, water requirement, and fertilizer requirement. You have a total of 32 measurements for each of the three dependent variables (eight for each of the four types of soil). Likewise, I have three independent groups, laser (8 subjects), no laser (6 subjects), and control (21 subjects), for a total of 35 measurements on each of three dependent variables: acuity (A), contrast sensitivity (CS), and retinal thickness (RT). I proceeded by overwriting your example data with mine, which simply added three rows. I then changed the formulae in cells F4 through L7. That works fine. However, I then tried to change the formulae in the SS CP and group covariance matrices, and received an error message reading “you cannot change part of an array”.
      My MANOVA closely resembles your example, and I would like to utilize all of the formatting you have done without completely rewriting all of the formulae. How can I do this?
      Thanks very much.
      Joel joelmweinstein@me.com
      PS I’m not very familiar with your website layout. Where will your reply be posted? Would appreciate it if you could send a copy to my email address.

      • Joel,
        You should be able to modify parts of the output produced by Real Statistics. However, if you need to change a few of the cells produced by an array formula, then you will need to be a little clever, since you can’t modify cells within the range output of an array formula. This is an Excel restriction. See
        Array Formulas and Functions regarding the error message you are receiving.
        Suppose that the range A1:B5 contains an array formula and you want to modify the output in cell B2. One way to accomplish this is to place the formula =A1 in cell D1, highlight the range D1:E5 and then press Ctrl-R and Ctrl-D. Now the range D1:E5 will contain the same results as A1:B5, but whereas you couldn’t change cell B2, you can change cell E2.
        Note too that you can write your own VBA formulas using calls to the Real Statistics functions, including array functions. This is explained at
        Calling Real Statistics Function in VBA
        Charles

  36. Dr. Charles Zaiontz,
    I want to estimate the translog production function using the method of ridge regression, as my data has a multicollinearity issue. I also tried, step by step, the data you have uploaded on the site, but something is going wrong, as I did not find the command (i.e., DIAG) in the Excel sheet. Now I can’t proceed with the remaining work. Therefore, I need your rich and timely assistance in this regard.

    Thanks a million.

  37. Hello Dr. Zaiontz,

    I stumbled upon your website and saw an example of the Hodges–Lehmann estimator. But it was calculated in the context of a wider problem, and not what I was looking for.
    Can one construct a formula in Excel dedicated to outputting the Hodges–Lehmann estimator for a given series/array of numbers?
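For reference, the one-sample estimator itself is simple: the median of all Walsh averages (x_i + x_j)/2 with i <= j. A hedged sketch of that definition (an illustration, not a Real Statistics function):

```python
from itertools import combinations_with_replacement
from statistics import median

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimator: the median of all
    Walsh averages (x[i] + x[j]) / 2 over pairs i <= j."""
    return median((a + b) / 2
                  for a, b in combinations_with_replacement(x, 2))
```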

    Thank you for your advice,
    Orion

  38. hello, Dr. Zaiontz,
    Here I am looking for your help again. I collected writing samples from 30 Chinese students. To be exact, each of the 30 students wrote a dissertation in English and a research article in Chinese. I intend to see if there is any difference between their English dissertations and their corresponding Chinese research articles in terms of hedges. What statistical method should I use to this end? Or is it possible to do any statistical analysis with these data? Thank you so much for your time and help! Best wishes.

  39. It’s likely possible to have overlapping classification data at the end when using LDA. Can we set a threshold in discriminant analysis to provide more separation between class data points? If so, how? Regards.

    • Fergo,
      The point of LDA is to determine a specific category. Since the outcomes are weights, I guess you can interpret the existence of overlapping categories, but I am not sure what purpose this would serve. When you say that you are seeking more separation of the data points, what do you mean?
      Charles

        I want to test whether a data vector belongs to category A or B. So I ran several input data vectors that I know belong to category A, but the output shows that some of them are wrongly categorized as B. I am seeking a way to get better output, maybe by applying a threshold or something, so that at least I can reduce the errors. Would you please share some ideas?

