Distribution Fitting

Given a collection of data that we believe fits a particular distribution, we would like to estimate the parameters that best fit the data. We focus on three such methods: Method of Moments, Maximum Likelihood Method, and Regression.

References

Wikipedia (2017) Maximum likelihood estimation
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation

Wikipedia (2021) Method of moments (statistics)
https://en.wikipedia.org/wiki/Method_of_moments_(statistics)

Hastings, N., Peacock, B. (2011) Statistical distributions. 4th Ed, Wiley
https://www.wiley.com/en-us/Statistical+Distributions%2C+4th+Edition-p-9780470390634

24 thoughts on “Distribution Fitting”

  1. Hi Charles,

    I totally love what wrote here. Very informative for me as a Reliability Engineer.
    I’ll be studying all these info 1-by-1. But I have something to really want to come up. I am making a model to include reliability analysis in the process which will analyze the reliability level of an equipment (pump, compressor, turbine, etc.) using its MTBF or failure data for my Replacement Study. Should I be considering to add in the process how to identify the type of distribution to be use? Or does Weibull Distribution already enough to analyze my maintenance historical data?

    I hope you could enlighten me on this matter. I look forward for your reply.

    John

    Reply
    • Hi John,
      If you already know that the Weibull distribution is a good fit for the type of reliability analysis that you are planning to do, then use Weibull. Reliability analysis for Weibull is well-known and the formulas based on MTBF are clear.
      If you are not sure whether Weibull is a good fit, you can try to fit the data against other distributions and pick the distribution that provides the best fit (based on least squared error, AIC, or some other measure).
      Charles

      Reply
      • Hi Charles,

        Thanks for your wonderful advise.
        I’d like to explore possibilities with other distribution as I might be adding other input data in the future aside from the MTBF. Do you have any reference written on how to perform fit test in better and easiest way? To be honest, I am not really savvy into statistics. I hope you can refer me to one.

        Thanks and more power.
        John

        Reply
        • Hi John,
          The Real Statistics website provides support for distribution fitting for a large number of distributions.
          You can also consult one of the references on this webpage.
          Charles

          Reply
  2. Hi Charles,
    How to know which distribution will fit my data? Is it through goodness of fit tests? I test the goodness of fit of different distributions and I compare the p-value?

    Reply
  3. Hi Charles,
    How to generate a set of random numbers to follow pareto (type 1) distribution using a giving mean/sd?

    Thanks.
    K

    Reply
    • The Pareto distribution has two parameters: a scale parameter m and a shape parameter alpha. The inverse function for the Pareto distribution is I(p) = m/(1-p)^(1/alpha). If you know the values of mn and alpha then a random value from the distribution can be calculated by the Excel formula = m/(1-RAND())^(1/alpha).
      Now if the mean and the standard deviation sd are known, then these can be used to calculate the m and alpha parameters by solving the equations:
      mean = m*alpha/(alpha-1)
      sd^2 = m^2*alpha/((alpha-1)^2*(alpha-2))
      Charles

      Reply
  4. Hi,

    Thank you for creating this great tool for Excel.

    I have a question regarding distribution fitting. This tool estimates the parameters for different distributions. Is it possible to compare which distribution is best fit for the data (Anderson-Darling statistic etc)?

    Best regards,
    Jukka

    Reply
    • Jukka,
      A commonly used approach is to choose the distribution with the smallest Akaike information criterion (AIC) value. AIC = 2k – 2LL where LL = the log-likelihood (LL) and k = the number of parameters being estimated. Essentially this means that you are choosing the distribution with the largest LL value with a penalty for extra parameters. See
      https://www.spcforexcel.com/knowledge/basic-statistics/deciding-which-distribution-fits-your-data-best
      Another commonly used approach is the Bayesian Information Criterion (BIC), namely BIC = k*LN(n) – 2LL where n = the number of elements in the sample and LN is the natural log. BIC is used in the same way as the AIC except that the penalty for additional parameters is calculated slightly differently.
      Charles

      Reply
  5. Hi Charles,

    I wanted to ask if you have a continuous variable which fits a certain distribution, how can or should you bin (categorize/discretize i don’t know the correct term) according to that distribution?

    I would like to tell you that the website is very well organized and filled with usefull information.

    Best regards,
    Leonardo

    Reply
  6. Hi Charles,

    If I want to estimate the distribution of the bus inter-arrival time, what is the best distribution to fit my data? Appreciate your advice on this. Thanks.

    Regards,
    Jessica

    Reply
  7. Dear Charles,
    Does GAMMA_FIT function supports curly parentheses, for instance {GAMMA_FIT(IF(R100:R138>0,R100:R138),,100)} does not create a result.
    Thanks in advance,

    Reply
  8. Hi Charles,

    I recently came across your website and found it very very useful. Thank you so much for your great efforts!

    I wanted to ask whether it would be possible to do distribution fitting via MLE (by using Real Statistics functions) for a Gumbel distribution?

    Thank you so much.

    With best regards,
    Wayne

    Reply
  9. Hi Charles,

    First, I loved you site! Helped me a lot.

    Second, how can you use the MLE with Newton Method and censored data to fit a three parameter weibull distribution?

    Best regards,
    Leandro Dutra.

    Reply

Leave a Comment