Distribution Fitting

Given a collection of data that we believe fits a particular distribution, we would like to estimate the parameters that best fit the data. We focus on three methods for doing so: the Method of Moments, the Maximum Likelihood Method, and Regression.

References

Wikipedia (2017) Maximum likelihood estimation
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation

Wikipedia (2021) Method of moments (statistics)
https://en.wikipedia.org/wiki/Method_of_moments_(statistics)

Hastings, N., Peacock, B. (2011) Statistical distributions. 4th Ed, Wiley
https://www.wiley.com/en-us/Statistical+Distributions%2C+4th+Edition-p-9780470390634

20 thoughts on “Distribution Fitting”

CR

January 22, 2022 at 7:14 pm

Hi Charles,
How do I know which distribution will fit my data? Is it through goodness-of-fit tests? Do I test the goodness of fit of different distributions and compare the p-values?
- Charles
  
  January 23, 2022 at 9:42 am
  
  Yes, this is the usual approach.
  Charles
k

July 16, 2021 at 10:27 am

Hi Charles,
How can I generate a set of random numbers that follow a Pareto (type 1) distribution with a given mean/sd?

Thanks.
K
- Charles
  
  July 16, 2021 at 11:13 pm
  
  The Pareto distribution has two parameters: a scale parameter m and a shape parameter alpha. The inverse function for the Pareto distribution is I(p) = m/(1-p)^(1/alpha). If you know the values of m and alpha, then a random value from the distribution can be calculated by the Excel formula =m/(1-RAND())^(1/alpha).
  Now if the mean and the standard deviation sd are known, then these can be used to calculate the m and alpha parameters by solving the following equations (the standard deviation is finite only when alpha > 2):
  mean = m*alpha/(alpha-1)
  sd^2 = m^2*alpha/((alpha-1)^2*(alpha-2))
  Charles
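
The two equations above have a closed-form solution: dividing the variance by the squared mean gives sd^2/mean^2 = 1/(alpha*(alpha-2)), so alpha = 1 + sqrt(1 + (mean/sd)^2) and m = mean*(alpha-1)/alpha. The following is a minimal Python sketch of that calculation together with the inverse-function sampling described in the reply. The function names and the example mean/sd values are hypothetical and are not part of the Real Statistics software.

```python
import math
import random


def pareto_params_from_moments(mean, sd):
    """Solve mean = m*alpha/(alpha-1) and
    sd^2 = m^2*alpha/((alpha-1)^2*(alpha-2)) for m and alpha."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # sd^2/mean^2 = 1/(alpha*(alpha-2)) is a quadratic in alpha;
    # the root below is the one that satisfies alpha > 2.
    alpha = 1.0 + math.sqrt(1.0 + (mean / sd) ** 2)
    m = mean * (alpha - 1.0) / alpha
    return m, alpha


def pareto_sample(m, alpha, n, rng=random):
    """Inverse-function sampling: I(p) = m / (1 - p)^(1/alpha)."""
    # rng.random() lies in [0, 1), so 1 - p is never zero.
    return [m / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]


# Hypothetical target moments, chosen only for illustration.
m, alpha = pareto_params_from_moments(mean=3.0, sd=1.5)
values = pareto_sample(m, alpha, 5, random.Random(0))
print(m, alpha)
print(values)
```

Because the Pareto distribution is heavy-tailed, the sample mean and standard deviation of a small generated set may differ noticeably from the target values.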
Jukka

March 13, 2021 at 8:33 pm

Hi,

Thank you for creating this great tool for Excel.

I have a question regarding distribution fitting. This tool estimates the parameters for different distributions. Is it possible to compare which distribution is the best fit for the data (Anderson-Darling statistic, etc.)?

Best regards,
Jukka
- Charles
  
  March 14, 2021 at 11:34 am
  
  Jukka,
  A commonly used approach is to choose the distribution with the smallest Akaike information criterion (AIC) value, where AIC = 2k – 2LL, LL = the log-likelihood, and k = the number of parameters being estimated. Essentially, this means choosing the distribution with the largest LL value, with a penalty for extra parameters. See
  https://www.spcforexcel.com/knowledge/basic-statistics/deciding-which-distribution-fits-your-data-best
  Another commonly used approach is the Bayesian Information Criterion (BIC), namely BIC = k*LN(n) – 2LL where n = the number of elements in the sample and LN is the natural log. BIC is used in the same way as the AIC except that the penalty for additional parameters is calculated slightly differently.
  Charles
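
As an illustration of the AIC and BIC formulas above, here is a small Python sketch that compares two candidate distributions fitted by maximum likelihood. The choice of candidates (exponential and normal), the function names, and the sample data are assumptions made for this example only.

```python
import math


def exponential_loglik(data):
    """Return (LL, k) for an exponential fit; the MLE of the rate is 1/mean."""
    n = len(data)
    total = sum(data)
    rate = n / total
    return n * math.log(rate) - rate * total, 1


def normal_loglik(data):
    """Return (LL, k) for a normal fit using the MLEs of mu and sigma^2."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # the MLE divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def aic(ll, k):
    return 2 * k - 2 * ll


def bic(ll, k, n):
    return k * math.log(n) - 2 * ll


# Hypothetical sample, for illustration only.
data = [0.3, 1.2, 0.7, 2.9, 0.1, 0.5, 1.8, 0.9, 4.1, 0.4]
for name, fit in [("exponential", exponential_loglik), ("normal", normal_loglik)]:
    ll, k = fit(data)
    print(name, "AIC =", round(aic(ll, k), 2), "BIC =", round(bic(ll, k, len(data)), 2))
```

The distribution with the smaller AIC (or BIC) value would be preferred under this approach.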
Leonardo Alexandre

July 2, 2020 at 2:12 pm

Hi Charles,

I wanted to ask: if you have a continuous variable that fits a certain distribution, how can or should you bin it (categorize/discretize, I don’t know the correct term) according to that distribution?

I would like to tell you that the website is very well organized and filled with useful information.

Best regards,
Leonardo
- Charles
  
  July 2, 2020 at 7:14 pm
  
  Hello Leonardo,
  Glad that you like the website.
  How you bin the distribution depends on what you plan to use these bins for.
  Charles
  - Leonardo Alexandre
    
    July 4, 2020 at 3:39 am
    
    Hi Charles,
    Do you have any literature you can recommend on this topic? I would like to extract the maximum information from the data.
    Best regards,
    Leonardo
    - Charles
      
      July 4, 2020 at 11:51 pm
      
      Hello Leonardo,
      This is a very big topic. A number of the references in the Bibliography might be helpful. See
      Bibliography
      Charles
Jessica

January 15, 2020 at 6:39 am

Hi Charles,

If I want to estimate the distribution of the bus inter-arrival time, what is the best distribution to fit my data? Appreciate your advice on this. Thanks.

Regards,
Jessica
- Charles
  
  January 15, 2020 at 8:31 am
  
  Jessica,
  Usually, the exponential distribution is used for this purpose. See
  https://real-statistics.com/other-key-distributions/exponential-distribution/
  Charles
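
For the exponential distribution, the maximum likelihood estimate of the rate is simply the reciprocal of the sample mean. The Python sketch below shows this for a list of inter-arrival times. The function names and the sample gaps are hypothetical, and the sketch assumes the exponential model is appropriate for the data.

```python
import math


def fit_exponential(inter_arrival_times):
    """Return the MLE of the rate (arrivals per unit time): 1 / sample mean."""
    if not inter_arrival_times or min(inter_arrival_times) < 0:
        raise ValueError("need a non-empty list of non-negative times")
    return len(inter_arrival_times) / sum(inter_arrival_times)


def prob_wait_exceeds(rate, t):
    """Exponential survival function: P(T > t) = exp(-rate * t)."""
    return math.exp(-rate * t)


# Hypothetical gaps between buses, in minutes.
gaps_minutes = [4.0, 11.5, 7.2, 2.1, 9.8, 15.3, 6.0]
rate = fit_exponential(gaps_minutes)
print("estimated mean gap:", 1.0 / rate)
print("P(gap > 10 minutes):", prob_wait_exceeds(rate, 10.0))
```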
Agah Orhan Dikmen

September 2, 2019 at 1:44 pm

Dear Charles,
Does the GAMMA_FIT function support curly braces (array formulas)? For instance, {GAMMA_FIT(IF(R100:R138>0,R100:R138),,100)} does not return a result.
Thanks in advance,
- Charles
  
  September 3, 2019 at 9:23 am
  
  GAMMA_FIT is an array function. See the following for how to use array functions:
  Array Formulas and Functions
  Charles
Wayne

March 20, 2018 at 9:46 pm

Hi Charles,

I recently came across your website and found it very very useful. Thank you so much for your great efforts!

I wanted to ask whether it would be possible to do distribution fitting via MLE (by using Real Statistics functions) for a Gumbel distribution?

Thank you so much.

With best regards,
Wayne
- Charles
  
  March 20, 2018 at 10:20 pm
  
  Wayne,
  I am pleased that you are getting value from the website.
  The Real Statistics software doesn’t yet support the Gumbel distribution.
  Charles
  - Wayne
    
    March 21, 2018 at 12:02 am
    
    Hi Charles,
    
    Thanks for your reply. If I use Excel’s Solver to fit a Gumbel distribution, i.e. the approach taken for fitting a Weibull distribution as described in https://real-statistics.com/distribution-fitting/distribution-fitting-via-maximum-likelihood/fitting-weibull-parameters-mle/, how should I initialize the location and scale parameters of the Gumbel distribution?
    
    Your guidance is greatly appreciated.
    
    Best regards,
    Wayne
    - Charles
      
      March 21, 2018 at 8:46 am
      
      Wayne,
      The approach using Solver will probably work. I will eventually support the Gumbel distribution, but at present I haven’t researched what good initialization values are.
      Charles
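
One possible starting point, offered here as an assumption rather than as something the Real Statistics software does, is the method of moments. For the Gumbel (maximum) distribution, mean = location + 0.5772*scale and variance = (pi*scale)^2/6, so the sample mean and variance give initial values directly. The Python sketch below computes these initial values and then solves the likelihood equations by bisection on the scale. All function names and the sample data are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_initial_guess(data):
    """Method-of-moments start: mean = loc + gamma*scale, var = (pi*scale)^2/6."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * scale, scale


def gumbel_mle(data, tol=1e-10, max_iter=200):
    """MLE of (loc, scale) for the Gumbel (maximum) distribution."""
    if len(set(data)) < 2:
        raise ValueError("need at least two distinct values")
    n = len(data)
    mean = sum(data) / n
    x_min = min(data)

    def weights(scale):
        # exp(-x/scale) shifted by x_min so that every weight is at most 1
        return [math.exp(-(x - x_min) / scale) for x in data]

    def g(scale):
        # Likelihood equation for the scale:
        # scale - mean + (weighted mean of x with weights exp(-x/scale)) = 0
        w = weights(scale)
        return scale - mean + sum(x * wi for x, wi in zip(data, w)) / sum(w)

    # Bracket the root around the moment-based scale, widening if necessary.
    _, start = gumbel_initial_guess(data)
    lo, hi = start / 10.0, start * 10.0
    while g(lo) > 0:
        lo /= 2.0
    while g(hi) < 0:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    scale = 0.5 * (lo + hi)
    # Likelihood equation for the location, given the scale
    loc = x_min - scale * math.log(sum(weights(scale)) / n)
    return loc, scale


# Hypothetical sample of maxima, for illustration only.
sample = [12.1, 9.8, 14.3, 11.0, 10.4, 13.7, 9.1, 16.2, 11.9, 10.8]
print("initial guess:", gumbel_initial_guess(sample))
print("MLE:", gumbel_mle(sample))
```

The same moment-based values could plausibly serve as the initial location and scale cells for Solver, although this has not been checked against the Weibull worksheet layout referenced above.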
Leandro Dutra

February 16, 2018 at 7:39 pm

Hi Charles,

First, I love your site! It helped me a lot.

Second, how can you use MLE with Newton’s method and censored data to fit a three-parameter Weibull distribution?

Best regards,
Leandro Dutra.
- Charles
  
  February 17, 2018 at 8:45 am
  
  Leandro,
  Glad the website has been helpful to you.
  The Real Statistics website and software cover MLE using Newton’s method with censored data to fit a two-parameter Weibull distribution. See
  https://real-statistics.com/distribution-fitting/distribution-fitting-via-maximum-likelihood/weibull-censored-data/
  The three-parameter version is not supported. Of course, if you fix a value for the third parameter, you can use the two-parameter version.
  Charles