Given a collection of data that we believe fits a particular distribution, we would like to estimate the parameters that best fit the data. We focus on three such methods: Method of Moments, Maximum Likelihood Method, and Regression.
- Method of Moments
- Exponential Distribution
- Weibull Distribution
- Beta Distribution
- Negative Binomial Distribution
- Uniform Distribution
- Triangular Distribution
- PERT Distribution
- Log-normal Distribution
- Generalized Extreme Value (GEV) Distribution
- Generalized Pareto Distribution (GPD)
- Generalized Gamma Distribution
- Real Statistics Support
- Maximum Likelihood Method
- Exponential Distribution
- Weibull Distribution (using Solver)
- Weibull Distribution (using Newton’s Method)
- Gamma Distribution
- Beta Distribution
- Negative Binomial Distribution
- Uniform Distribution
- Triangular Distribution
- PERT Distribution
- Lognormal Distribution
- Gumbel Distribution
- Logistic Distribution
- Laplace Distribution
- Pareto Distribution
- Geometric Distribution
- Cauchy Distribution
- Generalized Extreme Value (GEV) Distribution
- Generalized Gamma Distribution
- Weibull Distribution with Censored Data
- Weibull Distribution with Multi-Censored Data
- Real Statistics Support
- Regression Method
- Other Distribution Fitting Approaches
- Distribution Fitting Data Analysis Tool
- Confidence Intervals for Fitted Parameters
- Kernel Density Estimation (KDE)
- Statistical Divergence
References
Wikipedia (2017) Maximum likelihood estimation
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
Wikipedia (2021) Method of moments (statistics)
https://en.wikipedia.org/wiki/Method_of_moments_(statistics)
Hastings, N., Peacock, B. (2011) Statistical distributions. 4th Ed, Wiley
https://www.wiley.com/en-us/Statistical+Distributions%2C+4th+Edition-p-9780470390634
Hi Charles,
I totally love what you wrote here. It is very informative for me as a Reliability Engineer.
I’ll be studying all this information piece by piece, but there is something I really want to raise. I am building a model that includes reliability analysis, which will assess the reliability level of a piece of equipment (pump, compressor, turbine, etc.) using its MTBF or failure data for my Replacement Study. Should I add a step to the process for identifying the type of distribution to use? Or is the Weibull distribution already enough to analyze my historical maintenance data?
I hope you can enlighten me on this matter. I look forward to your reply.
John
Hi John,
If you already know that the Weibull distribution is a good fit for the type of reliability analysis that you are planning to do, then use Weibull. Reliability analysis for Weibull is well-known and the formulas based on MTBF are clear.
If you are not sure whether Weibull is a good fit, you can try to fit the data against other distributions and pick the distribution that provides the best fit (based on least squared error, AIC, or some other measure).
Charles
Hi Charles,
Thanks for your wonderful advice.
I’d like to explore possibilities with other distributions, as I might add other input data in the future besides the MTBF. Do you have a reference on how to perform a fit test in the best and easiest way? To be honest, I am not very savvy with statistics. I hope you can refer me to one.
Thanks and more power.
John
Hi John,
The Real Statistics website provides support for distribution fitting for a large number of distributions.
You can also consult one of the references on this webpage.
Charles
Hi Charles,
How do I know which distribution will fit my data? Is it through goodness-of-fit tests? Do I test the goodness of fit of different distributions and compare the p-values?
Yes, this is the usual approach.
Charles
Hi Charles,
How do I generate a set of random numbers that follow a Pareto (type 1) distribution with a given mean/sd?
Thanks.
K
The Pareto distribution has two parameters: a scale parameter m and a shape parameter alpha. The inverse of the cumulative distribution function for the Pareto distribution is I(p) = m/(1-p)^(1/alpha). If you know the values of m and alpha, then a random value from the distribution can be calculated by the Excel formula = m/(1-RAND())^(1/alpha).
Now if the mean and the standard deviation sd are known, then these can be used to calculate the m and alpha parameters by solving the following equations (the standard deviation is finite only when alpha > 2):
mean = m*alpha/(alpha-1)
sd^2 = m^2*alpha/((alpha-1)^2*(alpha-2))
Charles
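The two equations above have a closed-form solution: dividing the second by the square of the first gives (sd/mean)^2 = 1/(alpha*(alpha-2)), so alpha = 1 + sqrt(1 + (mean/sd)^2) and m = mean*(alpha-1)/alpha. The following Python sketch is not part of the Real Statistics software, and the function names are made up for illustration. It solves for the parameters and then draws values with the same inverse formula as the Excel expression above.

```python
import math
import random


def pareto_params_from_mean_sd(mean, sd):
    """Solve mean = m*alpha/(alpha-1) and
    sd^2 = m^2*alpha/((alpha-1)^2*(alpha-2)) for (m, alpha)."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    # (sd/mean)^2 = 1/(alpha*(alpha-2)) is a quadratic in alpha;
    # this is the root with alpha > 2, where the variance is finite.
    alpha = 1.0 + math.sqrt(1.0 + (mean / sd) ** 2)
    m = mean * (alpha - 1.0) / alpha
    return m, alpha


def pareto_sample(m, alpha, n, seed=None):
    """Draw n values via the inverse function I(p) = m/(1-p)^(1/alpha)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - p is never zero.
    return [m / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]


# Illustrative target values (hypothetical, not from the question).
m, alpha = pareto_params_from_mean_sd(mean=3.0, sd=1.0)
values = pareto_sample(m, alpha, n=10, seed=42)
print(m, alpha)
print(values)
```

Because the Pareto distribution has a heavy tail, the sample mean and standard deviation of a small generated set can differ noticeably from the targets.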
Hi,
Thank you for creating this great tool for Excel.
I have a question regarding distribution fitting. This tool estimates the parameters for different distributions. Is it possible to compare which distribution is the best fit for the data (Anderson-Darling statistic, etc.)?
Best regards,
Jukka
Jukka,
A commonly used approach is to choose the distribution with the smallest Akaike information criterion (AIC) value. AIC = 2k – 2LL where LL = the log-likelihood and k = the number of parameters being estimated. Essentially this means that you are choosing the distribution with the largest LL value with a penalty for extra parameters. See
https://www.spcforexcel.com/knowledge/basic-statistics/deciding-which-distribution-fits-your-data-best
Another commonly used approach is the Bayesian Information Criterion (BIC), namely BIC = k*LN(n) – 2LL where n = the number of elements in the sample and LN is the natural log. BIC is used in the same way as the AIC except that the penalty for additional parameters is calculated slightly differently.
Charles
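As a rough illustration of the AIC and BIC formulas above, here is a Python sketch that compares an exponential fit and a normal fit to the same sample. The data values and function names are hypothetical, and the two candidate distributions were chosen only because their maximum likelihood estimates have simple closed forms.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2LL."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """BIC = k*LN(n) - 2LL."""
    return k * math.log(n) - 2 * log_likelihood


def exponential_loglik(data):
    """Log-likelihood at the MLE of the rate, lambda = 1/mean (k = 1)."""
    n = len(data)
    total = sum(data)
    lam = n / total
    return n * math.log(lam) - lam * total


def normal_loglik(data):
    """Log-likelihood at the MLEs of the mean and variance (k = 2)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE divides by n, not n-1
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


# Hypothetical sample, for illustration only.
sample = [0.3, 1.2, 0.7, 2.9, 0.1, 1.8, 0.5, 4.2, 0.9, 1.1]
n = len(sample)
for name, ll, k in [
    ("exponential", exponential_loglik(sample), 1),
    ("normal", normal_loglik(sample), 2),
]:
    print(name, "LL =", ll, "AIC =", aic(ll, k), "BIC =", bic(ll, k, n))
```

The distribution with the smaller AIC (or BIC) would be preferred under this criterion.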
Hi Charles,
I wanted to ask: if you have a continuous variable that fits a certain distribution, how can or should you bin it (categorize/discretize; I don’t know the correct term) according to that distribution?
I would also like to tell you that the website is very well organized and filled with useful information.
Best regards,
Leonardo
Hello Leonardo,
Glad that you like the website.
How you bin the distribution depends on what you plan to use these bins for.
Charles
Hi Charles,
Do you have any literature you can recommend on this topic? I would like to extract the maximum information from the data.
Best regards,
Leonardo
Hello Leonardo,
This is a very big topic. A number of the references in the Bibliography might be helpful. See
Bibliography
Charles
Hi Charles,
If I want to estimate the distribution of the bus inter-arrival time, what is the best distribution to fit my data? Appreciate your advice on this. Thanks.
Regards,
Jessica
Jessica,
Usually, the exponential distribution is used for this purpose. See
https://real-statistics.com/other-key-distributions/exponential-distribution/
Charles
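A minimal Python sketch of this idea, assuming the observations are recorded as arrival timestamps (the numbers below are made up): the maximum likelihood estimate of the exponential rate is the reciprocal of the mean gap, and a Kolmogorov-Smirnov type statistic gives a rough indication of how closely the fitted curve follows the data. The function names are illustrative and are not Real Statistics functions.

```python
import math


def fit_exponential_rate(gaps):
    """MLE of the exponential rate: number of gaps / total of gaps."""
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gaps must be a non-empty list of positive numbers")
    return len(gaps) / sum(gaps)


def exponential_cdf(x, rate):
    """F(x) = 1 - exp(-rate*x) for x > 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ks_statistic(gaps, rate):
    """Largest distance between the empirical CDF and the fitted CDF."""
    xs = sorted(gaps)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = exponential_cdf(x, rate)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical arrival times in minutes.
arrivals = [0.0, 7.0, 12.0, 25.0, 31.0, 48.0]
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
rate = fit_exponential_rate(gaps)
print("estimated rate per minute:", rate)
print("estimated mean gap:", 1.0 / rate)
print("KS statistic:", ks_statistic(gaps, rate))
```

Buses that run to a fixed timetable may have much more regular gaps than an exponential distribution predicts, so it is worth checking the fit before relying on it.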
Dear Charles,
Does the GAMMA_FIT function support curly braces? For instance, {GAMMA_FIT(IF(R100:R138>0,R100:R138),,100)} does not return a result.
Thanks in advance,
GAMMA_FIT is an array function. See the following for how to use array functions.
Array Formulas and Functions
Charles
Hi Charles,
I recently came across your website and found it very very useful. Thank you so much for your great efforts!
I wanted to ask whether it would be possible to do distribution fitting via MLE (by using Real Statistics functions) for a Gumbel distribution?
Thank you so much.
With best regards,
Wayne
Wayne,
I am pleased that you are getting value from the website.
The Real Statistics software doesn’t yet support the Gumbel distribution.
Charles
Hi Charles,
Thanks for your reply. If I use Excel’s Solver to fit a Gumbel distribution, i.e. the approach taken for fitting a Weibull distribution, as described in https://real-statistics.com/distribution-fitting/distribution-fitting-via-maximum-likelihood/fitting-weibull-parameters-mle/, how should I initialize the location parameter and scale parameter of the Gumbel distribution?
Your guidance is greatly appreciated.
Best regards,
Wayne
Wayne,
The approach using Solver will probably work. I will eventually support the Gumbel distribution, but at present I haven’t researched what good initialization values would be.
Charles
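One common choice of starting values, offered here as an assumption rather than a Real Statistics recommendation, is the method of moments. For the Gumbel (maximum) distribution the mean is mu + gamma*beta and the variance is pi^2*beta^2/6, where gamma is the Euler-Mascheroni constant (about 0.5772). The Python sketch below computes these starting values together with the negative log-likelihood that a solver would minimize; the function names are illustrative only.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_start_values(data):
    """Method-of-moments estimates for the Gumbel (maximum) distribution,
    intended only as starting values for a numerical MLE search."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n-1 divisor)
    beta0 = sd * math.sqrt(6.0) / math.pi  # from variance = pi^2*beta^2/6
    mu0 = mean - EULER_GAMMA * beta0       # from mean = mu + gamma*beta
    return mu0, beta0


def gumbel_neg_loglik(data, mu, beta):
    """Negative log-likelihood: sum of ln(beta) + z + exp(-z),
    where z = (x - mu)/beta."""
    if beta <= 0:
        return math.inf
    total = 0.0
    for x in data:
        z = (x - mu) / beta
        try:
            total += math.log(beta) + z + math.exp(-z)
        except OverflowError:
            # exp(-z) overflows when x is far below mu relative to beta.
            return math.inf
    return total


# Hypothetical sample, for illustration only.
sample = [12.1, 15.4, 13.8, 19.6, 14.2, 16.9, 13.1, 22.3]
mu0, beta0 = gumbel_start_values(sample)
print("start values:", mu0, beta0)
print("negative log-likelihood at start:", gumbel_neg_loglik(sample, mu0, beta0))
```

In Excel, the same two starting values could be placed in the cells that Solver changes, with the log-likelihood as the objective cell.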
Hi Charles,
First, I loved your site! It helped me a lot.
Second, how can you use MLE with Newton’s method and censored data to fit a three-parameter Weibull distribution?
Best regards,
Leandro Dutra.
Leandro,
Glad the website has been helpful to you.
The Real Statistics website and software cover MLE with Newton’s method and censored data for fitting a two-parameter Weibull distribution. See
https://real-statistics.com/distribution-fitting/distribution-fitting-via-maximum-likelihood/weibull-censored-data/
The three-parameter version is not supported. Of course, if you fix a value for the third parameter, you can use the two-parameter version.
Charles