Fitting a Triangular Distribution via MLE

Log-likelihood Function

Suppose that we have a sample x1, x2, …, xn from a triangular distribution with parameters a, b, c. Let’s also suppose that the data elements are in sorted order x1 ≤ x2 ≤ … ≤ xn.

Since the pdf of the triangular distribution for a ≤ x ≤ c is

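$$f(x) = \begin{cases} \dfrac{2(x-a)}{(c-a)(b-a)} & a \le x \le b \\[1ex] \dfrac{2(c-x)}{(c-a)(c-b)} & b < x \le c \end{cases}$$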

the likelihood function is given by

$$L = \prod_{i=1}^{r} \frac{2(x_i-a)}{(c-a)(b-a)} \cdot \prod_{i=r+1}^{n} \frac{2(c-x_i)}{(c-a)(c-b)}$$

where r is an index such that

$$x_r \le b < x_{r+1}$$

where x_0 = a and x_{n+1} = c. It follows that

$$L = \frac{2^n}{(c-a)^n (b-a)^r (c-b)^{n-r}} \prod_{i=1}^{r} (x_i-a) \prod_{i=r+1}^{n} (c-x_i)$$

Thus, the log-likelihood function is

$$LL = n \ln 2 - n \ln(c-a) - r \ln(b-a) - (n-r) \ln(c-b) + \sum_{i=1}^{r} \ln(x_i-a) + \sum_{i=r+1}^{n} \ln(c-x_i)$$

As shown in Kotz and van Dorp, the maximum value of LL is achieved when b = xr for some r = 1, 2, …, n. Thus, we can choose any values a < x1 and c > xn, but once these values are chosen, b is restricted to one of the data values x1, …, xn.
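As a rough illustration of this result, the following Python sketch evaluates LL by summing the log of the triangular pdf over the sample and then, for fixed a and c, picks b from among the data values. The function names triang_loglik and best_mode are hypothetical and are not part of the Real Statistics Resource Pack; the sample values are made up.

```python
import math

def triang_loglik(data, a, b, c):
    """Log-likelihood for a triangular distribution with minimum a, mode b, maximum c."""
    if not (a <= b <= c and a < c):
        raise ValueError("need a <= b <= c and a < c")
    total = 0.0
    for x in data:
        if x < a or x > c:
            return -math.inf              # observation outside the support
        if x < b:                         # rising side of the triangle
            density = 2 * (x - a) / ((c - a) * (b - a))
        elif x > b:                       # falling side of the triangle
            density = 2 * (c - x) / ((c - a) * (c - b))
        else:                             # x equals the mode
            density = 2 / (c - a)
        if density <= 0:
            return -math.inf              # x sits exactly on an endpoint
        total += math.log(density)
    return total

def best_mode(data, a, c):
    """For fixed a and c, try each data value as b and keep the one with the largest LL."""
    b = max(data, key=lambda cand: triang_loglik(data, a, cand, c))
    return b, triang_loglik(data, a, b, c)

# Illustrative (made-up) sample
sample = [1.2, 1.5, 2.0, 2.4, 3.0]
b_hat, ll_hat = best_mode(sample, a=1.0, c=3.5)
print(b_hat, ll_hat)
```

Only n candidate values of b need to be checked for each (a, c) pair, which is what keeps a grid search over a and c feasible.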

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack provides the following worksheet functions for the data in R1.

TRIANG_MLE(R1, a, b, c) = LL for a triangular distribution with parameters a, b, and c which fits the data in array R1.

TRIANG_FIT(R1, lab, lo, hi, iter, exta, extc, iterc): returns a column array with the estimated values of the parameters a, b, c; along with the mean, variance, and skewness based on the data in R1 as well as based on the distribution parameters; and finally the MLE and the sum of squares.

If lab = TRUE (default FALSE), then a column of labels is appended to the output.

If lo is present, then a is set to lo; otherwise, a is calculated using the MLE. If hi is present, then c is set to hi; otherwise, c is calculated using MLE.

Improving the estimates

If lo is missing, then the function assumes that a is in the range

mn – exta ≤ a < mn

where mn is the smallest element in R1. In particular, the function tests a1, a2, …, ak as potential values for a where k = iter, and

a_1 = mn – exta,         a_{i+1} = a_i + exta/iter

Similarly, if hi is missing, then the function assumes that c is in the range

mx < c ≤ mx + extc

where mx is the largest element in R1. In particular, the function tests c1, c2, …, ch as potential values for c where h = iterc, and

c_1 = mx + extc,         c_{i+1} = c_i – extc/iterc

If exta is missing, it defaults to (mx – mn)/2. If extc is missing, it defaults to the value of exta. If iter is missing, it defaults to 100. If iterc is missing, it defaults to the value of iter.

Finally, for every combination of a and c, as described above, the function tests the values x1, x2, …, xn in R1 as potential values for b (for these a and c values).

If lo and hi are specified, then a and c are defined, and so only b needs to be estimated; this requires iter iterations. If neither hi nor lo is specified, then all three parameters need to be estimated; this requires iter^3 iterations (which is 1,000,000 for the default value of iter). When hi or lo is specified, but not both, then iter^2 iterations are required (10,000 for the default value of iter).
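The search described in this section can be sketched in Python as follows. This is only an approximation built from the description above, not the actual TRIANG_FIT implementation: the function names are hypothetical, the parameter is called iters rather than iter to avoid shadowing a Python built-in, and the error handling is an assumption.

```python
import math

def loglik(xs, a, b, c):
    """LL for sorted data xs with a < xs[0], c > xs[-1] and b equal to one of the xs."""
    ll = len(xs) * math.log(2 / (c - a))
    for x in xs:
        if x < b:
            ll += math.log((x - a) / (b - a))
        elif x > b:
            ll += math.log((c - x) / (c - b))
    return ll

def fit_triangular(data, lo=None, hi=None, iters=100, exta=None, extc=None, iterc=None):
    """Grid search over a and c; b is tried at every data value. Returns (LL, a, b, c)."""
    xs = sorted(data)
    mn, mx = xs[0], xs[-1]
    exta = (mx - mn) / 2 if exta is None else exta
    extc = exta if extc is None else extc
    iterc = iters if iterc is None else iterc
    if exta <= 0 or extc <= 0:
        raise ValueError("exta and extc must be positive")
    if (lo is not None and lo >= mn) or (hi is not None and hi <= mx):
        raise ValueError("need lo < min(data) and hi > max(data)")
    # a_1 = mn - exta, a_{i+1} = a_i + exta/iters, so mn - exta <= a < mn
    a_grid = [lo] if lo is not None else [mn - exta + i * exta / iters for i in range(iters)]
    # c_1 = mx + extc, c_{i+1} = c_i - extc/iterc, so mx < c <= mx + extc
    c_grid = [hi] if hi is not None else [mx + extc - i * extc / iterc for i in range(iterc)]
    best = (-math.inf, None, None, None)
    for a in a_grid:
        for c in c_grid:
            for b in xs:
                ll = loglik(xs, a, b, c)
                if ll > best[0]:
                    best = (ll, a, b, c)
    return best

# Illustrative (made-up) data with a coarse grid to keep the run short
print(fit_triangular([1.2, 1.5, 2.0, 2.4, 3.0], iters=20))
```

Passing lo or hi collapses the corresponding grid to a single value, which is why fixing one or both endpoints reduces the amount of work so sharply.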

Comparison of moments

In the output of TRIANG_FIT, MLE denotes the maximum LL value found for the data in R1; the goal is to determine the values of a, b, and c that maximize LL.

For a triangular distribution, the mean, variance, and skewness can be calculated via the following formulas (see Triangular Distribution).

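$$\text{mean} = \frac{a+b+c}{3}$$

$$\text{var} = \frac{a^2+b^2+c^2-ab-ac-bc}{18}$$

$$\text{skew} = \frac{\sqrt{2}\,(a+c-2b)(2a-b-c)(a-2c+b)}{5\,(a^2+b^2+c^2-ab-ac-bc)^{3/2}}$$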

The corresponding sample values m, v, and w can be calculated from the data in R1 via the formulas =AVERAGE(R1), =VAR.S(R1), and =SKEW(R1). We set sumsq equal to

(mean – m)^2 + (var – v)^2 + (skew – w)^2

The smaller the value of sumsq, the better the fit.
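The comparison can be sketched in Python as shown below, using the standard moment formulas for a triangular distribution with minimum a, mode b, and maximum c. The function names are hypothetical. The sample skewness uses the adjusted formula that Excel's SKEW function computes, and the sample variance matches VAR.S.

```python
import math
import statistics

def triangular_moments(a, b, c):
    """Mean, variance and skewness of a triangular distribution (min a, mode b, max c)."""
    q = a * a + b * b + c * c - a * b - a * c - b * c
    mean = (a + b + c) / 3
    var = q / 18
    skew = math.sqrt(2) * (a + c - 2 * b) * (2 * a - b - c) * (a - 2 * c + b) / (5 * q ** 1.5)
    return mean, var, skew

def sample_moments(data):
    """Counterparts of =AVERAGE(R1), =VAR.S(R1) and =SKEW(R1); needs at least 3 values."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)            # sample standard deviation (n - 1 denominator)
    w = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)
    return m, s * s, w

def moment_sumsq(data, a, b, c):
    """Sum of squared differences between distribution and sample moments."""
    mean, var, skew = triangular_moments(a, b, c)
    m, v, w = sample_moments(data)
    return (mean - m) ** 2 + (var - v) ** 2 + (skew - w) ** 2

# Illustrative (made-up) data and parameters
print(moment_sumsq([1.2, 1.5, 2.0, 2.4, 3.0], 1.0, 2.0, 3.5))
```

Note that the three squared terms are on different scales, so sumsq is best used to compare alternative fits to the same data rather than as an absolute measure.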

Example when all parameters are unknown

Example 1: Fit the data in range B2:G11 of Figure 1 to a triangular distribution using MLE (i.e. by maximizing LL). 


Figure 1 – Fitting data to triangular distribution by MLE (part 1)

We use the formula =TRIANG_FIT(B2:G11,TRUE) to obtain the results shown in V2:W12 of Figure 2.


Figure 2 – Fitting data to triangular distribution by MLE (part 2)

If we want to improve the accuracy of our estimates, we can increase the value of iter or reduce the intervals used to find a and c. We show how to do the latter.

Improving accuracy

As described above, the formula tests values of a and c in the ranges

mn – exta ≤ a < mn          mx < c ≤ mx + extc

Since exta and extc default to (mx – mn)/2 = (3.321055 – 1.163764)/2 = 1.07864, these ranges are each 1.07864 wide. In fact, the value for a in cell W2 is only 1.163764 – 1.066686 = .097078 below the minimum data value. Similarly, c in cell W4 is only 3.504425 – 3.321055 = .18337 above the maximum data value. Thus, we can actually use exta = .097078 and extc = .18337. Since these produce much smaller ranges, we should get more accurate results. Just to be safe, we’ll choose slightly wider ranges where exta = .1 and extc = .2 by using the formula =TRIANG_FIT(B2:G11,TRUE,,,,.1,.2).

This means that we have reduced the interval for a by a factor of more than 10 and the interval for c by a factor of more than 5, for a 50-fold improvement. We would need to increase iter to 1000 and iterc to 500 to get a similar improvement (at the cost of much slower processing time). The result is shown in Y2:Z12 of Figure 2. The improvement is pretty minor: the MLE value is only .0002 higher.
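The arithmetic behind these factors can be checked with a few lines of Python. The numbers are the ones quoted in this example (the minimum and maximum data values and the estimates of a and c from Figure 2); the variable names are only illustrative.

```python
# Values quoted in the text for Example 1
mn, mx = 1.163764, 3.321055          # smallest and largest data values
a_hat, c_hat = 1.066686, 3.504425    # first-pass estimates of a and c

default_ext = (mx - mn) / 2          # default width of each search range
gap_a = mn - a_hat                   # how far the estimate of a lies below mn
gap_c = c_hat - mx                   # how far the estimate of c lies above mx

new_exta, new_extc = 0.1, 0.2        # slightly wider than gap_a and gap_c
factor_a = default_ext / new_exta    # narrowing factor for the a range
factor_c = default_ext / new_extc    # narrowing factor for the c range

print(round(default_ext, 5), round(gap_a, 6), round(gap_c, 5))
print(round(factor_a, 2), round(factor_c, 2), round(factor_a * factor_c, 1))
```

With the grid sizes unchanged, each grid step shrinks by the same factor as its range, which is where the gain in resolution comes from.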

Finally, while this approach maximizes LL, the resulting skewness (cell W10) is not consistent with the skewness of the sample (cell W9). As expected, the method of moments will bring these estimates more in line with the sample values. As we see in MoM: Triangular Distribution (under construction), the LL resulting from the method of moments (-43.4102) is only slightly lower than that found above (-43.1759), and so for this example, the MoM seems to give a better fit.

Examples when some parameters are known

Example 2: (a) Suppose we know that a = 1 for the data in Figure 1. Find the values of b and c that maximize LL. (b) Suppose we know that c = 4. Find the values of a and b that maximize LL. (c) Finally, suppose we know that a = 1 and c = 4. Find the value of b that maximizes LL.

The results are shown in Figure 3.


Figure 3 – Results for Example 2

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Cook, J. (2015) Fitting a triangular distribution
https://www.johndcook.com/blog/2015/03/24/fitting-a-triangular-distribution/#:~:text=One%20way%20to%20fit%20a,if%20these%20values%20are%20known.

Stack Exchange (2016) MLE for triangular distribution
https://stats.stackexchange.com/questions/64102/mle-for-triangle-distribution/64103#64103

Kotz, S., van Dorp, R. (2004) The triangular distribution. Beyond Beta
https://books.google.it/books/about/Beyond_Beta_Other_Continuous_Families_Of.html?id=JO7ICgAAQBAJ&redir_esc=y
