Multinomial Distribution

Overview

The multinomial distribution is a generalization of the binomial distribution to two or more events.

Definition 1: For an experiment with the following characteristics:

  • the experiment consists of n independent trials
  • each trial has k mutually exclusive outcomes Ei
  • for each trial the probability of outcome Ei is pi

let x1 …, xk be discrete random variables whose values are the number of times outcome Ei occurs in n trials. Then x1 …, xk has a multinomial distribution. The (joint) probability distribution function (pdf) is defined as follows:

Multinomial function

Here
image602

The case where k = 2 is equivalent to the binomial distribution.

Key properties of the multinomial distribution are

  • E[xi] = npi
  • var(xi) = npi(1–pi)
  • cov(xi, xj) = –npipj for i ≠ j

Example

Example 1: Suppose that a bag contains 8 balls: 3 red, 1 green, and 4 blue. You reach in the bag pull out a ball at random and then put the ball back in the bag and pull out another ball. This experiment is repeated a total of 10 times. What is the probability that the outcome will result in exactly 4 reds and 6 blues?

The possible outcomes for each trial in this experiment are E1 = a red ball is drawn, E2 = a green ball is drawn and E3 = a blue ball is drawn. Thus p1 = 3/8, p2 = 1/8, p3 = 4/8, x1 = 4, x2 = 0 and x3 = 6, and so

image611

Worksheet Functions

Excel Function: While Excel does not provide a function for the multinomial distribution, it does provide the following function:

MULTINOMIAL(x1 …, xk) = n! / (x1!∙…∙xk!)

Thus we could also calculate the answer to Example 9.10 by using the formula

MULTINOMIAL(4,0,6)*(3/8)^4*(1/8)^0*(4/8)^6 = .064888

We can also use a range as the argument of MULTINOMIAL as shown in Figure 1.

Multinomial distribution Excel

Figure 1 – Multinomial distribution

We can use the following Excel array formula to calculate the same result

=PRODUCT(MULTINOMIAL(B3:B5),B6:B8^B3:B5)

Alternatively, we can use the following more complicated non-array formula

=MULTINOMIAL(B3:B5)*EXP(SUMPRODUCT(B3:B5,LN(B6:B8)))

Real Statistics Function: The following function in the Real Statistics Resource Pack can be used to calculate the multinomial distribution.

MULTINOMDIST(R1, R2) = the value of the pdf of the multinomial distribution where R1 is a row or column array containing the values x1, …, xk and R2 is a row or column array containing the values p1, …, pk, where R1 and R2 have the same shape.

Referring to Figure 1, we have MULTINOMDIST(B3:B5,B6:B8) = 0.064888.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Wikipedia (2021) Multinomial distribution
https://en.wikipedia.org/wiki/Multinomial_distribution

Chen, Y-C. (2020) Multinomial distribution
http://faculty.washington.edu/yenchic/20A_stat512/Lec7_Multinomial.pdf

22 thoughts on “Multinomial Distribution”

  1. HI Charles.

    Is there a way to calculate the inverse of multinomial distribution?

    I’m thinking about a set of success values (s), which is a vector of different states of successes (si), each one with a probability of success, (pi). Need to calculate the total of failures before getting the number of successes desired.

    An example (taken from Vose software) will provide clearer understanding:

    You want to make sure that at the end of the week you have called 50 people who never heard of your product, 50 people who don’t have internet at home and 200 people who use internet almost daily.

    With probabilities of success pi, and states, you can tell the number of failures you’ll have before you’ve called all the people you wanted, so you know the total number of phone calls.

    In brief, Vose get the total number with a function:
    The total number of phone calls = the total number of successes (300) + the total number of failures (NegMultinomial({50,50,200},{p1,p2,p3})).

    How can the total number be calculated in Excel without Vose function?

    Reply
  2. Hello, thank you for the tools very much.

    I use this function MULTINOMDIST(R1, R2) to get the pdf, then get the cdf by adding it up.
    but the cdf does not approach 1 (in fact it approach 1.17).

    I’m confused for this situation.
    The parameters I used are:
    (x1,x2,x3,x4,x5)=(1,1,1,1,n-4)
    (p1,p2,p3,p4,p5)=(0.02,0.02,0.02,0.02,0.92)
    for example:
    when n=4, MULTINOMDIST(Rx, Rp)=0.00000384 ; cdf=0.00000384
    when n=5, MULTINOMDIST(Rx, Rp)=0.000017664 ; cdf=0.000021504
    ….
    when n=88, MULTINOMDIST(Rx, Rp)=0.007833858 ; cdf=0.994673755
    when n=89, MULTINOMDIST(Rx, Rp)=0.007833858 ; cdf=1.002507613

    when n=170, MULTINOMDIST(Rx, Rp)=0.000125678 ; cdf=1.169895198

    Excuse me, could you help me with this question.
    Thank you very very much, again.

    Reply
  3. sorry, i type wrong.
    The parameters I used are:
    (p1,p2,p3,p4,p5)=(0.02,0.02,0.02,0.02,0.92) <-correct (=in fact to use.)

    Reply
  4. So. i get the probability from the stated calculations… how do i get the confidence intervals for each proportion?Is it somewhere on the real statistics site?

    Reply
    • The site contains lots of examples of how to calculate confidence interval. Please be more specific about the confidence interval that you are looking for. Is it for a proportion distribution? If so, see the webpage Proportion Distribution.
      Charles

      Reply
      • Hi,

        If you have a proportional multinomial distribution with probabilities p1, p2, …, pk for mutually exclusive events E1, E2, …, Ek. Then for any pi you can look at this as a binomial distribution with p = pi. 1-p is therefore the sum of the pj’s excluding pi. The confidence interval for the population version of pi is therefore as calculated for the proportional binomial distribution. This is described on the following webpage using the normal distribution approximation.
        https://real-statistics.com/binomial-and-related-distributions/proportion-distribution/

        Later today I plan to add a slightly more accurate version of the confidence interval due to Edwin Wilson, which can be used for both the binomial and multionomial distributions.

        Charles

        Reply
  5. Hi Charles,

    I have a question that relates to a multinomial distribution (not even 100% sure about this) that I hope you can help me with.
    If I take a sample (lets assume n=400) on a categorical variable that has more than two possible outcomes (e.g. blue, black, green, yellow) and plot the frequencies so that I can get the probabilities. E.g.:
    black 10%
    blue 25%
    green 35%
    yellow 30%

    How could I compute the 95% confidence interval for those probabilities?
    Any help is highly appreciated.
    Thanks a lot again for creating such a great resource.

    Kind regards,

    Dirk

    Reply
  6. Hi,

    an alternative method to calculate the probability would be an array formula:
    =PRODUCT(B9,B6:B8^B3:B5) + CTRL+SHIFT+ENTER

    Reply
    • Hi Dirk,
      Yes. I like your approach, especially since it is simpler than the one I wrote on the webpage. I have just added your suggestion to the referenced webpage. Thanks for your very useful comment.
      Charles

      Reply

Leave a Comment