Beta Conjugate Prior

If the posterior distribution is a known distribution, then our work is greatly simplified. This is especially true when both the prior and posterior come from the same distribution family. A prior with this property is called a conjugate prior (with respect to the distribution of the data).

We now consider the case where the prior has a beta distribution Bet(α, β). This distribution is characterized by the two shape parameters α and β. When α and β are positive integers, we can view α as the number of successes and β as the number of failures in a sample of n binomial trials (so that n = α + β). In this case, the pdf for p, the probability of success on any single trial, is given by

f(p) = (n–1)!/[(α–1)!(β–1)!] · p^(α–1) (1–p)^(β–1)

This is a special case of the pdf of the beta distribution

f(p) = Γ(α+β)/[Γ(α)Γ(β)] · p^(α–1) (1–p)^(β–1)
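To see that the two forms agree, here is a minimal Python sketch (an illustration added here, not part of the original article; the function names are ours):

from math import factorial, gamma

def beta_pdf_integer(p, a, b):
    # Beta pdf in the factorial form, valid for integer a, b >= 1 with n = a + b
    n = a + b
    return factorial(n - 1) / (factorial(a - 1) * factorial(b - 1)) * p**(a - 1) * (1 - p)**(b - 1)

def beta_pdf_general(p, a, b):
    # General beta pdf using the Gamma function
    return gamma(a + b) / (gamma(a) * gamma(b)) * p**(a - 1) * (1 - p)**(b - 1)

print(beta_pdf_integer(0.3, 4, 6))   # ≈ 2.2871
print(beta_pdf_general(0.3, 4, 6))   # same value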

Key properties of the beta distribution are shown in Figure 1.

Figure 1 – Key properties of the beta distribution

If we use the beta distribution as our prior distribution, then the specific values of the α and β parameters determine how our prior beliefs correspond to a prior sample (even when no such sample was actually taken) with α successes and β failures in n = α + β trials. If we believe that success and failure are about equally likely (and so the distribution is symmetric with α = β), then the mean is .5. If α < β then the distribution is skewed to the right (skewness is positive), while if α > β then the distribution is skewed to the left (skewness is negative). Also, the higher the value of n = α + β, the smaller the variance, and so the more confident we are of our prior beliefs.
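The following Python sketch (an added illustration; scipy is an assumed tool, since the article itself works in Excel) computes the mean, variance and skewness for a few choices of α and β:

from scipy.stats import beta

# (alpha, beta) pairs: symmetric, alpha < beta, alpha > beta, and a larger pseudo-sample size n
for a, b in [(2, 2), (2, 5), (5, 2), (20, 20)]:
    mean, var, skew = beta.stats(a, b, moments='mvs')
    print(f"Bet({a},{b}): mean={float(mean):.3f}, var={float(var):.4f}, skew={float(skew):+.3f}")

Bet(2, 5) has positive skewness (skewed to the right), Bet(5, 2) has negative skewness, and Bet(20, 20) has a much smaller variance than Bet(2, 2).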

Notice too that if α = β = 1, then the beta distribution, Bet(1, 1), is equivalent to the uniform distribution on the interval (0, 1). This can be viewed as a non-informative prior, where we have no prior belief as to the probability of success (e.g. the probability of getting heads when tossing a coin) and so all values of p are equally likely.

Figure 2 contains plots of the beta distribution based on different values of α and β.

Figure 2 – Beta distributions

If our prior belief is that heads and tails are equally likely, we can use a Bet(1, 1) prior distribution, but if we are more certain of this we can use a Bet(3, 3), Bet(10, 10) or higher-parameter prior, depending on our level of certainty. If we are less certain, we can use a Bet(.5, .5) prior distribution, also called the Jeffreys prior. Bet(1, 1) and the Jeffreys prior are considered to be non-informative priors, while the versions with larger parameters are increasingly informative about p being close to .5.

If instead we believe that, on average, heads occur 3 times as often as tails, then we can use a Bet(3, 1) prior distribution. If we are more confident of this belief, we can use a Bet(6, 2), Bet(30, 10) or even higher beta distribution. In general, to obtain a prior with mean μ we can choose α = nμ, or to obtain a prior with a given mode we can choose α = (n–2)·mode + 1 (for n > 2), with β = n – α in either case.
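A short sketch of this recipe (the helper functions below are ours, not from the article):

def beta_params_from_mean(n, mean):
    # alpha = n * mean and beta = n - alpha, so Bet(alpha, beta) has the desired mean
    a = n * mean
    return a, n - a

def beta_params_from_mode(n, mode):
    # alpha = (n - 2) * mode + 1 and beta = n - alpha, so Bet(alpha, beta) has the desired mode (n > 2)
    a = (n - 2) * mode + 1
    return a, n - a

print(beta_params_from_mean(4, 0.75))    # (3.0, 1.0), i.e. Bet(3, 1)
print(beta_params_from_mode(40, 0.75))   # (29.5, 10.5), roughly Bet(30, 10)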

Property 1: If x is the number of successes in n trials, following a binomial distribution with unknown parameter p, and the prior distribution is Bet(α, β), then the posterior distribution is p|x ∼ Bet(α′, β′) where

α′ = α + x          β′ = β + n – x

Proof:

The prior is

f(p) ∝ p^(α–1) (1–p)^(β–1)

and the likelihood function is

f(x|p) = C(n, x) p^x (1–p)^(n–x) ∝ p^x (1–p)^(n–x)

Thus the posterior satisfies

f(p|x) ∝ f(p) · f(x|p) ∝ p^(α+x–1) (1–p)^(β+n–x–1) = p^(α′–1) (1–p)^(β′–1)

Since f(p|x) is proportional to the pdf of Bet(α′, β′), this completes the proof.
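Property 1 can also be checked numerically. The following Python sketch (an added illustration using numpy and scipy, with made-up prior and data values) multiplies a Bet(3, 3) prior by the binomial likelihood for 7 successes in 10 trials, normalizes the result on a grid, and compares it with the Bet(α′, β′) = Bet(10, 6) pdf:

import numpy as np
from scipy.stats import beta, binom

a, b = 3, 3        # prior Bet(3, 3)
n, x = 10, 7       # data: x = 7 successes in n = 10 trials

p = np.linspace(0.001, 0.999, 999)
unnormalized = beta.pdf(p, a, b) * binom.pmf(x, n, p)            # prior times likelihood
posterior = unnormalized / (unnormalized.sum() * (p[1] - p[0]))  # normalize numerically

# Property 1: the posterior should be Bet(a + x, b + n - x) = Bet(10, 6)
print(np.max(np.abs(posterior - beta.pdf(p, a + x, b + n - x))))  # close to 0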

Observation: By Property 1, where m = α + β, the expected posterior is

E[p|x] = α′/(α′ + β′) = (α + x)/(α + β + n) = (α + x)/(m + n)

= [m/(m + n)] · (α/m) + [n/(m + n)] · (x/n)

= [m/(m + n)] · E[p] + [n/(m + n)] · x̄

where E[p] = α/m is the mean of the prior and x̄ = x/n is the sample mean (the observed proportion of successes).

This means that the expected posterior is a weighted average of the expected prior and the sample mean. If we regard m as the pseudo-sample size of the prior, then the weights for the expected prior and the sample mean are proportional to the sample sizes of the prior and the data.
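A quick numerical check of this observation (an added Python sketch with made-up numbers, not from the article):

a, b = 3, 3          # prior Bet(3, 3), so m = a + b = 6 and the prior mean is a/m = 0.5
n, x = 10, 7         # data: 7 successes in 10 trials, so the sample mean is x/n = 0.7
m = a + b

posterior_mean = (a + x) / (m + n)
weighted_average = (m / (m + n)) * (a / m) + (n / (m + n)) * (x / n)
print(posterior_mean, weighted_average)   # both equal 0.625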

Example 1: Suppose that we use a uniform prior distribution and a recent poll of 100 people shows that 55 people favor Alan and 45 favor Bill. Estimate the posterior distribution. What is the probability that Alan will win?

Based on Property 1, we see that the posterior distribution is p|x ∼ Bet(1+55, 1+45) = Bet(56, 46). Therefore, we conclude that the probability that Alan will win is 1-BETA.DIST(.5,56,46,TRUE) = 84%.
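The Excel calculation can be reproduced in Python (a sketch using scipy, which is an assumption about tooling; the article itself uses BETA.DIST):

from scipy.stats import beta

# Posterior Bet(56, 46); the probability that Alan wins is P(p > 0.5)
prob_alan_wins = 1 - beta.cdf(0.5, 56, 46)   # same quantity as 1-BETA.DIST(.5,56,46,TRUE)
print(round(prob_alan_wins, 2))              # ≈ 0.84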

