Bayesian Hypothesis Testing for Normal Data

Objective

We show how to perform hypothesis testing for normally distributed data using the Bayesian approach described at Bayesian Hypothesis Testing.

On this webpage, we focus on the following one-sided tests:

  • One sample test with
    • known variance
    • unknown variance
  • Two sample test
    • equal fixed variance
    • equal unknown variance
    • unequal unknown variances

Click here for a description of the two-sided versions of these hypothesis tests.

Caution: In what follows, we use non-informative priors. Technically, the Bayes factor is undefined in this case. Nevertheless, we will misuse the notation BF01 to represent the posterior odds P(H0|X)/P(H1|X). This is somewhat like assuming that P(H0) = P(H1).

One-sample test with known variance

Suppose we have a sample X = x1, …, xn that comes from a normally distributed population with known fixed variance; i.e. xi ~ N(μ, σ²) for all i. We test the following null and alternative hypotheses.

H0: μ < 0

H1: μ ≥ 0

We assume the Jeffreys’ prior f(μ) ∝ 1 (see Non-informative Priors). Thus, the posterior is proportional to the likelihood function, and so

μ | X ~ N(x̄, σ²/n)

The posterior probability of the null-hypothesis is therefore

P(H0|X) = P(μ < 0 | X) = Φ(–z)

where Φ is the cdf for the standard normal distribution and

z = x̄ / (σ/√n)

If your decision criterion for rejecting H0 in favor of H1 is that P(H0|X) < α, then you can use the frequentist one-sided z-test, namely reject the null hypothesis if –z < zα or equivalently +z > z1-α.
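The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the worksheet implementation: the function name, the mu0 parameter, and the sample values are assumptions introduced here, and the data are hypothetical rather than taken from any figure on this page.

```python
from math import sqrt
from statistics import NormalDist, fmean


def posterior_null_known_var(data, sigma2, mu0=0.0):
    """Return P(H0 | X) for H0: mu < mu0, assuming a flat prior and known variance sigma2."""
    n = len(data)
    z = (fmean(data) - mu0) / sqrt(sigma2 / n)  # z-statistic after shifting by mu0
    return NormalDist().cdf(-z)                 # P(mu < mu0 | X) = Phi(-z)


# Hypothetical data (an assumption for illustration only)
sample = [31.2, 18.5, 40.1, 27.3, 22.8, 35.6, 29.9, 24.4]
p_h0 = posterior_null_known_var(sample, sigma2=125, mu0=25)
bf10 = (1 - p_h0) / p_h0  # posterior odds in favor of H1
print(f"P(H0|X) = {p_h0:.4f}, BF10 = {bf10:.3f}")
```

Passing mu0 is equivalent to subtracting the hypothesized mean from every data element and then testing μ < 0.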

Example 1: The data in Figure 1 is normally distributed with known variance 125. Test the null hypothesis that μ < 25 vs. the alternative hypothesis μ ≥ 25.

We first subtract 25 from the data and test whether μ < 0 on the revised data. We see that P(H0|X) = .0706575 (cell E10) and BF10 = 13.149, which is strong evidence in support of the alternative hypothesis.

Bayesian one-sample z-test

Figure 1 – Bayesian one-sample z-test

Note that we obtain the same results using the original data and subtracting the hypothetical mean from the sample mean in the above calculations.

One-sample test with unknown variance

We repeat the above one-sided test where the variance is unknown. This time we use the Jeffreys’ prior

f(μ, σ) ∝ σ⁻³

(see Non-informative Priors). The resulting posterior for μ is

μ | X ~ tₙ(x̄, s²/n)

where tₙ is the non-standardized t-distribution with n degrees of freedom. The distribution tν(µ, σ²) is referred to as T(ν, µ, σ) on that webpage.

This is the same as the usual frequentist t-test except that s² is defined with division by n instead of n–1, and the degrees of freedom are n instead of n–1.

Click here for a proof of the above assertion.
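As a rough sketch of this test outside of Excel, the following Python code computes P(H0|X) from a t-distribution with n degrees of freedom, using the variance defined with division by n. The function names, the Simpson's-rule integration of the t density (used here only to avoid third-party libraries), and the sample values are all assumptions made for illustration.

```python
from math import exp, lgamma, pi, sqrt
from statistics import fmean


def t_pdf(t, df):
    """Density of the standard Student t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Standard t cdf via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def posterior_null_unknown_var(data, mu0=0.0):
    """Return P(H0 | X) for H0: mu < mu0 when the variance is unknown."""
    n = len(data)
    xbar = fmean(data)
    s2 = sum((x - xbar) ** 2 for x in data) / n  # division by n, not n-1
    t_stat = (xbar - mu0) / sqrt(s2 / n)
    return t_cdf(-t_stat, df=n)                  # n degrees of freedom, not n-1


# Hypothetical data (an assumption for illustration only)
sample = [31.2, 18.5, 40.1, 27.3, 22.8, 35.6, 29.9, 24.4]
p_h0 = posterior_null_unknown_var(sample, mu0=25)
print(f"P(H0|X) = {p_h0:.4f}, BF10 = {(1 - p_h0) / p_h0:.3f}")
```

The sketch does not handle a sample whose values are all identical, since the estimated variance would then be zero.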

Example 2: Repeat Example 1 where the variance is estimated from the sample.

We see from Figure 2 that P(H0|X) = .094103 (cell E10) and BF10 = 9.626597, which is reasonably strong evidence in support of the alternative hypothesis.

Bayesian one-sample t-test

Figure 2 – Bayesian one-sample t-test

Note that P(H0|X) can also be calculated via the worksheet formula =T3_DIST(E7,E6,0,SQRT(E8/E6),TRUE) and P(H1|X) can be calculated via the formula =T3_DIST(0,E6,E7,SQRT(E8/E6),TRUE).

Two-sample test with equal known variance

Suppose we have samples X = x1, …, xm and Y = y1, …, yn that come from normally distributed populations with the same known fixed variance; i.e. xi ~ N(μx, σ²) for all i and yj ~ N(μy, σ²) for all j. We test the following null and alternative hypotheses where δ = μy – μx.

H0: μx < μy (or δ > 0)

H1: μx ≥ μy (or δ ≤ 0)

We use the Jeffreys’ prior f(μx, μy) ∝ 1. It can be shown that the posterior distribution is

δ | X, Y ~ N(ȳ – x̄, σ²(1/m + 1/n))

Once again, this takes the form of the frequentist test with a cleaner interpretation.
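A minimal Python sketch of this two-sample calculation follows. It assumes the standard flat-prior result that δ given the data is normally distributed with mean ȳ – x̄ and variance σ²(1/m + 1/n); the function name and the data values are hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def posterior_null_two_sample_known_var(x, y, sigma2):
    """Return P(H0 | X, Y) for H0: mu_x < mu_y, i.e. delta = mu_y - mu_x > 0."""
    m, n = len(x), len(y)
    delta_hat = fmean(y) - fmean(x)          # posterior mean of delta
    se = sqrt(sigma2 * (1 / m + 1 / n))      # posterior standard deviation of delta
    return 1 - NormalDist(mu=delta_hat, sigma=se).cdf(0.0)  # P(delta > 0 | X, Y)


# Hypothetical samples and known common variance (assumptions for illustration)
x = [12.1, 9.8, 11.4, 10.7, 13.0]
y = [10.2, 9.1, 11.9, 8.7, 10.4, 9.6]
p_h0 = posterior_null_two_sample_known_var(x, y, sigma2=2.0)
print(f"P(H0|X,Y) = {p_h0:.4f}, BF01 = {p_h0 / (1 - p_h0):.3f}")
```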

Two-sample test with equal unknown variances

This time we use the Jeffreys’ prior f(μx, δ, σ²) ∝ (σ²)⁻². It turns out that the posterior (calculated by integrating over μx and σ²) is

Posterior for delta

where the pooled variance is

Pooled variance

with

Sample variances

Example 3: Test the null hypothesis μNew ≤ μOld vs. the alternative hypothesis μNew > μOld based on the data in Figure 3.

The analysis is shown in Figure 3. This time, the evidence favors the null hypothesis, but just barely since BF01 = 1.105. Column I shows the formulas in column G.

Bayesian two-sample t-test

Figure 3 – Bayesian two-sample t-test

Two-sample test with unequal unknown variances

Since the two groups share no parameters, we can apply the one-sample approach to each group separately to obtain

μx | X ~ tₘ(x̄, sx²/m) and μy | Y ~ tₙ(ȳ, sy²/n)

Example 4: Repeat Example 2 of Two Sample t Test: Unequal Variances using a Bayesian approach. The data for this example is shown in range A3:B13 of Figure 4.

Two sample simulation

Figure 4 – Two sample hypothesis test using simulation

We use Monte Carlo simulation with 2,000 iterations as shown in Figure 4. First we calculate the size, mean, standard deviation and standard error for the two samples, as shown in range A15:B18. This is done by placing the formulas =COUNT(A3:A13), =AVERAGE(A3:A13), =STDEV.P(A3:A13), and =A17/SQRT(A15) in cells A15, A16, A17, and A18. We then highlight range A15:B18 and press Ctrl-R.

Next, we insert the formula =T3_INV(RAND(),A$15,A$16,A$18) in cell E3, highlight range E3:F2002, and press Ctrl-R and Ctrl-D. This yields samples X and Y from two t-distributions with the parameters corresponding to the samples in columns A and B. Next, we obtain the values for δ by placing the formula =F3-E3 in cell G3, highlighting G3:G2002, and pressing Ctrl-D.

Finally, we obtain the counts for δ > 0 and δ ≤ 0, namely 47 and 1953, as shown in cells J3 & J4. From this simulation, we see that P(H0|X,Y) = 47/2000 = .0235 and P(H1|X,Y) = 1953/2000 = .9765.

We see that the data favor the alternative hypothesis. The Bayes factor is BF01 = .0235/.9765 = .024066 (cell J7). Taking the inverse, BF10 = 41.55315, which indicates that the evidence for a difference between the two original samples is quite strong per Figure 1 of Bayesian Hypothesis Testing.

Note too that we could also get the simulated data in Figure 4 by inserting the formula =A$16+T.INV(RAND(),A$15)*A$18 in cell E3. Finally, note that we could get a more accurate result by increasing the number of simulations.
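The same simulation can be sketched in Python. This is an illustrative sketch only: the function names, the seed, and the data values are assumptions and do not reproduce the data or counts in Figure 4. Each draw mirrors the worksheet formula above, namely mean + t-variate × standard error, with the sample size as the degrees of freedom and the standard deviation computed as in STDEV.P.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def t_variate(rng, df, loc, scale):
    """Draw one value from a non-standardized t-distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square variate with df degrees of freedom
    return loc + scale * z / sqrt(chi2 / df)


def simulate_posterior_null(x, y, iterations=2000, seed=0):
    """Estimate P(H0 | X, Y) = P(delta > 0), where delta = mu_y - mu_x."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    se_x = pstdev(x) / sqrt(m)  # mirrors =STDEV.P(...)/SQRT(COUNT(...))
    se_y = pstdev(y) / sqrt(n)
    count = 0
    for _ in range(iterations):
        mu_x = t_variate(rng, m, fmean(x), se_x)
        mu_y = t_variate(rng, n, fmean(y), se_y)
        if mu_y - mu_x > 0:
            count += 1
    return count / iterations


# Hypothetical samples (assumptions for illustration only)
x = [14.2, 11.9, 13.5, 15.1, 12.8, 13.9, 14.6, 12.2]
y = [12.4, 10.1, 13.0, 11.2, 12.9, 10.8, 11.7]
p_h0 = simulate_posterior_null(x, y)
print(f"P(H0|X,Y) = {p_h0:.4f}, P(H1|X,Y) = {1 - p_h0:.4f}")
if 0 < p_h0 < 1:
    print(f"BF01 = {p_h0 / (1 - p_h0):.5f}, BF10 = {(1 - p_h0) / p_h0:.5f}")
```

Fixing the seed makes the run repeatable; raising iterations reduces the Monte Carlo error in the estimated probabilities.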

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

