Bayesian Hypothesis Testing for Normal Data

Objective

We show how to perform hypothesis testing for normally distributed data using the Bayesian approach described at Bayesian Hypothesis Testing.

On this webpage, we focus on the following one-sided tests using non-informative priors. Since these priors are improper, the Bayes factors are not defined, and so we specify the posterior odds instead.

  • One-sample test with
    • known variance
    • unknown variance
  • Two-sample test with
    • equal known variance
    • equal unknown variance
    • unequal unknown variances

Click here for a description of the two-sided versions of these hypothesis tests.

Click here for a description of Excel worksheet functions and data analysis tools supplied by the Real Statistics Resource Pack that implement the tests described on this webpage.

One-sample test with known variance

Suppose we have a sample X = x1, …, xn that comes from a normally distributed population with known fixed variance; i.e. xi ~ N(μ, σ2) for all i. We test the following null and alternative hypotheses.

H0: μ < 0

H1: μ ≥ 0

We assume the Jeffreys’ prior f(μ) ∝ 1 (see Non-informative Priors). Thus, the posterior is proportional to the likelihood function, and so

μ | X ~ N(x̄, σ²/n)

The posterior probability of the null-hypothesis is therefore

P(H0|X) = P(μ < 0 | X) = Φ(−z)

where Φ is the cdf for the standard normal distribution and

z = x̄/(σ/√n)

If your decision criterion for rejecting H0 in favor of H1 is that P(H0|X) < α, then since P(H0|X) = Φ(−z), this is equivalent to the frequentist one-sided z-test: reject the null hypothesis if −z < zα, or equivalently z > z1-α.
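The same calculation can be sketched in Python using scipy (this is purely illustrative and not part of the Real Statistics tools; the function name bayes_z_test and the sample values below are hypothetical, not the data of Example 1):

    import numpy as np
    from scipy.stats import norm

    def bayes_z_test(x, mu0, sigma2):
        # Posterior P(H0: mu < mu0 | X) under the flat prior f(mu) proportional to 1,
        # i.e. mu | X ~ N(x-bar, sigma2/n)
        n = len(x)
        z = (np.mean(x) - mu0) / np.sqrt(sigma2 / n)   # z-statistic
        p_h0 = norm.cdf(-z)                            # P(mu < mu0 | X)
        p_h1 = 1 - p_h0
        return p_h0, p_h1, p_h0 / p_h1                 # Odds01

    # hypothetical data with known variance 125 and hypothesized mean 25
    x = [31, 18, 42, 25, 37, 12, 29, 44, 20, 33]
    print(bayes_z_test(x, mu0=25, sigma2=125))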

Example 1: The data in Figure 1 is normally distributed with known variance 125. Test the null hypothesis that μ < 25 vs. the alternative hypothesis μ ≥ 25.

We first subtract 25 from the data and test whether μ ≥ 0 on the revised data. We see that P(H1|X) = .0706575 (cell E11) and Odds01 = 13.149 (cell E12), which is strong evidence in support of the null hypothesis.

Figure 1 – Bayesian one-sample z-test

Note that we obtain the same results by using the original data and subtracting the hypothesized mean from the sample mean in the above calculations.

One-sample test with unknown variance

We repeat the above one-sided test where the variance is unknown. This time we use the Jeffreys’ prior

f(μ, σ) ∝ σ⁻³

(see Non-informative Priors). The resulting posterior for μ is

μ | X ~ tn(x̄, s²/n),  s² = Σi (xi − x̄)²/n

where tn is the non-standardized t-distribution. tν(µ, σ2) is referred to as T(ν, µ, σ) on that webpage.

This is the same form as the usual frequentist t-test, except that s² is defined with division by n instead of n − 1, and the number of degrees of freedom is n instead of n − 1.

Click here for a proof of the above assertion.
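A corresponding Python sketch (again illustrative only; note ddof=0 so that s² is computed with division by n, matching the definition above):

    import numpy as np
    from scipy.stats import t

    def bayes_t_test(x, mu0=0.0):
        # Posterior P(H0: mu < mu0 | X), where mu | X ~ t_n(x-bar, s^2/n)
        n = len(x)
        xbar = np.mean(x)
        s2 = np.var(x, ddof=0)              # division by n, not n-1
        scale = np.sqrt(s2 / n)
        p_h0 = t.cdf(mu0, df=n, loc=xbar, scale=scale)
        p_h1 = 1 - p_h0
        return p_h0, p_h1, p_h0 / p_h1      # Odds01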

Example 2: Repeat Example 1 where the variance is estimated from the sample.

We see from Figure 2 that P(H1|X) = .094103 (cell E10) and Odds01 = 9.626597, which is reasonably strong evidence in support of the null hypothesis.

Figure 2 – Bayesian one-sample t-test

Note that P(H0|X) can also be calculated via the worksheet formula =T3_DIST(0,E6,E7,SQRT(E8/E6),TRUE) and P(H1|X) can be calculated via the formula =T3_DIST(E7,E6,0,SQRT(E8/E6),TRUE) .

Two-sample test with equal known variance

Suppose we have samples X = x1, …, xm and Y = y1, …, yn that come from normally distributed populations with the same known fixed variance; i.e. xi ~ N(μx, σ2) and yi ~ N(μy, σ2) for all i. We test the following null and alternative hypotheses, where δ = μy − μx.

H0: μx > μy (or δ < 0)

H1: μx ≤ μy (or δ ≥ 0)

We use the Jeffreys’ prior f(μx, μy) ∝ 1. It can be shown that the posterior distribution of δ is

δ | X, Y ~ N(ȳ − x̄, σ²(1/m + 1/n))

Once again, this takes the same form as the frequentist test, but P(H0|X, Y) is now a direct posterior probability of the null hypothesis rather than a p-value.
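As an illustrative Python sketch of this case (bayes_two_sample_z is a made-up name, not a Real Statistics function):

    import numpy as np
    from scipy.stats import norm

    def bayes_two_sample_z(x, y, sigma2):
        # delta | X, Y ~ N(y-bar - x-bar, sigma2*(1/m + 1/n))
        m, n = len(x), len(y)
        delta_hat = np.mean(y) - np.mean(x)
        sd = np.sqrt(sigma2 * (1 / m + 1 / n))
        p_h0 = norm.cdf(0, loc=delta_hat, scale=sd)   # P(delta < 0 | X, Y) = P(H0 | X, Y)
        return p_h0, 1 - p_h0, p_h0 / (1 - p_h0)      # Odds01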

Two-sample test with equal unknown variances

This time we use the Jeffreys’ prior f(μx, δ, σ²) ∝ (σ²)⁻². It turns out that the posterior for δ (calculated by integrating over μx and σ²) is

δ | X, Y ~ tν(ȳ − x̄, s²(1/m + 1/n))  with ν = m + n degrees of freedom

where the pooled variance is

s² = (m·sx² + n·sy²)/(m + n)

with

sx² = Σi (xi − x̄)²/m  and  sy² = Σj (yj − ȳ)²/n
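A minimal Python sketch of this calculation, assuming the pooled variance and the m + n degrees of freedom given above (illustrative only):

    import numpy as np
    from scipy.stats import t

    def bayes_two_sample_t(x, y):
        # delta | X, Y ~ t_{m+n}(y-bar - x-bar, s^2*(1/m + 1/n))
        m, n = len(x), len(y)
        delta_hat = np.mean(y) - np.mean(x)
        s2 = (m * np.var(x, ddof=0) + n * np.var(y, ddof=0)) / (m + n)   # pooled variance
        scale = np.sqrt(s2 * (1 / m + 1 / n))
        p_h0 = t.cdf(0, df=m + n, loc=delta_hat, scale=scale)            # P(delta < 0 | X, Y)
        return p_h0, 1 - p_h0, p_h0 / (1 - p_h0)                         # Odds01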

Example 3: Test the null hypothesis μNew > μOld vs. the alternative hypothesis μNew ≤ μOld based on the data in Figure 3.

The analysis is shown in Figure 3. This time, the evidence favors the alternative hypothesis, but just barely since Odds10 = 1.105. Column I shows the formulas in column G.

Figure 3 – Bayesian two-sample t-test

Two-sample test with unequal unknown variances

Since the two groups do not share any parameters, we can apply the one-sample approach to each sample separately to obtain

μx | X ~ tm(x̄, sx²/m)  and  μy | Y ~ tn(ȳ, sy²/n)
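In Python terms (an illustrative sketch, not a Real Statistics function), each marginal posterior can be represented as a frozen scipy distribution:

    import numpy as np
    from scipy.stats import t

    def marginal_posterior(sample):
        # Posterior of the population mean for one sample:
        # mu | sample ~ t_n(x-bar, s^2/n), with s^2 computed by dividing by n
        n = len(sample)
        return t(df=n, loc=np.mean(sample), scale=np.sqrt(np.var(sample, ddof=0) / n))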

Example 4: Repeat Example 2 of  Two Sample t Test: Unequal Variances using a Bayesian approach. The data for this example is shown in range A3:B13 of Figure 4.

Figure 4 – Two sample hypothesis test using simulation

We use Monte Carlo simulation with 2,000 iterations as shown in Figure 4. First we calculate the size, mean, and standard deviation for the two samples, as shown in range A15:B17. This is done by placing the formulas =COUNT(A3:A13), =AVERAGE(A3:A13), and =STDEV.P(A3:A13) in cells A15, A16, and A17, respectively. We then highlight range A15:B17 and press Ctrl-R.

Next, we insert the formula =A$16+A$17/SQRT(A$15)*T.INV(RAND(),A$15) in cell E3, highlight range E3:F2002, and press Ctrl-R and Ctrl-D. This yields 2,000 simulated values from the posterior distributions of the two population means, with parameters based on the samples in columns A and B. We then obtain the values of δ by placing the formula =F3-E3 in cell G3, highlighting range G3:G2002, and pressing Ctrl-D.

Finally, we obtain the counts for δ > 0 and δ ≤ 0, namely 47 and 1953, as shown in cells J3 & J4. From this simulation, we see that P(H0|X,Y) = 1953/2000 = .9765 and P(H1|X,Y) = 47/2000 = .0235.

The simulation also produces an estimated 95% credible interval of (-17.20, -.138) for δ, as shown in range J13:J14.

We see that the data favor the null hypothesis, with posterior Odds01 = .9765/.0235 = 41.55319 (cell J7), which per Figure 1 of Bayesian Hypothesis Testing is strong evidence in support of the null hypothesis.

Note that we could also obtain the simulated data in Figure 4 by inserting the formula =T3_INV(RAND(),A$15, A$16, A$17/SQRT(A$15)) in cell E3. Finally, note that we could get a more accurate result by increasing the number of iterations.
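The same Monte Carlo procedure can be sketched in Python (illustrative only; since the draws are random, the counts will differ somewhat from those in Figure 4):

    import numpy as np
    from scipy.stats import t

    def simulate_delta(x, y, iters=2000, seed=0):
        # Draw from each sample's posterior t_n(x-bar, s^2/n) and tabulate delta
        rng = np.random.default_rng(seed)
        def draws(sample):
            n = len(sample)
            return t.rvs(df=n, loc=np.mean(sample),
                         scale=np.sqrt(np.var(sample, ddof=0) / n),
                         size=iters, random_state=rng)
        delta = draws(y) - draws(x)
        p_h1 = np.mean(delta > 0)                  # proportion of draws with delta > 0
        ci = np.percentile(delta, [2.5, 97.5])     # 95% credible interval for delta
        return 1 - p_h1, p_h1, ci                  # P(H0|X,Y), P(H1|X,Y), credible interval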

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

