Objective
We show how to perform hypothesis testing for normally distributed data using the Bayesian approach described at Bayesian Hypothesis Testing.
On this webpage, we focus on the following one-sided tests:
- One-sample test with
  - known variance
  - unknown variance
- Two-sample test with
  - equal known variance
  - equal unknown variances
  - unequal unknown variances
Click here for a description of the two-sided versions of these hypothesis tests.
Caution: In what follows we use non-informative priors. Technically, the Bayes factor is undefined in this case. Nevertheless, we will misuse the notation BF01 to represent the posterior odds P(H0|X)/P(H1|X), and BF10 to represent its reciprocal. This is sort of like assuming that P(H0) = P(H1).
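The relationship behind this caution can be written out as follows. This is the standard odds form of Bayes' theorem, shown here only as a reminder, and it presumes proper priors so that both likelihood terms exist:

```latex
\frac{P(H_0 \mid X)}{P(H_1 \mid X)}
  = \underbrace{\frac{P(X \mid H_0)}{P(X \mid H_1)}}_{BF_{01}}
    \cdot \frac{P(H_0)}{P(H_1)}
```

When P(H0) = P(H1), the prior odds equal 1 and the posterior odds coincide with BF01, which is the sense in which the notation is borrowed on this webpage.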
One-sample test with known variance
Suppose we have a sample X = x1, …, xn that comes from a normally distributed population with known fixed variance; i.e. xi ~ N(μ, σ²) for all i. We test the following null and alternative hypotheses.
H0: μ < 0
H1: μ ≥ 0
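We assume the Jeffreys’ prior f(μ) ∝ 1 (see Non-informative Priors). Thus, the posterior is proportional to the likelihood function, and so
$$\mu \mid X \sim N\!\left(\bar{x},\ \frac{\sigma^2}{n}\right)$$
where x̄ is the sample mean. The posterior probability of the null hypothesis is therefore
$$P(H_0 \mid X) = P(\mu < 0 \mid X) = \Phi(-z)$$
where Φ is the cdf for the standard normal distribution and
$$z = \frac{\bar{x}}{\sigma / \sqrt{n}}$$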
If your decision criterion for rejecting H0 in favor of H1 is that P(H0|X) < α, then you can use the frequentist one-sided z-test: reject the null hypothesis if −z < z_α or, equivalently, z > z_{1−α}, where z_p denotes the p quantile of the standard normal distribution.
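The calculation above can be sketched in Python using only the standard library. The function name `one_sample_known_var`, the `mu0` argument (which shifts the test to H0: μ < μ0) and the sample values are illustrative assumptions, not taken from this webpage.

```python
from math import sqrt
from statistics import NormalDist, fmean

def one_sample_known_var(data, sigma2, mu0=0.0):
    """Posterior probabilities for H0: mu < mu0 vs H1: mu >= mu0 (flat prior on mu)."""
    n = len(data)
    z = (fmean(data) - mu0) / sqrt(sigma2 / n)
    p_h0 = NormalDist().cdf(-z)   # P(mu < mu0 | X) = Phi(-z)
    p_h1 = 1.0 - p_h0
    return z, p_h0, p_h1

# Hypothetical sample (NOT the data in Figure 1), with known variance 125
sample = [31.2, 18.5, 40.1, 27.3, 35.8, 22.9, 29.4, 38.6]
z, p_h0, p_h1 = one_sample_known_var(sample, sigma2=125, mu0=25)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # z_{1-alpha}
print(f"z = {z:.4f}, P(H0|X) = {p_h0:.4f}, posterior odds H1:H0 = {p_h1 / p_h0:.3f}")
print("reject H0:", p_h0 < alpha, "| z-test agrees:", (p_h0 < alpha) == (z > z_crit))
```

The last line checks that the rule P(H0|X) < α and the z-test rule z > z_{1−α} lead to the same decision for this sample.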
Example 1: The data in Figure 1 is normally distributed with known variance 125. Test the null hypothesis that μ < 25 vs. the alternative hypothesis μ ≥ 25.
We first subtract 25 from the data and test whether μ < 0 on the revised data. We see that P(H0|X) = .0706575 (cell E10) and BF10 = 13.149, which is strong evidence in support of the alternative hypothesis.
Figure 1 – Bayesian one-sample z-test
Note that we obtain the same results if we use the original data and subtract the hypothesized mean from the sample mean in the above calculations.
One-sample test with unknown variance
We repeat the above one-sided test where the variance is unknown. This time we use the Jeffreys’ prior
f(μ, σ) ∝ σ⁻³
(see Non-informative Priors). The resulting posterior for μ is
$$\mu \mid X \sim t_n\!\left(\bar{x},\ \frac{s^2}{n}\right)$$
where t_n is the non-standardized t-distribution with n degrees of freedom, x̄ is the sample mean and s² is the sample variance. The distribution t_ν(µ, σ²) is referred to as T(ν, µ, σ) on the webpage for that distribution.
This is the same as the usual frequentist t-test except that s² is defined with division by n instead of n−1, and the degrees of freedom is n instead of n−1.
Click here for a proof of the above assertion.
Example 2: Repeat Example 1 where the variance is estimated from the sample.
We see from Figure 2 that P(H0|X) = .094103 (cell E10) and BF10 = 9.626597, which is reasonably strong evidence in support of the alternative hypothesis.
Figure 2 – Bayesian one-sample t-test
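Note that P(H0|X) can also be calculated via the worksheet formula =T3_DIST(0,E6,E7,SQRT(E8/E6),TRUE) and P(H1|X) can be calculated via the formula =T3_DIST(E7,E6,0,SQRT(E8/E6),TRUE). The first formula is the posterior probability that μ < 0; the second is its complement by the symmetry of the t-distribution.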
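A rough Python counterpart of this calculation is sketched below. It assumes the posterior for μ is a non-standardized t-distribution with n degrees of freedom, location x̄ and scale √(s²/n), with s² computed by dividing by n. The helper `t3_cdf` is a hypothetical stand-in that only imitates the argument order (x, df, μ, σ) of the worksheet function; the t cdf is obtained by numerical integration so that no third-party library is needed, and the sample values are made up.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import fmean

def t_cdf(x, df, steps=2000):
    """Standard t cdf via Simpson's rule on [0, |x|]; steps must be even."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    def pdf(t):
        return exp(log_c - (df + 1) / 2 * log1p(t * t / df))
    h = abs(x) / steps
    total = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                      # integral of the pdf from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t3_cdf(x, df, mu, sigma):
    """cdf at x of the non-standardized t distribution T(df, mu, sigma)."""
    return t_cdf((x - mu) / sigma, df)

def one_sample_unknown_var(data, mu0=0.0):
    n = len(data)
    xbar = fmean(data)
    s2 = sum((x - xbar) ** 2 for x in data) / n   # division by n, not n - 1
    scale = sqrt(s2 / n)
    p_h0 = t3_cdf(mu0, n, xbar, scale)            # P(mu < mu0 | X)
    p_h1 = t3_cdf(xbar, n, mu0, scale)            # equals 1 - p_h0 by symmetry
    return p_h0, p_h1

# Hypothetical sample (NOT the data in Figure 2)
sample = [31.2, 18.5, 40.1, 27.3, 35.8, 22.9, 29.4, 38.6]
p_h0, p_h1 = one_sample_unknown_var(sample, mu0=25)
print(f"P(H0|X) = {p_h0:.4f}, P(H1|X) = {p_h1:.4f}, BF10 = {p_h1 / p_h0:.3f}")
```

Where SciPy is available, `scipy.stats.t.cdf(x, df, loc, scale)` could replace the hand-rolled `t_cdf`; the numerical version is used here only to keep the sketch dependency-free.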
Two-sample test with equal known variance
Suppose we have samples X = x1, …, xm and Y = y1, …, yn that come from normally distributed populations with a common known variance; i.e. xi ~ N(μx, σ²) for all i and yj ~ N(μy, σ²) for all j. We test the following null and alternative hypotheses, where δ = μy – μx.
H0: μx < μy (or δ > 0)
H1: μx ≥ μy (or δ ≤ 0)
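We use the Jeffreys’ prior f(μx, μy) ∝ 1. It can be shown that the posterior distribution of δ is
$$\delta \mid X, Y \sim N\!\left(\bar{y} - \bar{x},\ \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)\right)$$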
Once again, this takes the form of the frequentist test with a cleaner interpretation.
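As an illustration, the two-sample case with known common variance can be sketched as follows. The sketch assumes that the posterior of δ = μy − μx is normal with mean ȳ − x̄ and variance σ²(1/m + 1/n), which is what independent flat priors on the two means give; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def two_sample_known_var(x, y, sigma2):
    """Posterior probabilities for H0: delta > 0 vs H1: delta <= 0, delta = mu_y - mu_x."""
    m, n = len(x), len(y)
    post = NormalDist(fmean(y) - fmean(x), sqrt(sigma2 * (1 / m + 1 / n)))
    p_h1 = post.cdf(0.0)      # P(delta <= 0 | X, Y)
    p_h0 = 1.0 - p_h1         # P(delta > 0 | X, Y)
    return p_h0, p_h1

# Hypothetical samples with an assumed known variance of 9
x_sample = [52.1, 48.3, 55.0, 50.7, 47.9, 53.4]
y_sample = [54.2, 51.8, 57.1, 49.9, 56.3, 55.5, 52.6]
p_h0, p_h1 = two_sample_known_var(x_sample, y_sample, sigma2=9.0)
print(f"P(H0|X,Y) = {p_h0:.4f}, P(H1|X,Y) = {p_h1:.4f}, BF01 = {p_h0 / p_h1:.3f}")
```

Swapping the roles of the two samples swaps the two posterior probabilities, which is a quick sanity check on the sign convention for δ.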
Two-sample test with equal unknown variances
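This time we use the Jeffreys’ prior f(μx, δ, σ²) ∝ (σ²)⁻². It turns out that the posterior of δ (calculated by integrating over μx and σ²) is
$$\delta \mid X, Y \sim t_{m+n}\!\left(\bar{y} - \bar{x},\ s_p^2\left(\frac{1}{m} + \frac{1}{n}\right)\right)$$
where the pooled variance is
$$s_p^2 = \frac{m s_x^2 + n s_y^2}{m + n}$$
with
$$s_x^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \bar{x})^2 \qquad s_y^2 = \frac{1}{n}\sum_{j=1}^{n}(y_j - \bar{y})^2$$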
Example 3: Test the null hypothesis μNew < μOld vs. the alternative hypothesis μNew ≥ μOld based on the data in Figure 3.
The analysis is shown in Figure 3. This time, the evidence favors the null hypothesis, but just barely since BF01 = 1.105. Column I shows the formulas in column G.
Figure 3 – Bayesian two-sample t-test
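The pooled-variance test can be sketched in Python as below. The sketch rests on an assumption that mirrors the one-sample result: the posterior of δ = μy − μx is taken to be a non-standardized t-distribution with m + n degrees of freedom, location ȳ − x̄ and squared scale s_p²(1/m + 1/n), where s_p² pools the two sample variances computed with division by m and n. The function names and data are hypothetical and do not reproduce Figure 3.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import fmean, pvariance

def t_cdf(x, df, steps=2000):
    """Standard t cdf via Simpson's rule on [0, |x|]; steps must be even."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    def pdf(t):
        return exp(log_c - (df + 1) / 2 * log1p(t * t / df))
    h = abs(x) / steps
    total = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sample_pooled(x, y):
    """Posterior probabilities for H0: delta > 0 vs H1: delta <= 0, delta = mu_y - mu_x."""
    m, n = len(x), len(y)
    # pvariance divides by the sample size, matching the divide-by-n convention
    s2_pooled = (m * pvariance(x) + n * pvariance(y)) / (m + n)
    scale = sqrt(s2_pooled * (1 / m + 1 / n))
    d = fmean(y) - fmean(x)
    p_h1 = t_cdf((0.0 - d) / scale, df=m + n)   # P(delta <= 0 | X, Y)
    p_h0 = 1.0 - p_h1
    return p_h0, p_h1

# Hypothetical samples
x_sample = [52.1, 48.3, 55.0, 50.7, 47.9, 53.4]
y_sample = [54.2, 51.8, 57.1, 49.9, 56.3, 55.5, 52.6]
p_h0, p_h1 = two_sample_pooled(x_sample, y_sample)
print(f"P(H0|X,Y) = {p_h0:.4f}, P(H1|X,Y) = {p_h1:.4f}, BF01 = {p_h0 / p_h1:.3f}")
```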
Two-sample test with unequal unknown variances
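Since the two groups share no parameters, we can apply the one-sample approach to each group separately to obtain
$$\mu_x \mid X \sim t_m\!\left(\bar{x},\ \frac{s_x^2}{m}\right) \qquad \mu_y \mid Y \sim t_n\!\left(\bar{y},\ \frac{s_y^2}{n}\right)$$
where s_x² and s_y² are the sample variances computed with division by m and n respectively. The two posteriors are independent.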
Example 4: Repeat Example 2 of Two Sample t Test: Unequal Variances using a Bayesian approach. The data for this example is shown in range A3:B13 of Figure 4.
Figure 4 – Two sample hypothesis test using simulation
We use Monte Carlo simulation with 2,000 iterations as shown in Figure 4. First, we calculate the size, mean, standard deviation and standard error for the two samples, as shown in range A15:B18. This is done by placing the formulas =COUNT(A3:A13), =AVERAGE(A3:A13), =STDEV.P(A3:A13), and =A17/SQRT(A15) in cells A15, A16, A17, and A18. We then highlight range A15:B18 and press Ctrl-R.
Next, we insert the formula =T3_INV(RAND(),A$15,A$16,A$18) in cell E3, highlight range E3:F2002, and press Ctrl-R and Ctrl-D. This yields samples X and Y from two t-distributions with the parameters corresponding to the samples in columns A and B. Next, we obtain the values for δ by placing the formula =F3-E3 in cell G3, highlighting G3:G2002, and pressing Ctrl-D.
Finally, we obtain the counts for δ > 0 and δ ≤ 0, namely 47 and 1953, as shown in cells J3 & J4. From this simulation, we see that P(H0|X,Y) = 47/2000 = .0235 and P(H1|X,Y) = 1953/2000 = .9765.
We see that the data favor the alternative hypothesis. The posterior odds are BF01 = .0235/.9765 = .02407 (cell J7). Taking the inverse, BF10 = 41.55315, which indicates that the evidence of a difference between the two original samples is quite strong per Figure 1 of Bayesian Hypothesis Testing.
Note too that we could also get the simulated data in Figure 4 by inserting the formula =A$16+T.INV(RAND(),A$15)*A$18 in cell E3. Finally, note that we could get a more accurate result by increasing the number of simulations.
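The simulation described above can also be sketched outside the spreadsheet. The Python version below follows the same recipe: draw each mean from a t-distribution with degrees of freedom equal to the sample size, centered at the sample mean, with scale equal to the population-style standard deviation divided by √n, then count how often δ > 0. It draws t variates as a normal divided by the square root of a scaled chi-square rather than through an inverse cdf, which is a substitution made here to stay within the standard library. The function names, seed and data are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def posterior_draws(data, k, rng):
    """k draws from the t posterior of the mean: df = n, location = mean, scale = sd/sqrt(n)."""
    n = len(data)
    mean = fmean(data)
    se = pstdev(data) / sqrt(n)               # pstdev divides by n, like STDEV.P
    draws = []
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(n / 2, 2.0)   # chi-square with n degrees of freedom
        draws.append(mean + se * z / sqrt(chi2 / n))
    return draws

def simulate(x, y, k=2000, seed=0):
    """Monte Carlo estimates of P(delta > 0) and P(delta <= 0), delta = mu_y - mu_x."""
    rng = random.Random(seed)
    mu_x = posterior_draws(x, k, rng)
    mu_y = posterior_draws(y, k, rng)
    count_h0 = sum(1 for a, b in zip(mu_x, mu_y) if b - a > 0)
    p_h0 = count_h0 / k
    return p_h0, 1.0 - p_h0

# Hypothetical samples (NOT the data in Figure 4)
x_sample = [23.1, 19.8, 25.4, 22.0, 27.3, 21.5, 24.9, 20.2, 26.1, 22.8]
y_sample = [20.4, 18.9, 22.7, 19.5, 21.8, 17.6, 23.0, 19.1, 20.9, 18.3]
p_h0, p_h1 = simulate(x_sample, y_sample)
print(f"P(H0|X,Y) = {p_h0:.4f}, P(H1|X,Y) = {p_h1:.4f}")
if p_h0 > 0:
    print(f"BF10 = {p_h1 / p_h0:.3f}")
```

Increasing `k` reduces the Monte Carlo error of the estimated probabilities, at the cost of a longer run.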