Bayesian Independence Testing

Objective

We describe the Bayesian approach to determining whether the two variables defined by a contingency table are independent. This is the Bayesian counterpart of the chi-square test of independence and of Fisher’s exact test. See also Bayesian Hypothesis Testing.

Terminology

We assume that we have an m × n contingency table X = [xij] and want to test the following hypotheses:

H0: Row/column independence

H1: Row/column dependence

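We now define the row totals, column totals and grand total

$$x_{i.} = \sum_{j=1}^{n} x_{ij},\qquad x_{.j} = \sum_{i=1}^{m} x_{ij},\qquad x_T = \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}$$

and the vectors of row and column totals

$$X_{row} = (x_{1.}, \dots, x_{m.}),\qquad X_{col} = (x_{.1}, \dots, x_{.n})$$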

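We also assume a matrix of prior parameters A = [aij]. We define ai., a.j, aT, Arow and Acol for A in the same way as the corresponding quantities for X. In addition, we define

$$c_{i.} = a_{i.} - (n-1),\qquad c_{.j} = a_{.j} - (m-1),\qquad c = a_T - (m-1)(n-1)$$

$$C_{row} = (c_{1.}, \dots, c_{m.}),\qquad C_{col} = (c_{.1}, \dots, c_{.n})$$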
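To check these definitions, here is a minimal Python sketch. It computes the totals of a matrix and the quantities ci. = ai. - (n - 1), c.j = a.j - (m - 1) and c = aT - (m - 1)(n - 1) from a prior matrix A. The helper names margins and c_quantities are hypothetical. They are not Real Statistics worksheet functions.

```python
from typing import List, Sequence, Tuple


def margins(Y: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], float]:
    """Return (row totals, column totals, grand total) of an m x n matrix."""
    rows = [sum(r) for r in Y]
    cols = [sum(col) for col in zip(*Y)]
    return rows, cols, sum(rows)


def c_quantities(A: Sequence[Sequence[float]]) -> Tuple[List[float], List[float], float]:
    """Return (C_row, C_col, c) for an m x n prior matrix A."""
    m, n = len(A), len(A[0])
    a_row, a_col, a_T = margins(A)
    c_row = [a - (n - 1) for a in a_row]   # c_i. = a_i. - (n - 1)
    c_col = [a - (m - 1) for a in a_col]   # c_.j = a_.j - (m - 1)
    c = a_T - (m - 1) * (n - 1)            # c = a_T - (m - 1)(n - 1)
    return c_row, c_col, c


A = [[1, 1], [1, 1]]          # 2 x 2 prior with every a_ij = 1
print(margins(A))             # ([2, 2], [2, 2], 4)
print(c_quantities(A))        # ([1, 1], [1, 1], 3)
```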

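Finally, for any m × n matrix Y = [yij] with grand total yT, we define

$$D(Y) = \frac{\prod_{i=1}^{m}\prod_{j=1}^{n}\Gamma(y_{ij})}{\Gamma(y_T)}$$

and for any vector Y = [yi], we define

$$D(Y) = \frac{\prod_{i}\Gamma(y_i)}{\Gamma\left(\sum_{i} y_i\right)}$$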
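Γ grows very quickly, so D is best evaluated on the log scale. Below is a minimal sketch that uses Python's math.lgamma. The names log_D, log_D_matrix and D are illustrative only. The sketch relies on one observation: D for a matrix is D applied to its entries listed as a single vector, because the denominator uses the grand total.

```python
import math
from typing import Sequence


def log_D(values: Sequence[float]) -> float:
    """log D(Y) for a vector Y: sum of log Gamma(y_i) minus log Gamma(sum of y_i).

    All entries must be positive (math.lgamma rejects 0 and negative integers).
    """
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def log_D_matrix(Y: Sequence[Sequence[float]]) -> float:
    """D for a matrix equals D of its entries flattened into one vector."""
    return log_D([y for row in Y for y in row])


def D(values: Sequence[float]) -> float:
    return math.exp(log_D(values))


print(D([18, 16]))                                # about 5.35652e-11
print(math.exp(log_D_matrix([[1, 1], [1, 1]])))   # about 1/6
```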

Example

We explain this terminology further using the contingency table in Figure 1.


Figure 1 – Contingency Table

Based on the above terminology, we see that

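$$x_{21} = 2,\quad x_{.1} = 11,\quad x_T = 34,\quad X_{col} = (11, 23),\quad m = n = 2$$

If aij = 1 for all i, j, then aT = 4 and Arow = Acol = (2, 2), so

$$C_{row} = C_{col} = (2-1,\ 2-1) = (1, 1),\qquad c = 4 - 1 = 3$$

$$D(X_{row}) = \frac{\Gamma(18)\,\Gamma(16)}{\Gamma(34)} = \frac{17!\,15!}{33!} \approx 5.35652 \times 10^{-11}$$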
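A few lines of Python reproduce these values. The sketch assumes Xrow = (18, 16), which is what the D(Xrow) calculation implies. The order of the entries does not affect D.

```python
import math


def D(values):
    """Vector Dirichlet function D(Y) = prod Gamma(y_i) / Gamma(sum y_i)."""
    return math.exp(sum(math.lgamma(v) for v in values) - math.lgamma(sum(values)))


m = n = 2
A = [[1, 1], [1, 1]]                          # a_ij = 1 for all i, j
a_row = [sum(r) for r in A]                   # A_row
a_col = [sum(col) for col in zip(*A)]         # A_col
a_T = sum(a_row)
c_row = [a - (n - 1) for a in a_row]          # C_row
c = a_T - (m - 1) * (n - 1)

x_row = [18, 16]                              # X_row implied by D(X_row) above
print(a_T, a_row, a_col, c_row, c)            # 4 [2, 2] [2, 2] [1, 1] 3
print(f"{D(x_row):.5E}")                      # 5.35652E-11
```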

We now show how to test the independence hypotheses described above using several different approaches.

Poisson sampling

In this sampling approach, we assume that none of the totals (row, column or grand total) is fixed. Each cell count follows a Poisson distribution. Its mean λij has a gamma distribution with shape parameter aij and rate (inverse scale) parameter b.

$$x_{ij} \sim \text{Poisson}(\lambda_{ij})$$

$$\lambda_{ij} \sim \text{Gamma}(a_{ij}, b)$$

We assume that all the aij are equal to a common value a (default 1), and that the default for b is

$$b = \frac{mna}{x_T}$$

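The Bayes factor for this sampling approach is

$$BF_{01} = \frac{D(X_{row}+C_{row})}{D(C_{row})}\cdot\frac{D(X_{col}+C_{col})}{D(C_{col})}\cdot\frac{D(A)}{D(X+A)}\cdot\frac{\Gamma(x_T+c)\,\Gamma(a_T)}{\Gamma(c)\,\Gamma(x_T+a_T)}\cdot\left(1+\frac{1}{b}\right)^{(m-1)(n-1)}$$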
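Here is a sketch of the Poisson Bayes factor in Python. It works on the log scale and sums the logs of five factors:

- D(Xrow + Crow)/D(Crow)
- D(Xcol + Ccol)/D(Ccol)
- D(A)/D(X + A)
- Γ(xT + c)Γ(aT)/(Γ(c)Γ(xT + aT))
- (1 + 1/b)^((m - 1)(n - 1))

The sketch assumes all aij equal a common value a. It treats b as a rate parameter, which is what makes the default b = mna/xT give each λij a prior mean a/b equal to the average cell count xT/(mn). The function name bf01_poisson and the example table are hypothetical; the table is not the one in Figure 2.

```python
import math


def log_D(values):
    """log of the vector Dirichlet function D(Y) = prod Gamma(y_i) / Gamma(sum y_i)."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def bf01_poisson(X, a=1.0, b=None):
    """BF01 (independence vs dependence) under Poisson sampling.

    Assumes every prior parameter a_ij equals a and treats b as the gamma
    rate parameter (default m*n*a / x_T). Requires all C entries and c > 0.
    """
    m, n = len(X), len(X[0])
    x_row = [sum(r) for r in X]
    x_col = [sum(col) for col in zip(*X)]
    x_T = sum(x_row)
    if b is None:
        b = m * n * a / x_T
    a_T = m * n * a
    c_row = [n * a - (n - 1)] * m          # c_i. = a_i. - (n - 1)
    c_col = [m * a - (m - 1)] * n          # c_.j = a_.j - (m - 1)
    c = a_T - (m - 1) * (n - 1)
    cells = [x for r in X for x in r]

    log_bf = (
        # term 1: D(X_row + C_row) / D(C_row)
        log_D([x + k for x, k in zip(x_row, c_row)]) - log_D(c_row)
        # term 2: D(X_col + C_col) / D(C_col)
        + log_D([x + k for x, k in zip(x_col, c_col)]) - log_D(c_col)
        # term 3: D(A) / D(X + A)
        + log_D([a] * (m * n)) - log_D([x + a for x in cells])
        # term 4: Gamma(x_T + c) Gamma(a_T) / (Gamma(c) Gamma(x_T + a_T))
        + math.lgamma(x_T + c) + math.lgamma(a_T)
        - math.lgamma(c) - math.lgamma(x_T + a_T)
        # term 5: (1 + 1/b)^((m - 1)(n - 1))
        + (m - 1) * (n - 1) * math.log1p(1 / b)
    )
    return math.exp(log_bf)


X = [[3, 1], [2, 4]]                 # hypothetical 2 x 2 table
print(round(bf01_poisson(X), 6))     # 0.454545
```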

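For a 2 × 2 contingency table with a = 1 (so that the default is b = 4/xT), it follows that

$$BF_{10} = \frac{8\,(x_T+1)!}{(x_T+2)(x_T+4)}\cdot\frac{x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!}{x_{1.}!\,x_{2.}!\,x_{.1}!\,x_{.2}!}$$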
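The 2 × 2 closed form can be evaluated exactly with integer factorials. The sketch below implements BF10 = 8(xT + 1)!/((xT + 2)(xT + 4)) · x11!x12!x21!x22!/(x1.!x2.!x.1!x.2!). It assumes a = 1 and the default b = 4/xT. The function name and the table are hypothetical.

```python
import math


def bf10_poisson_2x2(X):
    """Closed-form BF10 for a 2 x 2 table under Poisson sampling with a = 1
    and the default b = 4 / x_T."""
    (x11, x12), (x21, x22) = X
    r1, r2 = x11 + x12, x21 + x22            # row totals x_1., x_2.
    k1, k2 = x11 + x21, x12 + x22            # column totals x_.1, x_.2
    x_T = r1 + r2
    f = math.factorial                       # exact integer factorials
    cells = f(x11) * f(x12) * f(x21) * f(x22)
    totals = f(r1) * f(r2) * f(k1) * f(k2)
    return 8 * f(x_T + 1) * cells / ((x_T + 2) * (x_T + 4) * totals)


print(bf10_poisson_2x2([[3, 1], [2, 4]]))    # 2.2
```

Python's int / int division is correctly rounded, so the large factorials do not cause overflow as long as the final ratio fits in a float.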

Example

Example 1: Calculate BF01 for the contingency table in range B2:D3 of Figure 2 based on the prior parameter a = 1.

We show the table of priors, including totals, in range B7:E9. Range F7:F8 contains Crow, B10:D10 contains Ccol, and cell F10 contains c. The right side of the figure calculates BF01 as the product of the five terms in the above formula. To simplify the calculations, we use the DFunc worksheet function described below.

Figure 2 – Poisson sampling example

Joint multinomial sampling

In the joint multinomial sampling approach, we assume that only the grand total xT is fixed and

$$(x_{11}, \dots, x_{mn}) \sim \text{Multinomial}(x_T, \pi)$$

where the vector of cell probabilities π has a Dirichlet prior:

$$\pi \sim \text{Dirichlet}(a_{11}, \dots, a_{mn})$$

In this case

$$BF_{01} = \frac{D(X_{row}+C_{row})}{D(C_{row})}\cdot\frac{D(X_{col}+C_{col})}{D(C_{col})}\cdot\frac{D(A)}{D(X+A)}$$
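Here is a sketch of the joint multinomial Bayes factor, again on the log scale. It multiplies three factors: D(Xrow + Crow)/D(Crow), D(Xcol + Ccol)/D(Ccol) and D(A)/D(X + A). It assumes all aij equal a common value a. The function name and the table are hypothetical.

```python
import math


def log_D(values):
    """log D(Y) = sum of log Gamma(y_i) minus log Gamma(sum of y_i)."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def bf01_joint_multinomial(X, a=1.0):
    """BF01 under joint multinomial sampling, assuming every a_ij = a."""
    m, n = len(X), len(X[0])
    x_row = [sum(r) for r in X]
    x_col = [sum(col) for col in zip(*X)]
    c_row = [n * a - (n - 1)] * m            # C_row
    c_col = [m * a - (m - 1)] * n            # C_col
    cells = [x for r in X for x in r]
    log_bf = (
        log_D([x + k for x, k in zip(x_row, c_row)]) - log_D(c_row)
        + log_D([x + k for x, k in zip(x_col, c_col)]) - log_D(c_col)
        + log_D([a] * (m * n)) - log_D([x + a for x in cells])
    )
    return math.exp(log_bf)


X = [[3, 1], [2, 4]]                          # hypothetical 2 x 2 table
print(round(bf01_joint_multinomial(X), 6))    # 0.562771
```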

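For a 2 × 2 contingency table with a = 1, it follows that

$$BF_{10} = \frac{6\,(x_T+1)!}{(x_T+2)(x_T+3)}\cdot\frac{x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!}{x_{1.}!\,x_{2.}!\,x_{.1}!\,x_{.2}!}$$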
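The following sketch evaluates the 2 × 2 closed form BF10 = 6(xT + 1)!/((xT + 2)(xT + 3)) · x11!x12!x21!x22!/(x1.!x2.!x.1!x.2!) with exact integer factorials. It assumes a = 1. The function name and the table are hypothetical.

```python
import math


def bf10_joint_multinomial_2x2(X):
    """Closed-form BF10 for a 2 x 2 table under joint multinomial sampling, a = 1."""
    (x11, x12), (x21, x22) = X
    r1, r2 = x11 + x12, x21 + x22            # row totals
    k1, k2 = x11 + x21, x12 + x22            # column totals
    x_T = r1 + r2
    f = math.factorial
    cells = f(x11) * f(x12) * f(x21) * f(x22)
    totals = f(r1) * f(r2) * f(k1) * f(k2)
    return 6 * f(x_T + 1) * cells / ((x_T + 2) * (x_T + 3) * totals)


print(round(bf10_joint_multinomial_2x2([[3, 1], [2, 4]]), 6))   # 1.776923
```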

Independent multinomial sampling

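This time, we assume that either the row totals or the column totals are fixed. If the row totals are fixed, then

$$BF_{01} = \frac{D(X_{col}+C_{col})}{D(C_{col})}\cdot\frac{D(A)}{D(X+A)}\cdot\frac{D(X_{row}+A_{row})}{D(A_{row})}$$

If the column totals are fixed, then

$$BF_{01} = \frac{D(X_{row}+C_{row})}{D(C_{row})}\cdot\frac{D(A)}{D(X+A)}\cdot\frac{D(X_{col}+A_{col})}{D(A_{col})}$$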
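Here is a sketch of the independent multinomial Bayes factor. For fixed row totals it computes D(Xcol + Ccol)/D(Ccol) · D(A)/D(X + A) · D(Xrow + Arow)/D(Arow). The fixed-column case is handled by transposing the table, which swaps the roles of rows and columns. The sketch assumes all aij equal a common value a. The function name and the table are hypothetical.

```python
import math


def log_D(values):
    """log D(Y) = sum of log Gamma(y_i) minus log Gamma(sum of y_i)."""
    return sum(math.lgamma(v) for v in values) - math.lgamma(sum(values))


def bf01_independent_multinomial(X, a=1.0, fixed="rows"):
    """BF01 under independent multinomial sampling, assuming every a_ij = a.

    fixed="rows": row totals are fixed; fixed="cols": column totals are fixed
    (handled by transposing X, which swaps the roles of rows and columns).
    """
    if fixed == "cols":
        X = [list(col) for col in zip(*X)]
    elif fixed != "rows":
        raise ValueError("fixed must be 'rows' or 'cols'")
    m, n = len(X), len(X[0])
    x_row = [sum(r) for r in X]
    x_col = [sum(col) for col in zip(*X)]
    c_col = [m * a - (m - 1)] * n             # C_col
    a_row = [n * a] * m                       # A_row
    cells = [x for r in X for x in r]
    log_bf = (
        # D(X_col + C_col) / D(C_col)
        log_D([x + k for x, k in zip(x_col, c_col)]) - log_D(c_col)
        # D(A) / D(X + A)
        + log_D([a] * (m * n)) - log_D([x + a for x in cells])
        # D(X_row + A_row) / D(A_row)
        + log_D([x + k for x, k in zip(x_row, a_row)]) - log_D(a_row)
    )
    return math.exp(log_bf)


X = [[3, 1], [2, 4]]                                             # hypothetical table
print(round(bf01_independent_multinomial(X, fixed="rows"), 6))   # 0.757576
print(round(bf01_independent_multinomial(X, fixed="cols"), 6))   # 0.779221
```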

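For a 2 × 2 table with a = 1 and fixed row totals

$$BF_{10} = \frac{(x_T+1)!}{(x_{1.}+1)(x_{2.}+1)}\cdot\frac{x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!}{x_{1.}!\,x_{2.}!\,x_{.1}!\,x_{.2}!}$$

For fixed column totals, replace (x1. + 1)(x2. + 1) in the denominator by (x.1 + 1)(x.2 + 1).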
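This sketch evaluates both 2 × 2 closed forms. Whichever totals are fixed supply the (t + 1) factors in the denominator of (xT + 1)!/((t1 + 1)(t2 + 1)) · x11!x12!x21!x22!/(x1.!x2.!x.1!x.2!). It assumes a = 1. The function name and the table are hypothetical.

```python
import math


def bf10_independent_2x2(X, fixed="rows"):
    """Closed-form BF10 for a 2 x 2 table under independent multinomial
    sampling with a = 1; fixed is "rows" or "cols"."""
    (x11, x12), (x21, x22) = X
    r1, r2 = x11 + x12, x21 + x22            # row totals
    k1, k2 = x11 + x21, x12 + x22            # column totals
    x_T = r1 + r2
    f = math.factorial
    cells = f(x11) * f(x12) * f(x21) * f(x22)
    totals = f(r1) * f(r2) * f(k1) * f(k2)
    # the fixed totals supply the (t + 1) factors in the denominator
    t1, t2 = (r1, r2) if fixed == "rows" else (k1, k2)
    return f(x_T + 1) * cells / ((t1 + 1) * (t2 + 1) * totals)


X = [[3, 1], [2, 4]]                                       # hypothetical table
print(round(bf10_independent_2x2(X, fixed="rows"), 6))     # 1.32
print(round(bf10_independent_2x2(X, fixed="cols"), 6))     # 1.283333
```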

Hypergeometric sampling

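This time, we assume that all the marginal totals (row and column) are fixed. In this case

$$BF_{01} = \frac{\prod_{i} x_{i.}!\,\prod_{j} x_{.j}!}{x_T!\,\prod_{i,j} x_{ij}!}\cdot\frac{\sum_{Z \in S}\prod_{i,j}\Gamma(z_{ij}+a_{ij})/z_{ij}!}{\prod_{i,j}\Gamma(x_{ij}+a_{ij})/x_{ij}!}$$

where S is the set of all m × n tables Z = [zij] of non-negative integers that have the same row and column totals as X.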
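For small tables, a sum over all tables with the observed margins can be computed by brute-force enumeration. This sketch computes BF01 as the hypergeometric probability of X multiplied by Σ_Z w(Z)/w(X), where w(Z) = Π Γ(zij + a)/zij!. It enumerates every table with the observed margins recursively, row by row, and combines the terms on the log scale. It assumes all aij equal a common value a, and the function names are hypothetical. The number of such tables grows quickly with the table size and the totals, so this approach is only practical for small examples.

```python
import math


def row_fillings(total, caps):
    """Yield every list of non-negative ints that sums to total, with entry j <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for v in range(min(total, caps[0]) + 1):
        for rest in row_fillings(total - v, caps[1:]):
            yield [v] + rest


def tables_with_margins(rows, cols):
    """Yield every table of non-negative ints with the given row and column totals."""
    if len(rows) == 1:
        yield [list(cols)]                   # the last row is forced by what remains
        return
    for first in row_fillings(rows[0], cols):
        remaining = [k - v for k, v in zip(cols, first)]
        for rest in tables_with_margins(rows[1:], remaining):
            yield [first] + rest


def log_fact(v):
    return math.lgamma(v + 1)                # log v!


def log_weight(Z, a):
    """log of the product over cells of Gamma(z_ij + a) / z_ij!"""
    return sum(math.lgamma(z + a) - log_fact(z) for row in Z for z in row)


def bf01_hypergeometric(X, a=1.0):
    """BF01 with all margins fixed, assuming every a_ij = a (brute-force sum over S)."""
    rows = [sum(r) for r in X]
    cols = [sum(col) for col in zip(*X)]
    x_T = sum(rows)
    # log hypergeometric probability of X (the likelihood under H0)
    log_h0 = (sum(log_fact(v) for v in rows) + sum(log_fact(v) for v in cols)
              - log_fact(x_T) - sum(log_fact(v) for row in X for v in row))
    weights = [log_weight(Z, a) for Z in tables_with_margins(rows, cols)]
    top = max(weights)                       # log-sum-exp for numerical stability
    log_sum = top + math.log(sum(math.exp(w - top) for w in weights))
    return math.exp(log_h0 + log_sum - log_weight(X, a))


print(round(bf01_hypergeometric([[3, 1], [2, 4]]), 6))   # 1.190476
```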

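For a 2 × 2 table with a = 1

$$BF_{10} = \frac{x_T!\;x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!}{(x_{1.}+1)!\;x_{2.}!\,x_{.1}!\,x_{.2}!}$$

where the rows and columns are labeled (swapped or transposed if necessary) so that x1. is the smallest of the four marginal totals.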
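With a = 1, each term of a sum over all tables with the observed margins equals 1, so that sum is simply the number of such tables. For a 2 × 2 table this number is s + 1, where s is the smallest of the four marginal totals. Because (x1. + 1)! = (x1. + 1)·x1.!, the closed form can be evaluated without relabeling by using (s + 1) times the product of all four margin factorials. Below is a sketch; the function name and the table are hypothetical.

```python
import math


def bf10_hypergeometric_2x2(X):
    """Closed-form BF10 for a 2 x 2 table with all margins fixed and a = 1."""
    (x11, x12), (x21, x22) = X
    margins = [x11 + x12, x21 + x22, x11 + x21, x12 + x22]   # x_1., x_2., x_.1, x_.2
    x_T = margins[0] + margins[1]
    s = min(margins)        # plays the role of x_1. once the table is relabeled
    f = math.factorial
    cells = f(x11) * f(x12) * f(x21) * f(x22)
    totals = f(margins[0]) * f(margins[1]) * f(margins[2]) * f(margins[3])
    # (x_1. + 1)! x_2.! x_.1! x_.2! = (s + 1) * (product of the four margin factorials)
    return f(x_T) * cells / ((s + 1) * totals)


print(bf10_hypergeometric_2x2([[3, 1], [2, 4]]))   # 0.84
```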

Worksheet Functions

Click here for a description of worksheet functions and data analysis tools that can be used to perform Bayesian independence testing in Excel.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Jamil, T., Ly, A., Morey, R. D., Love, J., Marsman, M., and Wagenmakers, E.-J. (2016). Default “Gunel and Dickey” Bayes factors for contingency tables.
https://www.alexander-ly.com/wp-content/uploads/2014/09/JamilEtAlGunelDickeyinpress.pdf

Albert, J. (2009). Bayesian computation with R, 2nd ed. Springer.
