Real Statistics Bayesian Analysis Functions

The following is a summary of worksheet functions provided in the Real Statistics Resource Pack that support Bayesian statistical analysis. These functions are organized into the following categories:

  • Bayesian Analysis using Grids
  • Bayesian Beta Testing
  • Bayesian Binomial/Proportion Testing
  • Bayesian T Tests
  • Bayesian Signed-Rank Test
  • Bayesian Mann-Whitney Test
  • Bayesian Contrasts
  • Bayesian Independence Test
  • Bayesian Correlation Test
  • Bayesian Kendall’s Tau Test
  • Bayesian Gamma Test
  • Bayesian Sign and Median Tests
  • Miscellaneous

For many of the following functions, we use the following arguments:

If lab = TRUE (default FALSE) then a column of labels is appended to the output. alpha is used for the credible intervals (default .05). 

Bayesian Analysis using Grids

In the following, a grid is defined using column arrays Rx and Ry. For each x value in Rx, Ry contains the corresponding pdf value f(x). lprec and uprec are the lower and upper precision values (default .1).

GRID_HDI(Rx, Ry, lab, alpha, lprec, uprec): returns a column array with the following entries: endpoints of the HDI, length of the HDI and the actual value for 1–alpha based on the HDI. See Credible Interval and HDI.

GRID_DIST(x, Rx, Ry, cum) = the pdf f(x) if cum = FALSE based on the grid in Rx and Ry, and the cdf F(x) if cum = TRUE.

GRID_INV(p, Rx, Ry) = the inverse F⁻¹(p) based on the grid in Rx and Ry.
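The idea behind GRID_DIST and GRID_INV can be sketched in Python as follows. This is only an illustration, not the Real Statistics implementation: it assumes trapezoid integration of the grid and linear interpolation between grid points, and the function names grid_cdf_table, interp, grid_dist and grid_inv are hypothetical.

```python
from bisect import bisect_right

def grid_cdf_table(xs, ys):
    """Cumulative trapezoid areas at each grid point, scaled so the last value is 1."""
    cum = [0.0]
    for i in range(1, len(xs)):
        cum.append(cum[-1] + 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]))
    total = cum[-1]
    return [c / total for c in cum]

def interp(x, xs, vals):
    """Linear interpolation of vals over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return vals[0]
    if x >= xs[-1]:
        return vals[-1]
    i = bisect_right(xs, x) - 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return vals[i] + w * (vals[i + 1] - vals[i])

def grid_dist(x, xs, ys, cum=False):
    """pdf value at x if cum is False, otherwise the approximate cdf value at x."""
    if cum:
        return interp(x, xs, grid_cdf_table(xs, ys))
    return interp(x, xs, ys)

def grid_inv(p, xs, ys):
    """Approximate inverse cdf: swap the roles of the cdf table and the x grid."""
    return interp(p, grid_cdf_table(xs, ys), xs)
```

For a flat grid such as xs = [0, 0.5, 1] and ys = [1, 1, 1], grid_dist(0.25, xs, ys, True) gives 0.25 and grid_inv(0.75, xs, ys) gives 0.75, as expected for a uniform distribution.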

GridDesc(Rx, Ry, lab, hyp, alpha, lprec, uprec): returns an array with mean, median, mode, HDI (lower/upper), equal-tailed CI (upper/lower), BF01, BF10, P(H0|X), P(H1|X) based on the grid in Rx and Ry where H0 is p < hyp (default .5).

GridSpline(Rx, Ry, rate): returns a two-column array with an expansion to the grid defined by Rx and Ry. The first column of the output contains the x values while the second column contains the corresponding y values where each value in the original grid except the last is expanded to rate (default 10) number of values using spline interpolation.

See Bayesian Analysis using Grids for more details.

Bayesian Beta Testing

BayesBeta(a, b, lab, alpha, hyp): returns an array with mean, median, mode, HDI (lower/upper), equal-tailed CI (upper/lower), BF01, BF10, P(H0|X), P(H1|X) for a beta distribution with parameters a and b where H0 is p < hyp (default .5).

See Bayesian Characterization of a Beta Distribution for more details.

BETA_EFFECT(bf, n) = minimum φ value needed to obtain BF10 = bf for a sample of size n.

BETA_SIZE(bf, φ) = minimum sample size n needed to obtain BF10 = bf for an effect of size φ.

See Bayesian Beta Test Sample and Effect Sizes for more details.

BETA_HDI(alpha, a, b, lab, iter): returns an array for a beta distribution with parameters a and b with the following entries: endpoints of the 1– alpha HDI, length of the HDI and the pdf values at the two endpoints. iter = the number of iterations (default 40) in the divide and conquer algorithm used.

BETA2_HDI(alpha, a, b, lab, iter): just like the BETA_HDI worksheet function but for the beta prime distribution.

See High Density Interval for more details.

Bayesian Binomial/Proportion Testing

BayesBinom(n, s, lab, a, b, p): returns an array with BF01, BF10, P(H0|X), and P(H1|X) for a two-sided one-sample proportion test where n = the sample size, s = the number of successes, p = the test proportion (default .5), and a (default 1) and b (default 1) are the parameters for the beta distribution prior. Here H0: population proportion = p and H1: population proportion ≠ p.
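As a rough illustration of the quantities BayesBinom reports, the sketch below computes a Bayes factor for a point null H0: proportion = p against H1: proportion ~ Beta(a, b), using the standard beta-binomial marginal likelihood. Whether Real Statistics uses exactly this formula is an assumption, as is the use of equal prior odds for the posterior probabilities; the function names are hypothetical.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binom_point_null_bf(n, s, a=1.0, b=1.0, p=0.5):
    """Return (BF01, BF10, P(H0|X), P(H1|X)) for H0: proportion = p.

    The binomial coefficient appears in both marginal likelihoods and cancels.
    Posterior probabilities assume prior odds of 1 (an assumption of this sketch).
    """
    log_m0 = s * log(p) + (n - s) * log(1 - p)             # likelihood under H0
    log_m1 = log_beta(a + s, b + n - s) - log_beta(a, b)   # marginal likelihood under H1
    bf01 = exp(log_m0 - log_m1)
    post_h0 = bf01 / (1 + bf01)
    return bf01, 1 / bf01, post_h0, 1 - post_h0
```

With n = 10, s = 5 and the default uniform prior, BF01 = 2772/1024 ≈ 2.707, so the data mildly favor the null value p = .5.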

BayesBinom0(n, s, lab, a, b, p): similar to BayesBinom where H0: population proportion < p and H1: population proportion ≥ p. This corresponds to one-sided analysis described in Beta Conjugate Prior.

BinomP(n, s, p) = the q value described in Bayesian Binomial Hypothesis Testing, used for the hypotheses H−: population proportion < p and H+: population proportion ≥ p.

See Bayesian Binomial Testing Tools for more details.

Bayesian T Tests

T_STAT(R1, R2, paired) = t statistic (used with two-sided tests).

If R2 takes a numeric value, then the t-stat for a one sample test using the data in R1 is returned where R2 is the hypothetical mean (default 0). If paired = TRUE (default FALSE) then the t-stat for a paired t-test is returned using the data in R1 and R2; otherwise, the t-stat is for a two independent sample t-test using the data in R1 and R2.

T_STATP(R1, R2, paired) = t statistic based on the population variance instead of the sample variance as in T_STAT (used with one-sided tests).
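The difference between the two statistics can be illustrated for the one-sample case with a short Python sketch. It assumes the only change is the divisor used for the variance (n − 1 for the sample variance, n for the population variance); the function name one_sample_t is hypothetical and the exact Real Statistics computation may differ.

```python
from math import sqrt
from statistics import mean, pvariance, variance

def one_sample_t(data, mu0=0.0, population=False):
    """One-sample t statistic for the hypothetical mean mu0.

    population=False uses the sample variance (divisor n - 1);
    population=True uses the population variance (divisor n).
    """
    n = len(data)
    var = pvariance(data) if population else variance(data)
    return (mean(data) - mu0) / sqrt(var / n)
```

For the data 1, 2, 3, 4, 5 and mu0 = 0, the sample-variance version is about 4.243 and the population-variance version is about 4.743.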

We also have the following array functions where R1, R2, paired are as above.

One-sided T Tests

BayesT0_TEST(t, n, lab): returns an array with Odds01, Odds10, P(H0|X), and P(H1|X) for a one-sided t-test where t = the t-statistic, n = size of sample 1 for the one-sample or paired sample test or the sum of the two sample sizes for a two sample test.

BayesTX_TEST(R1, R2, lab, paired, alpha): returns an array with Odds01, Odds10, P(H0|X), and P(H1|X), (pooled) mean, scale, df, and lower and upper limits of the 1-alpha credible interval.

The above two functions assume that the population variances for the two independent sample case are equal. Otherwise, the following function is available. This function uses a simulation with iter iterations (default iter = 10000).

BayesTS_TEST(R1, R2, lab, iter, alpha): returns an array with Odds01, Odds10, P(H0|X), and P(H1|X), (pooled) mean, and lower and upper limits of the 1-alpha credible interval for the two independent sample t-test based on the data in R1 and R2.

Real Statistics also provides the following counterpart to the BayesT0_TEST function when the variance is known – the var argument in the following description (default var = 1).

BayesZ_TEST(R1, R2, var, lab, paired, alpha): returns an array with Odds01, Odds10, P(H0|X), and P(H1|X), (pooled) mean, scale, df, and lower and upper limits of the 1-alpha credible interval.

See Bayesian t Test Tools for details.

Two-sided T Tests

It is assumed that the population variances for the two independent sample case are equal.

BayesT_TEST(t, n1, n2, scale, lab): returns an array with BF01, BF10, P(H0|X), and P(H1|X) for a two-sided weakly informational t test where t = the t-statistic, n1 = size of sample 1, n2 = size of sample 2 (default = 0, which implies this is a one-sample or paired sample test), scale = scale parameter (default √2/2). Here H0: δ = 0 and H1: δ ≠ 0 where δ = μ1 – μ0 for the one-sample test (μ0 = hypothetical mean) or δ = μ2 – μ1 for the two-sample or paired sample test.

JZS_TEST(t, n1, n2, scale, lab, iter): exactly as the BayesT_TEST except that the JZS prior is used. iter = the number of iterations used to calculate the integral in the formula for BF01.

See Bayesian t Test Tools for details.
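To show what "calculating the integral in the formula for BF01" can look like for a JZS prior, here is a minimal Python sketch for the one-sample case. It uses the well-known one-sample JZS Bayes factor of Rouder et al. (2009), evaluated with a simple midpoint rule. That JZS_TEST uses this exact formula or integration scheme is an assumption; the function name jzs_bf01 and the steps parameter are hypothetical.

```python
from math import exp, pi, sqrt

def jzs_bf01(t, n, scale=sqrt(2) / 2, steps=10000):
    """One-sample JZS Bayes factor BF01 for t statistic t and sample size n.

    The integral over g in (0, inf) is mapped to u in (0, 1) via g = u / (1 - u)
    and approximated with the midpoint rule using `steps` subintervals.
    """
    nu = n - 1
    null_like = (1 + t * t / nu) ** (-(nu + 1) / 2)
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        g = u / (1 - u)
        jac = 1 / (1 - u) ** 2                     # dg/du
        shrink = 1 + n * g * scale ** 2
        like = shrink ** -0.5 * (1 + t * t / (shrink * nu)) ** (-(nu + 1) / 2)
        prior = (2 * pi) ** -0.5 * g ** -1.5 * exp(-1 / (2 * g))  # inverse-gamma(1/2, 1/2) density
        total += like * prior * jac
    alt_like = total / steps
    return null_like / alt_like
```

As a sanity check, jzs_bf01(0, 20) is greater than 1 (a t statistic of 0 favors H0), while jzs_bf01(5, 20) is far below 1 (a large t statistic favors H1).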

Effect and Sample Sizes

JZS_EFFECT(bf, n1, n2, scale, iter) = minimum value of the effect size δ for a t-test using a JZS prior with the specified scale (default √2/2). If n2 = 0 (default) then a single sample of size n1 is considered; otherwise two samples of sizes n1 and n2 are considered.

JZS_SIZE(bf, effect, , scale, iter) = minimum size for which BF10 ≥ bf for a one sample t-test with the specified effect size using a JZS prior with the specified scale (default √2/2).

JZS_SIZE(bf, effect, ratio, scale, iter) = minimum size for which BF10 ≥ bf for a two sample t-test with the specified effect size using a JZS prior with the specified scale (default √2/2). Here the size of the second sample equals the size of the first sample times ratio (the samples have the same size when ratio = 1).

If ratio is a negative number then the second sample has ratio fewer elements than the first sample. If ratio is zero then we use the one-sample version of the function. iter is used to compute the integral used (default 10000).

The worksheet functions BayesT_EFFECT(bf, n1, n2, scale) and BayesT_SIZE(bf, effect, ratio, scale) are also available. These are defined as above based on a t-test using a weakly informational normal prior.

The worksheet functions BayesT0_EFFECT(odds, n1, n2) and BayesT0_SIZE(odds, effect, ratio) are also available. These are defined as above based on a t-test using a Jeffreys’ non-informational prior where Odds10 ≥ odds is used instead of BF10 ≥ bf.

See Bayesian t Test Sample Size for details.

Bayesian Signed-Rank Test

In the following, R1 is a column array of difference values and iter is the number of Monte Carlo samples (default 30000). lprec and uprec are defined as for GRID_HDI. arg1 is a column array of ranks, but if arg1 is instead a positive integer n, then the ranks 1, 2, …, n with no ties are used.

BayesSRT(R1): returns a column array with n and T+ where n = the # of non-zero numeric entries in R1.
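The two values returned by BayesSRT can be illustrated with the usual Wilcoxon signed-rank computation. The sketch below drops zero differences, ranks the absolute values, and sums the ranks of the positive differences. Assigning average ranks to ties is an assumption of this sketch, and the function name signed_rank_tplus is hypothetical.

```python
def signed_rank_tplus(diffs):
    """Return (n, T+) where n counts non-zero differences and T+ sums the
    ranks (of absolute values) that belong to positive differences."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        # Extend j to cover the run of tied absolute values starting at position i.
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    tplus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    return len(nonzero), tplus
```

For the differences 1.5, -0.5, 2.0, 0, -1.0 this gives n = 4 and T+ = 3 + 4 = 7.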

BayesSR1(arg1, tplus, phi, iter) = estimated likelihood value corresponding to the specified phi and T+ values based on the column array of ranks in arg1 using iter-many Monte Carlo samples.

BayesSR1X(arg1, tplus, iter): returns a column array with 200 rows consisting of estimated pdf values for the specified T+ value and each of 200 phi values starting from .0025 in increments of .005 based on the column array of ranks in arg1 where each grid entry in the output is calculated using iter-many Monte Carlo samples.

BayesSRX(R1, iter): outputs a column array with 200 grid entries, as described above, where each entry is calculated using iter-many Monte Carlo samples.

BayesSRSm(R1, lab, iter, alpha, lprec, uprec): returns an array with n, T+, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X) and P(H1|X) based on the null hypothesis μ < .5. The output is based on a grid created using 200 phi values starting from .0025 in increments of .005 where each grid entry is calculated using iter-many Monte Carlo samples.

BayesSRLg(R1, lab, a, b, alpha): returns an array with n, T+, posterior Beta parameters, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X) and P(H1|X) based on the null hypothesis μ < .5. The output is based on a Beta distribution approximation using a Beta prior with parameters a (default 1) and b (default 1).

BayesSR1Sm(n, tplus, lab, iter, alpha, lprec, uprec): equivalent to BayesSRSm for any array R1 which has n non-zero entries and has a T+ value equal to tplus based on ranks with no ties.

BayesSR1Lg(n, tplus, lab, a, b, alpha): equivalent to BayesSRLg for any array R1 which has n non-zero entries and has a T+ value equal to tplus based on ranks with no ties.

See Bayesian Signed-Rank Test Support for more details.

Bayesian Mann-Whitney Test

In the following, Rx and Ry are numeric column arrays with sample data, iter is the number of Monte Carlo samples (default 30000). lprec and uprec are defined as for GRID_HDI. 

BayesMWU(Rx, Ry): returns UX

BayesMWX(Rx, Ry, iter): returns a column array with 200 entries corresponding to values of theta from .0025 to .9975 in .005 increments. Each entry contains the pdf value based on the iter-many Monte Carlo sample pairs that produce a UX value equal to the UX value for the original sample Rx and Ry.

BayesMWSm(Rx, Ry, lab, iter, alpha, lprec, uprec): returns an array with nX, nY, UX, UY, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X,Y) and P(H1|X,Y) based on the null hypothesis θX < .5 (equivalent to X stochastically dominates Y). The output is based on a grid created using 200 phi values starting from .0025 in increments of .005 where each grid entry is calculated using iter-many Monte Carlo samples.

BayesMWLg(Rx, Ry, lab, a, b, alpha): returns an array with nX, nY, UX, UY, the posterior Beta parameters, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X) and P(H1|X) for a Bayesian Mann-Whitney test based on the null hypothesis θX < .5 (equivalent to X stochastically dominates Y). The output is based on a Beta distribution approximation using a Beta prior with parameters a (default 1) and b (default 1).

The following functions are based on two samples of sizes n1, n2 with U values u1, u2.

BayesMW1(n1, n2, u1, theta, iter) = the proportion of the iter-many Monte Carlo sample pairs of size n1 and n2 that produce a U1 value equal to u1 based on the specified theta value. This value serves as the likelihood estimate for the specified value of theta.

BayesMW1X(n1, n2, u1, iter): returns a column array with 200 entries corresponding to values of theta from .0025 to .9975 in .005 increments. Each entry contains the pdf value based on the iter-many Monte Carlo sample pairs that produce a U1 value equal to u1.

BayesMW1Sm(n1, n2, u1, lab, iter, alpha, lprec, uprec): equivalent to BayesMWSm for any arrays Rx and Ry that have n1 and n2 entries, respectively, and a U1 value equal to u1.

BayesMW1Lg(n1, n2, u1, u2, lab, a, b, alpha): returns an array with the posterior Beta parameters, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X) and P(H1|X) for a Bayesian Mann-Whitney test based on the null hypothesis θ1 < .5. The output is based on a Beta distribution approximation using a Beta prior with parameters a (default 1) and b (default 1).

See Bayesian Mann-Whitney Support for more details.

Bayesian Contrasts

CONTRAST(R1, R2): outputs an array with two columns and the same number of rows as in R1. The first column corresponds to the positive contrast coefficients in R2 and the second column corresponds to the negative contrast coefficients. 

R2 is a contrast column array containing as many entries as columns in the data array R1. The entries in R2 must add up to zero. This function can be used with the Bayesian Signed Ranks Test.

CONTRAST1(R1, R2): outputs an array with two columns. The first column corresponds to the +1 contrast coefficients in R2 and the second column corresponds to the -1 contrast coefficients.

R2 is a “contrast” column array containing as many entries as columns in R1. Entries are restricted to 1, -1 or 0 (or blank). The entries don’t need to sum to 0. This function can be used with the Bayesian Mann-Whitney Test.

CONTRAST2(R1,R2, nrows): outputs an array with two columns. The first column corresponds to the +1 contrast coefficients in R2 and the second column corresponds to the -1 contrast coefficients.

R1 is an m × n array in Excel format used for two-factor ANOVA without headings where each row factor contains nrows rows. R2 is a k × n “contrast” array where k = m/nrows. Entries are restricted to 1, -1 or 0 (or blank). The entries don’t need to sum to 0. This function can be used with the Bayesian Mann-Whitney Test.

See Bayesian Contrasts for details.

Mixed Model Contrasts

For the following functions, R1 contains data in the Excel format used for two-factor ANOVA without headings. The columns are used for a within-subjects factor and rows are used for a between-subjects factor.

CONTRAST3(R1, Rr, Rc, nrows): outputs an array with two columns. The first column corresponds to the positive contrast coefficients in Rr and the second column corresponds to the negative contrast coefficients in Rr. Rc is used to create a weighted sum of the column values in R1.

If R1 is an m × n array and each row factor contains nrows rows, then Rr is a k × 1 contrast array where k = m/nrows. Entries are restricted to 1, -1 or 0 (or blank). The entries don’t need to sum to 0.

Rc is a n × 1 “contrast” array whose entries are numeric weights. There are no restrictions (although usually only non-negative values are used). 

CONTRAST4(R1, Rr, Rc, nrows): outputs an array with two columns. The first column corresponds to the positive contrast coefficients in Rc and the second column corresponds to the negative contrast coefficients in Rc. Rr is used to determine which rows from R1 are included in the analysis.

If R1 is an m × n array and each row factor contains nrows rows, then Rr is a k × 1 “contrast” array where k = m/nrows. Entries are restricted to 1 or 0 (or blank). 

Rc is a n × 1 contrast array whose entries need to sum to 0. 

See Bayesian Mixed Model Contrasts for details.

Bayesian Independence Test

BayesIndep(R1, stype, lab, R2): returns an array with the values BF01, BF10, P(H1|X) for an m × n contingency table (w/o labels) in R1 based on priors in the m × n array R2. R2 can be replaced by a single numeric value a (default 1).

stype takes the values “p”, “j”, “r”, “c” or “h” (default “p”) indicating the sampling approach; Poisson, Joint multinomial, Independent multinomial with fixed row totals, Independent multinomial with fixed column totals, or Hypergeometric, respectively. The “h” option can only be used when R1 contains a 2 × 2 contingency table.

DFunc(R1) = the Dirichlet function on the data in array R1.
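The text does not spell out the Dirichlet function here. A common definition in Bayesian contingency table analysis is the multivariate beta function, D(a) = ∏Γ(aᵢ) / Γ(Σaᵢ). The Python sketch below assumes that definition; both the definition and the function name dirichlet_function are assumptions, so check them against the Real Statistics page before relying on the result.

```python
from math import exp, lgamma

def dirichlet_function(values):
    """Multivariate beta function: prod(Gamma(a_i)) / Gamma(sum(a_i)).

    `values` may be a flat list or a list of rows (a 2-D array); all entries
    must be positive. Log-gamma is used to avoid overflow for large counts.
    """
    if values and isinstance(values[0], (list, tuple)):
        flat = [v for row in values for v in row]
    else:
        flat = list(values)
    return exp(sum(lgamma(v) for v in flat) - lgamma(sum(flat)))
```

Under this definition, dirichlet_function([1, 1]) is 1 and dirichlet_function([2, 3]) is Γ(2)Γ(3)/Γ(5) = 1/12.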

See Bayesian Independent Testing Tools for details.

Bayesian Correlation Test

BayesCorrel(r, n, lab, iter): returns an array with the values BF01, BF10, P(H0|X), and P(H1|X) for a Bayesian one sample correlation test where r = the sample correlation coefficient based on two samples of size n. iter = the number of iterations used in calculating the integral (default 10000).

See Bayesian Correlation Testing for details.

Bayesian Kendall’s Tau Test

BayesTau(Rx, Ry, lab): returns a column array with the values nc, nd, tau, phi.
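The first three of these values can be illustrated by counting concordant and discordant pairs directly. The sketch below computes nc, nd and Kendall's tau using the tau-a divisor n(n − 1)/2; which tau variant Real Statistics reports is an assumption, and phi is omitted because its definition is not given here. The function name kendall_counts is hypothetical.

```python
def kendall_counts(xs, ys):
    """Return (nc, nd, tau) for paired samples xs and ys.

    nc / nd are the numbers of concordant / discordant pairs; pairs tied in
    either variable count as neither. tau uses the tau-a divisor n(n-1)/2.
    """
    n = len(xs)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                nc += 1
            elif sign < 0:
                nd += 1
    tau = (nc - nd) / (n * (n - 1) / 2)
    return nc, nd, tau
```

For xs = [1, 2, 3, 4] and ys = [1, 3, 2, 4] there are 5 concordant pairs and 1 discordant pair, so tau = 4/6.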

BayesKendall(Rx, Ry, lab, a, b, alpha, hyp): returns a column array with the values nc, nd, tau, phi, followed by the same values returned by Bayes1Kendall. hyp is the hypothesized value of tau (default 0).

Bayes1Kendall(a, b, lab, alpha, hyp): returns a column array with the values: posterior beta distribution parameters, mean, median, mode, 1-alpha HDI lower/upper, 1-alpha equal-tailed CI lower/upper, and hypothesis test BF01, BF10, P(H0|X) and P(H1|X). hyp is the hypothesized value of tau (default 0).

See Bayesian Kendall’s Tau Test for details.

Bayesian Gamma Test

TABLE2RAW(R1, nolabs): takes an m × n contingency table in R1 and outputs an equivalent k × 2 array with labels where k = the sums of the values in the contingency table. If nolabs = FALSE (default TRUE) then R1 also contains row and column labels (but not row/column totals); otherwise default labels 1, 2, … are used.

RAW2TABLE(R1): takes a two-column array R1 with labels and outputs an equivalent contingency table with row and column headings. This is the inverse of the existing TABLE2RAW worksheet function.
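The conversion between the two layouts can be sketched in Python as follows, using the default labels 1, 2, … for rows and columns. The function names table_to_raw and raw_to_table are hypothetical, and the handling of labels in the actual worksheet functions may differ.

```python
def table_to_raw(table):
    """Expand a contingency table of counts into one (row, column) pair per
    observation, using default labels 1, 2, ..."""
    raw = []
    for i, row in enumerate(table, start=1):
        for j, count in enumerate(row, start=1):
            raw.extend([(i, j)] * count)
    return raw

def raw_to_table(raw):
    """Tally (row, column) pairs back into a table.

    Returns (row_labels, col_labels, table). A row or column whose counts are
    all zero leaves no trace in the raw data and so cannot be recovered.
    """
    row_labels = sorted({r for r, _ in raw})
    col_labels = sorted({c for _, c in raw})
    table = [[0] * len(col_labels) for _ in row_labels]
    for r, c in raw:
        table[row_labels.index(r)][col_labels.index(c)] += 1
    return row_labels, col_labels, table
```

For the table [[1, 2], [0, 1]], table_to_raw produces four pairs (the sum of the counts), and raw_to_table recovers the original table.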

BayesGamma(R1, lab, a, b, alpha): returns an array like that shown in Figure 2 where a and b are the beta prior parameters (default 1).

See Bayes Gamma Test for details.

Bayesian Sign and Median Tests

BayesSign(R1, R2, lab, a, b, alpha): returns an array with the same values as in BayesBeta for a sign test based on the paired data in R1 and R2, and a beta prior with parameters a and b (both defaulting to 1).

BayesMedian(R1, R2, lab, a, b, alpha): returns an array with the same values as in BayesBeta for a Median test based on the independent data in R1 and R2, and a beta prior with parameters a and b (both defaulting to 1).

Credible intervals are based on alpha (default .05). If lab = TRUE (default FALSE), a column of labels is appended to the output.

See More Bayesian Non-parametric Tests for details.

Miscellaneous

ESS(R1, cutoff) = the effective sample size for the sample in column array R1 produced by the Metropolis algorithm; the ACF values are summed until ACF(k) < cutoff (default .05).
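A minimal Python sketch of this kind of calculation is shown below. It uses the common estimate ESS = n / (1 + 2 Σ ACF(k)) and assumes the sum starts at lag 1 and stops at the first lag whose autocorrelation falls below the cutoff; both the formula and the stopping rule are assumptions about the Real Statistics implementation, and the function names are hypothetical.

```python
def acf(xs, k):
    """Lag-k sample autocorrelation of xs."""
    n = len(xs)
    m = sum(xs) / n
    denom = sum((x - m) ** 2 for x in xs)
    return sum((xs[i] - m) * (xs[i + k] - m) for i in range(n - k)) / denom

def effective_sample_size(xs, cutoff=0.05):
    """n / (1 + 2 * sum of ACF(k)), summing lags 1, 2, ... until ACF(k) < cutoff."""
    n = len(xs)
    total = 0.0
    for k in range(1, n):
        r = acf(xs, k)
        if r < cutoff:
            break
        total += r
    return n / (1 + 2 * total)
```

For the strongly autocorrelated sequence 1, 2, …, 8, the lag-1 and lag-2 autocorrelations are .625 and about .274, the lag-3 value is negative, and the estimate is roughly 2.86 effective observations out of 8.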

See Effective Sample Size for Metropolis Algorithm for details.

SAMPLE_HDI(R1, lab, alpha): returns a column array with the endpoints of the 1–alpha HDI based on the data in column array R1.
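One common way to estimate an HDI from a sample is to sort the data and pick the narrowest window that contains a 1–alpha share of the points. The Python sketch below follows that approach. Whether SAMPLE_HDI uses the same rule (including how the window size is rounded) is an assumption, and the function name sample_hdi is hypothetical.

```python
from math import ceil

def sample_hdi(data, alpha=0.05):
    """Return (lower, upper) of the narrowest interval covering at least
    a (1 - alpha) share of the sample points."""
    xs = sorted(data)
    n = len(xs)
    m = max(1, ceil((1 - alpha) * n))  # number of points the interval must cover
    best = (xs[0], xs[m - 1])
    for i in range(1, n - m + 1):
        # Keep the first window that is strictly narrower than the current best.
        if xs[i + m - 1] - xs[i] < best[1] - best[0]:
            best = (xs[i], xs[i + m - 1])
    return best
```

For the sample 0, 1, 1.1, 1.3, 1.6, 10 with alpha = .5, the window must cover three points, and the narrowest such window runs from 1 to 1.3.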

See Random Walk Metropolis Algorithm for details.
