Creating a Negative Binomial Regression model using Solver

Basic Concepts

Using the α = 1/ν, μ parametrization described in Negative Binomial Regression, the pdf of the negative binomial distribution takes the form

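$$f(y) = \frac{\Gamma(y+\nu)}{\Gamma(\nu)\,\Gamma(y+1)}\left(\frac{\nu}{\nu+\mu}\right)^{\nu}\left(\frac{\mu}{\nu+\mu}\right)^{y}$$

or equivalently, since ν = 1/α,

$$f(y) = \frac{\Gamma(y+\alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y+1)}\left(\frac{1}{1+\alpha\mu}\right)^{1/\alpha}\left(\frac{\alpha\mu}{1+\alpha\mu}\right)^{y}$$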
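As a quick numerical check that the two forms of the pdf describe the same distribution, here is a minimal Python sketch. The function names and the values μ = 3, α = 0.5 are illustrative assumptions. The sketch also checks the standard facts that, in this parametrization, the mean is μ and the variance is μ + αμ².

```python
import math

def nb_pmf_nu(y, mu, nu):
    """P(Y = y) in the (nu, mu) form."""
    log_p = (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
             + nu * math.log(nu / (nu + mu)) + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)

def nb_pmf_alpha(y, mu, alpha):
    """P(Y = y) in the (alpha, mu) form, where alpha = 1/nu."""
    nu = 1.0 / alpha
    log_p = (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
             - nu * math.log(1 + alpha * mu)
             + y * math.log(alpha * mu / (1 + alpha * mu)))
    return math.exp(log_p)

mu, alpha = 3.0, 0.5
probs = [nb_pmf_alpha(y, mu, alpha) for y in range(400)]  # tail beyond 400 is negligible here
print(sum(probs))                                  # total probability, very close to 1
print(sum(y * p for y, p in enumerate(probs)))     # mean, very close to mu
print(nb_pmf_nu(2, mu, 1 / alpha), nb_pmf_alpha(2, mu, alpha))  # the two forms agree
```

Working on the log scale with `math.lgamma` mirrors what GAMMALN does in Excel and avoids overflow in the gamma functions for large counts.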

We now consider the regression model

$$\ln \mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1}$$

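which is the same model as that used for Poisson regression, except that now we assume that each $y_i$ follows a negative binomial distribution with parameters α = 1/ν and $\mu_i = \mu t_i$. Here μ is the mean given by the model for the predictor values of the ith observation, and $t_i$ is the exposure of that observation (in Example 1, the total number of passengers in the group).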

Thus, for a sample with count data $y_1, \ldots, y_n$, the likelihood function is

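$$L = \prod_{i=1}^{n}\frac{\Gamma(y_i+\nu)}{\Gamma(\nu)\,\Gamma(y_i+1)}\left(\frac{1}{1+\alpha\mu_i}\right)^{\nu}\left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y_i}$$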

The corresponding log-likelihood is therefore

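$$LL = \sum_{i=1}^{n}\left[\ln\Gamma(y_i+\nu) - \ln\Gamma(\nu) - \ln\Gamma(y_i+1) + \nu\ln\frac{1}{1+\alpha\mu_i} + y_i\ln\frac{\alpha\mu_i}{1+\alpha\mu_i}\right]$$

$$= \sum_{i=1}^{n}\left[\ln\Gamma(y_i+\nu) - \ln\Gamma(\nu) - \ln\Gamma(y_i+1) + y_i\ln(\alpha\mu_i) - (y_i+\nu)\ln(1+\alpha\mu_i)\right]$$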
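Taking logs of the second form of the pdf leaves two non-gamma terms for each observation. Regrouping them with ln(a/b) = ln a − ln b gives the form used in the worksheet below:

$$\nu\ln\frac{1}{1+\alpha\mu_i} + y_i\ln\frac{\alpha\mu_i}{1+\alpha\mu_i} = -\nu\ln(1+\alpha\mu_i) + y_i\ln(\alpha\mu_i) - y_i\ln(1+\alpha\mu_i) = y_i\ln(\alpha\mu_i) - (y_i+\nu)\ln(1+\alpha\mu_i)$$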

Example

Example 1: Create a negative binomial regression model for the Titanic survival data in Figure 1. Here Age = 0 for children and Age = 1 for adults; Sex = 0 for females and Sex = 1 for males. The Titanic had three passenger classes, so Class is treated as a categorical variable. For each of the 12 combinations of Age, Sex and Class, column B shows the number of passengers who survived and column C shows the total number of such passengers. This example is taken from Hilbe (2007).

Figure 1 – Titanic data

Using Solver

We now show how to build the regression model using Solver. We first establish an initial guess for the regression coefficients, as shown in range J2:J7 of Figure 2.

Figure 2 – Initial Solver values

We initialize ln α to -.5, and so α = EXP(-.5) = .606531 and ν = 1/α = 1.648721.

We initialize ln α rather than α itself so that α = EXP(ln α) is always positive. This keeps Solver from trying to calculate ln 0 (or the log of a negative number) while it searches.
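The same reparametrization can be sketched in Python. It assumes a hypothetical parameter vector whose last entry is ln α. The zero coefficients below are placeholders, not the initial values shown in Figure 2.

```python
import math

def unpack(params):
    """Split a hypothetical vector [b0, b1, ..., ln_alpha] into (betas, alpha, nu)."""
    *betas, ln_alpha = params
    alpha = math.exp(ln_alpha)   # exp() is positive for any moderate ln_alpha
    nu = 1.0 / alpha
    return betas, alpha, nu

# Placeholder coefficients and ln(alpha) = -0.5, as in the text.
betas, alpha, nu = unpack([0.0, 0.0, 0.0, 0.0, 0.0, -0.5])
print(round(alpha, 6), round(nu, 6))   # 0.606531 1.648721

# Even a very negative ln(alpha) keeps alpha > 0, so ln(alpha * mu) stays defined.
print(unpack([0.0, -50.0])[1] > 0)     # True
```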

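We next calculate LL in cell J11 using the formula =N14+O14-P14. Each cell in row 14 contains the sum of the values in rows 2 through 13 of its column. For each observation, the columns are as follows:

- Cell M2 contains the mean $\mu_i = t_i e^{\beta_0 + \sum_j \beta_j x_{ij}}$, calculated by =C2*EXP(J$2+MMULT(D2:G2,J$3:J$6)).
- Cell N2 contains $\ln\Gamma(y_i+\nu) - \ln\Gamma(\nu) - \ln\Gamma(y_i+1)$, calculated by =GAMMALN(B2+J$9)-GAMMALN(J$9)-GAMMALN(B2+1).
- Cell O2 contains $y_i\ln(\alpha\mu_i)$, calculated by =B2*LN(J$7*M2).
- Cell P2 contains $(y_i+\nu)\ln(1+\alpha\mu_i)$, calculated by =(B2+J$9)*LN(1+J$7*M2).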
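To make the worksheet calculation concrete, here is a minimal Python sketch that accumulates the same four columns for each observation and combines them as N + O − P. The function name `nb_log_likelihood` and the toy data are hypothetical and are not the Titanic data of Figure 1. Here `b0` plays the role of the intercept in J2 and `b` the slopes in J3:J6.

```python
import math

def nb_log_likelihood(y, t, X, b0, b, alpha):
    """LL of negative binomial regression with exposures t_i (hypothetical helper).

    Mirrors the worksheet: mu (column M), gamma terms (N),
    y*ln(alpha*mu) (O) and (y+nu)*ln(1+alpha*mu) (P); LL = N + O - P.
    """
    nu = 1.0 / alpha
    n_sum = o_sum = p_sum = 0.0
    for yi, ti, xi in zip(y, t, X):
        mu = ti * math.exp(b0 + sum(bj * xij for bj, xij in zip(b, xi)))       # column M
        n_sum += math.lgamma(yi + nu) - math.lgamma(nu) - math.lgamma(yi + 1)  # column N
        o_sum += yi * math.log(alpha * mu)                                     # column O
        p_sum += (yi + nu) * math.log(1 + alpha * mu)                          # column P
    return n_sum + o_sum - p_sum

# Hypothetical toy data (NOT the Titanic data): counts y, exposures t, two predictors.
y = [0, 3, 5]
t = [10, 20, 15]
X = [[0, 1], [1, 0], [1, 1]]
ll = nb_log_likelihood(y, t, X, b0=-2.0, b=[0.3, -0.4], alpha=0.7)
print(ll)
```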

We now call Solver (via Data > Analyze|Solver) and fill in the dialog box as shown in Figure 3 to find the coefficient values that maximize LL.

Figure 3 – Solver dialog box

After clicking on the Solve button, the values in Figure 2 change to those shown in Figure 4.

Figure 4 – Coefficients that maximize LL

The regression coefficients that maximize LL are shown in range J2:J6 with the alpha value shown in cell J9.
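Solver treats LL as a black-box function to be maximized. As a cross-check outside Excel, here is a sketch of a different maximizer, not what Solver does internally. It uses Fisher scoring for the regression coefficients with α held fixed (weights μ/(1+αμ)), wrapped in a golden-section search over ln α of the resulting profile log-likelihood. All function names, the search interval [−6, 3] for ln α and the toy data are illustrative assumptions. Unlike the worksheet, the intercept is carried as a leading column of 1s in X.

```python
import math

def loglik(y, t, X, beta, alpha):
    """NB regression LL; every row of X starts with 1 (intercept)."""
    nu = 1.0 / alpha
    total = 0.0
    for yi, ti, xi in zip(y, t, X):
        mu = ti * math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += (math.lgamma(yi + nu) - math.lgamma(nu) - math.lgamma(yi + 1)
                  + yi * math.log(alpha * mu) - (yi + nu) * math.log(1 + alpha * mu))
    return total

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    k = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def fit_beta(y, t, X, alpha, iters=100):
    """Fisher scoring for beta with alpha held fixed."""
    k = len(X[0])
    beta = [math.log(sum(y) / sum(t))] + [0.0] * (k - 1)   # start intercept at the overall rate
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for yi, ti, xi in zip(y, t, X):
            mu = ti * math.exp(sum(b * x for b, x in zip(beta, xi)))
            w = mu / (1 + alpha * mu)              # expected-information weight
            r = (yi - mu) / (1 + alpha * mu)       # dLL / d ln(mu_i)
            for j in range(k):
                score[j] += r * xi[j]
                for h in range(k):
                    info[j][h] += w * xi[j] * xi[h]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def fit(y, t, X, lo=-6.0, hi=3.0, tol=1e-6):
    """Maximize the profile LL over ln(alpha) by golden-section search."""
    def profile(ln_a):
        alp = math.exp(ln_a)
        return loglik(y, t, X, fit_beta(y, t, X, alp), alp)

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    alpha = math.exp((a + b) / 2)
    return fit_beta(y, t, X, alpha), alpha

# Hypothetical toy data (NOT the Titanic data): intercept column plus one 0/1 predictor.
y = [2, 3, 0, 12, 6, 20, 1, 8]
t = [8, 10, 12, 10, 9, 11, 10, 10]
X = [[1, x] for x in [0, 1, 0, 1, 0, 1, 0, 1]]
beta_hat, alpha_hat = fit(y, t, X)
print(beta_hat, alpha_hat, loglik(y, t, X, beta_hat, alpha_hat))
```

If the data show no overdispersion, the profile LL keeps increasing as α approaches 0, and the search stops at the lower end of the interval. In that case a Poisson model is the natural fallback.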

Covariance Matrix and Standard Errors

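Define the (k+1) × (k+1) matrix $V = [v_{jh}]$ where

$$v_{jh} = \sum_{i=1}^{n}\frac{\mu_i\,x_{ij}\,x_{ih}}{1+\alpha\mu_i} \qquad 1 \le j, h \le k$$

$$v_{j,k+1} = v_{k+1,j} = 0 \qquad 1 \le j \le k$$

$$v_{k+1,k+1} = \sum_{i=1}^{n}\left[\frac{1}{\alpha^{4}}\left(\ln(1+\alpha\mu_i) - \sum_{r=0}^{y_i-1}\frac{1}{r+\alpha^{-1}}\right)^{2} + \frac{\mu_i}{\alpha^{2}(1+\alpha\mu_i)}\right]$$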

$V^{-1}$ is the covariance matrix for all k+1 parameters, namely the k regression coefficients and α. For the regression coefficients, the covariance matrix is

$$S = (Z^{T}X)^{-1}$$

Here X is the n × k design matrix (whose rows are the $X_i$), and Z is the n × k matrix $[z_{ij}]$ with $z_{ij} = \mu_i x_{ij}/(1+\alpha\mu_i)$. The (j, h) entry of $Z^{T}X$ is exactly $v_{jh}$, and the entries $v_{j,k+1}$ are zero. It follows that S is the upper-left k × k block of $V^{-1}$. The standard error of each regression coefficient is the square root of the corresponding element on the main diagonal of S.

Because the entries $v_{j,k+1}$ are zero, the standard error of alpha is simply the square root of the reciprocal of $v_{k+1,k+1}$, i.e. $s.e.(\alpha) = 1/\sqrt{v_{k+1,k+1}}$.
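The coefficient covariance matrix and standard errors can be sketched in Python as follows. The sketch assumes the fitted means $\mu_i$ and α are already available. The small Gauss-Jordan inverse stands in for Excel's MINVERSE. The function names and data are hypothetical.

```python
import math

def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]

def coef_covariance(X, mu, alpha):
    """S = (Z^T X)^(-1) with z_ij = mu_i / (1 + alpha*mu_i) * x_ij."""
    k = len(X[0])
    ztx = [[sum(m / (1 + alpha * m) * x[j] * x[h] for x, m in zip(X, mu))
            for h in range(k)] for j in range(k)]
    return invert(ztx)

def coef_std_errors(X, mu, alpha):
    """Square roots of the diagonal of S."""
    S = coef_covariance(X, mu, alpha)
    return [math.sqrt(S[j][j]) for j in range(len(S))]

# Hypothetical design matrix (intercept column first) and fitted means.
X = [[1, 0], [1, 1], [1, 0], [1, 1]]
mu = [2.0, 5.0, 3.0, 6.0]
alpha = 0.4
print(coef_std_errors(X, mu, alpha))
```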

Example 2: Determine the standard errors of the coefficients from Example 1.

The covariance matrix is shown in Figure 5.

Figure 5 – Covariance matrix

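Most of these cells are copied from Figure 4. The new entries are calculated as follows: cell N2 contains the weight $\mu_i/(1+\alpha\mu_i)$ via the formula =M2/(1+I$9*M2), and range P2:T6 contains the covariance matrix S via the array formula

=MINVERSE(MMULT(TRANSPOSE(N2:N13*B2:F13),B2:F13))

Range J2:J6 contains the standard errors via the array formula =SQRT(DIAG(P2:T6)).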

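The standard error of alpha is calculated in Figure 6. The cells are as follows:

- Cell D2 contains =LN(1+C2*C$15), i.e. $\ln(1+\alpha\mu_i)$.
- Cell E2 contains =SUMPRODUCT(1/SEQ(B2,1,C$16)), i.e. $\sum_{r=0}^{y_i-1} 1/(r+\nu)$.
- Cell F2 contains =C$16^4*(D2-E2)^2.
- Cell G2 contains =C2/(C$15^2*(1+C$15*C2)), i.e. $\mu_i/(\alpha^2(1+\alpha\mu_i))$.

Cells F14 and G14 contain the column sums (e.g. =SUM(F2:F13)), so that F14+G14 equals $v_{k+1,k+1}$. Cell F16 contains the standard error =1/SQRT(F14+G14). See Basic Concepts of Matrices for a description of the DIAG and SEQ functions.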

Figure 6 – Standard error for alpha
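Here is a minimal Python sketch of the same calculation, following the formula for $v_{k+1,k+1}$. It assumes that SEQ(B2,1,C$16) produces the $y_i$ values ν, ν+1, …, ν+$y_i$−1, so that column E is the sum of their reciprocals; this reading of SEQ is an assumption. The function name and data are hypothetical. When $y_i$ = 0, the inner sum is empty and contributes 0.

```python
import math

def alpha_std_error(y, mu, alpha):
    """1 / sqrt(v_{k+1,k+1}) for the NB dispersion parameter alpha."""
    nu = 1.0 / alpha
    v = 0.0
    for yi, mi in zip(y, mu):
        d = math.log(1 + alpha * mi)                  # column D
        e = sum(1.0 / (nu + r) for r in range(yi))    # column E: sum over r = 0..y_i-1
        v += nu ** 4 * (d - e) ** 2                   # column F
        v += mi / (alpha ** 2 * (1 + alpha * mi))     # column G
    return 1.0 / math.sqrt(v)

# Hypothetical counts and fitted means.
y = [3, 0, 7, 12]
mu = [2.5, 1.2, 6.0, 10.4]
print(alpha_std_error(y, mu, 0.3))
```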

References

Hilbe, J. M. (2007) Negative binomial regression. Cambridge University Press
https://nzdr.ru/data/media/biblio/kolxoz/M/MV/MVsa/Hilbe%20J.M.%20Negative%20Binomial%20Regression%20(CUP,%202007)(ISBN%209780521857727)(O)(263s)_MVsa_.pdf?ysclid=lkq9gjqlwg287891004

Hilbe, J. M. (2014) Modeling count data. Cambridge University Press
https://www.cambridge.org/core/books/modeling-count-data/BFEB3985905CA70523D9F98DA8E64D08

Hintze, J. L. (2007) Negative binomial regression. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Negative_Binomial_Regression.pdf
