Negative Binomial Regression and Solver | Real Statistics Using Excel

Basic Concepts

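Using the α = 1/ν, μ parametrization described in Negative Binomial Regression, the pdf of the negative binomial distribution takes the form

$$f(y) = \frac{\Gamma(y+\nu)}{\Gamma(\nu)\,\Gamma(y+1)} \cdot \frac{(\alpha\mu)^{y}}{(1+\alpha\mu)^{y+\nu}}, \qquad \nu = 1/\alpha$$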
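As a quick numerical check of this parametrization, the following Python sketch evaluates the density term by term in the same way as the worksheet formulas in cells N2, O2 and P2 of Example 1. The function name nb_pmf and the values of mu and alpha are illustrative assumptions, not part of the original worksheet.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y, with mean mu and dispersion alpha = 1/nu."""
    nu = 1.0 / alpha
    log_p = (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
             + y * math.log(alpha * mu)
             - (y + nu) * math.log(1 + alpha * mu))
    return math.exp(log_p)

# Illustrative parameter values only (not taken from the Titanic data).
mu, alpha = 3.0, 0.5
probs = [nb_pmf(y, mu, alpha) for y in range(500)]

total = sum(probs)                                            # should be very close to 1
mean = sum(y * p for y, p in enumerate(probs))                # should be very close to mu
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))   # close to mu + alpha * mu^2
print(total, mean, var)
```

The probabilities sum to 1 and the mean equals μ, while the variance μ + αμ² exceeds the Poisson variance μ whenever α > 0, which is why this model suits overdispersed counts.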

We now consider the regression model

which is the same model as that used for Poisson regression, except that now we assume that the y_i follow a negative binomial distribution with parameters α = 1/ν, μ_i = μt_i.
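Judging from the worksheet formula given for cell M2 in Example 1, =C2*EXP(J$2+MMULT(D2:G2,J$3:J$6)), the fitted mean appears to follow the usual log-linear form with an exposure term. The following is a sketch under that reading, assuming t_i is the exposure (the total count in column C), β_0 is the intercept and x_{i1}, …, x_{ik} are the predictor values for row i:

```latex
\[
\ln \mu_i = \ln t_i + \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
\quad\Longleftrightarrow\quad
\mu_i = t_i \exp\Bigl(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\Bigr)
\]
```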

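Thus, for a sample with count data y₁, …, y_n, the likelihood function is

$$L = \prod_{i=1}^{n} \frac{\Gamma(y_i+\nu)}{\Gamma(\nu)\,\Gamma(y_i+1)} \cdot \frac{(\alpha\mu_i)^{y_i}}{(1+\alpha\mu_i)^{y_i+\nu}}, \qquad \nu = 1/\alpha$$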

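The corresponding log-likelihood is therefore

$$LL = \sum_{i=1}^{n} \Bigl[ \ln\Gamma(y_i+\nu) - \ln\Gamma(\nu) - \ln\Gamma(y_i+1) + y_i \ln(\alpha\mu_i) - (y_i+\nu)\ln(1+\alpha\mu_i) \Bigr], \qquad \nu = 1/\alpha$$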

Example

Example 1: Create a negative binomial regression model for the Titanic survival data in Figure 1. Here Age = 0 for children and Age = 1 for adults; Sex = 0 for females and Sex = 1 for males. The Titanic had three classes; this is reflected as a categorical variable. For each of the 12 combinations of Age, Sex, and Class, the number of passengers who survived is shown in column B, while the total number of such passengers is shown in column C. This example is taken from Hilbe (2007).

Figure 1 – Titanic data

Using Solver

We now show how to build the regression model using Solver. We first establish an initial guess for the regression coefficients, as shown in range J2:J7 of Figure 2.

Figure 2 – Initial Solver values

We initialize ln α to -0.5, and so α = EXP(-0.5) = 0.606531 and ν = 1/α = 1.648721.

We chose to initialize ln α instead of α itself: since α = EXP(ln α) is always positive, this keeps Solver from trying to calculate ln 0.

We next calculate LL using the formula =N14+O14-P14, as shown in cell J11. Here each value in row 14 is the sum of the values in the corresponding column. For example, cell M2 contains the formula =C2*EXP(J$2+MMULT(D2:G2,J$3:J$6)), cell N2 contains =GAMMALN(B2+J$9)-GAMMALN(J$9)-GAMMALN(B2+1), cell O2 contains =B2*LN(J$7*M2), and cell P2 contains =(B2+J$9)*LN(1+J$7*M2).
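The column layout can be mirrored outside Excel. The Python sketch below follows the same four steps (columns M, N, O and P) and then combines the column sums as N + O - P. It assumes, from the way the formulas use them, that J$7 holds α and J$9 holds ν = 1/α. The function name log_likelihood and the three data rows are hypothetical and are not the Titanic data.

```python
import math

def log_likelihood(beta, alpha, rows):
    """rows: list of (y, t, x) = (count, exposure, list of predictor values).
    beta[0] is the intercept; beta[1:] pairs with the entries of x."""
    nu = 1.0 / alpha
    sum_n = sum_o = sum_p = 0.0
    for y, t, x in rows:
        eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
        m = t * math.exp(eta)                          # column M: fitted mean
        sum_n += (math.lgamma(y + nu) - math.lgamma(nu)
                  - math.lgamma(y + 1))                # column N
        sum_o += y * math.log(alpha * m)               # column O
        sum_p += (y + nu) * math.log(1 + alpha * m)    # column P
    return sum_n + sum_o - sum_p                       # same as =N14+O14-P14

# Hypothetical rows for illustration only.
rows = [(3, 10, [0, 1]), (7, 12, [1, 0]), (0, 5, [1, 1])]
ll = log_likelihood([-1.0, 0.2, -0.3], math.exp(-0.5), rows)
print(ll)
```

Because each row contributes the logarithm of a probability, the total is never positive; Solver's task is to move it as close to zero as the data allow.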

We now call Solver (via Data > Analyze|Solver) and fill in the dialog box that appears as shown in Figure 3 in order to find which coefficient values maximize LL.

Figure 3 – Solver dialog box

After clicking on the Solve button, the values in Figure 2 change to those shown in Figure 4.
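To illustrate what the maximization involves, here is a minimal Python sketch that maximizes the same log-likelihood over the coefficients and ln α. It uses a simple compass (coordinate) search, which is a stand-in chosen for brevity and is not the algorithm Solver uses. The function names, the starting values and the data rows are all hypothetical.

```python
import math

def neg_bin_ll(theta, rows):
    """theta = [b0, b1, ..., bk, ln_alpha]; rows = [(y, t, x), ...]."""
    *beta, ln_alpha = theta
    alpha = math.exp(ln_alpha)   # always positive, so the logs below are defined
    nu = 1.0 / alpha
    ll = 0.0
    for y, t, x in rows:
        mu = t * math.exp(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))
        ll += (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
               + y * math.log(alpha * mu) - (y + nu) * math.log(1 + alpha * mu))
    return ll

def compass_search(f, start, step=0.5, tol=1e-6, max_iter=10000):
    """Maximize f by trying +/- step along each coordinate, halving step when stuck."""
    best = list(start)
    best_val = f(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = best[:]
                trial[i] += delta
                try:
                    val = f(trial)
                except (ValueError, OverflowError, ZeroDivisionError):
                    continue     # skip points where the formula cannot be evaluated
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return best, best_val

# Hypothetical overdispersed counts with equal exposure and one 0/1 predictor.
rows = [(2, 10, [0]), (15, 10, [0]), (6, 10, [0]), (1, 10, [0]),
        (30, 10, [1]), (8, 10, [1]), (55, 10, [1]), (12, 10, [1])]
start = [0.0, 0.0, -0.5]         # intercept, slope, ln(alpha)
start_ll = neg_bin_ll(start, rows)
theta, ll = compass_search(lambda th: neg_bin_ll(th, rows), start)
alpha_hat = math.exp(theta[-1])
print(start_ll, ll, theta, alpha_hat)
```

Searching over ln α rather than α means every trial point corresponds to a positive α, which is the same reason the worksheet initializes ln α.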

Figure 4 – Coefficients that maximize LL

The regression coefficients that maximize LL are shown in range J2:J6 with the alpha value shown in cell J9.

Covariance Matrix and Standard Errors

Define the (k+1) × (k+1) matrix V = [v_jh] where

V^-1 provides us with the covariance matrix for the coefficients. For the regression coefficients, the covariance matrix is

$$S = (Z^{T} X)^{-1}$$

where X is the n × k design matrix (whose rows are the X_i) and Z is the n × k matrix [z_ij] where z_ij = μ_i /(1+ αμ_i) · x_ij. It then follows that the square roots of the elements on the main diagonal of S are the standard errors of the regression coefficients.
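The matrix calculation can be sketched in Python as follows. It forms Z by weighting each row of X by μ_i /(1+ αμ_i), inverts the product of the transpose of Z with X (the same product that the MINVERSE/MMULT formula in Example 2 computes), and takes square roots of the diagonal. The helper names, the design matrix and the fitted means are hypothetical, and a small Gauss-Jordan routine stands in for MINVERSE.

```python
import math

def invert(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def coef_covariance(X, mu, alpha):
    """S = (Z^T X)^{-1} with z_ij = mu_i / (1 + alpha * mu_i) * x_ij."""
    w = [m / (1 + alpha * m) for m in mu]
    k = len(X[0])
    ztx = [[sum(w[i] * X[i][j] * X[i][h] for i in range(len(X)))
            for h in range(k)] for j in range(k)]
    return invert(ztx)

# Hypothetical design matrix (first column of ones) and fitted means.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
mu = [6.0, 6.0, 26.0, 26.0]
S = coef_covariance(X, mu, alpha=0.5)
se = [math.sqrt(S[j][j]) for j in range(len(S))]   # standard errors
print(S, se)
```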

The standard error of alpha is the square root of the reciprocal of v_{k+1,k+1} shown above.

Example 2: Determine the standard errors of the coefficients from Example 1.

The covariance matrix is shown in Figure 5.

Figure 5 – Covariance matrix

Most of these cells are duplicated from Figure 4. The new entries are calculated by the formula =M2/(1+I$9*M2) in cell N2, the array formula

=MINVERSE(MMULT(TRANSPOSE(N2:N13*B2:F13),B2:F13))

in range P2:T6, and =SQRT(DIAG(P2:T6)) in range J2:J6.

The standard error of alpha is calculated in Figure 6. Here, cell D2 contains =LN(1+C2*C$15), cell E2 contains =SUMPRODUCT(1/SEQ(B2,1,C$16)), cell F2 contains =C$16^4*(D2-E2)^2, cell G2 contains =C2/(C$15^2*(1+C$15*C2)), cell F14 contains =SUM(F2:F13), and cell F16 contains =1/SQRT(F14+G14). See Basic Concepts of Matrices for a description of the DIAG and SEQ functions.
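The Figure 6 calculation can be sketched in Python as below. The sketch rests on several assumptions read off the formulas rather than stated in the text: that C15 holds α and C16 holds ν = 1/α, that column B holds the counts y_i and column C the fitted means μ_i, and that SEQ(B2,1,C$16) returns the B2 values C16, C16+1, C16+2, and so on. The function name alpha_std_error and the sample values are hypothetical.

```python
import math

def alpha_std_error(y, mu, alpha):
    """Standard error of alpha following the column D, E, F, G steps."""
    nu = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y, mu):
        d = math.log(1 + alpha * mi)                  # column D
        e = sum(1.0 / (nu + j) for j in range(yi))    # column E (assumed reading of SEQ)
        f = nu ** 4 * (d - e) ** 2                    # column F
        g = mi / (alpha ** 2 * (1 + alpha * mi))      # column G
        total += f + g                                # F14 + G14 accumulate these
    return 1.0 / math.sqrt(total)                     # same as =1/SQRT(F14+G14)

# Hypothetical counts and fitted means for illustration only.
y = [2, 15, 6, 1]
mu = [6.0, 6.0, 6.0, 6.0]
se_alpha = alpha_std_error(y, mu, alpha=0.5)
print(se_alpha)
```

For a count of zero the sum in column E is empty, so that row contributes only the logarithm term and the column G term.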