Negative Binomial Regression Analysis Tool (Solver option)

Data Analysis Tool Introduction

The Real Statistics Resource Pack provides the Negative Binomial Regression data analysis tool to automate the process described in Constructing a Negative Binomial Regression Model using Solver.

To use the tool for Example 1 of that webpage, press Ctrl-m and select Negative Binomial Regression from the Reg tab on the menu that appears. Next fill in the dialog box that appears as shown in Figure 1 (with reference to the input data in Figure 1 of Constructing a Negative Binomial Regression Model using Solver).

Figure 1 – Negative Binomial Regression dialog box

After clicking on the OK button, the output shown in Figures 2, 3, 4, and 5 appears. Note that these figures are similar to the output from Poisson Regression, although we explain some differences below. We next each of the four output figures one at a time.

Coefficients

We start by referring to the output on the upper left-hand side of Figure 2.

Negative binomial regression 1

Figure 2 – Negative Binomial Regression output (part 1)

The values in range J4:J9 are estimated by Solver to minimize the LL value in cell R14. In fact, cell R14 contains the formula

=NegBinomLL(D2:G13,B2:B13,J4:J9,C2:C13

The values in range K4:O9 of Figure 2 are calculated using the array formula

=NegBinomParam(D2:G13,B2:B13,J4:J9,C2:C13)

The values are the same as those produced in Figure 4 of Constructing a Negative Binomial Regression Model using Solver.

Covariance Matrix

The covariance matrix shown in range U4:Z9 of Figure 3 is calculated using the array formula

=NegBinomCov(D2:G13,B2:B13,J4:J9,C2:C13)

Note that the entries corresponding to regression coefficients are the same as in Figure 5. The variance for alpha is shown in cell Z9. The square root of this value is .00814, as shown in cell K9 of Figure 2.

Figure 3 – Negative Binomial Regression output (part 2)

The values in range AA4:AA9 are calculated by the formula

=MMULT(U4:Z9,NegBinomGrad(D2:G13,B2:B13,J4:J9,C2:C13))

Note that these values are all relatively small, reflecting that Solver converged to a solution.

Residuals

The residuals for each of the 12 data entries are shown in Figure 4. The entries are similar to those used for Poisson Regression.

Negative binomial regression 3

Figure 4 – Negative Binomial Regression output (part 3)

Two types of residuals

The different types of residuals are as defined in Figure 1 of Poisson Regression Residuals and Goodness of Fit. The main difference is that we need to replace the variance μ for Poisson regression by μ(1 + αμ). Pearson’s residual is now calculated as

Pearson's p_i

where ri is the raw residual. Thus, cell AF4 contains the formula

=AE4/SQRT(AD4*(1+AD4*J$9))

The deviance residual is calculated by

Deviation d_i

Taking care of the case where yi might be zero, we see that the formula in cell AG4 is

=SIGN(AE4)*SQRT(2*(IF(AC4=0,0,AC4*LN(AC4/AD4))-(AC4+1/J$9)*LN((1+J$9*AC4)/(1+J$9*AD4))))

In fact, if yi = 0 then

Deviance d_i if yi = 0

d_i if yi = 0

Hat matrix

The other difference is that the hat matrix is calculated using W instead of V where μ is replaced by μ/(1+αμ). In Figure 4, the elements in column AJ take the form

w_i formula

Thus, cell AJ4 contains the formula =AD4/(1+AD4*J$9) and range AK4:AK15 contains the array formula

=DIAG(MMULT(MMULT(AJ4:AJ15*DESIGN(D2:G13), MINVERSE(MMULT(TRANSPOSE(DESIGN(D2:G13)), AJ4:AJ15*DESIGN(D2:G13)))),TRANSPOSE(DESIGN(D2:G13))))

Goodness of Fit

As for Poisson regression, the residuals can be used to determine the overall fit of the negative binomial regression model.

Pearson’s statistic

The Pearson goodness of fit statistic (cell R4 of Figure 2) is equal to the sum of the squares of the Pearson residuals, i.e.

Pearson's deviation

This can be calculated in Excel by the formula =SUMSQ(AF4:AF15). As for Poisson regression, we can test the goodness of fit, based on P ∼ χ2(n–k) when the null hypothesis that the regression model is a good fit is valid. As we can see in cell S4, p-value = CHISQ.DIST.RT(11.0715,12-5) = 0.1355 > .05, and so we have no reason to reject the goodness of fit of the negative binomial regression model for Example 1.

Deviance

The deviance goodness of fit statistic (cell R5 of Figure 2) is equal to the sum of squares of the deviance residuals, i.e.

Deviation G-square

We can calculate its value via the formula =SUMSQ(AG4:AG15) and again use a chi-square test to evaluate the model’s goodness of fit, as shown in cell S5.

R-square

The other values on the right side of Figure 2 are calculated as for Poisson regression. LLmax is calculated as for LLfit where μi is replaced by yi. Note too that

LLmin is calculated as for LLfit but where the intercept is the only regressor. The data analysis tool uses Solver to determine this value by finding the intercept coefficient and alpha value that maximize LL. This is done by first placing the formula

=NegBinomLL(D2:D13,B2:B13,J13:J15,C2:C13,TRUE)

in cell R13, and using only one of the regressors, namely the Age variable. Then when Solver is employed the coefficient of this variable is set to zero (via the Add constraint capability of Solver).

R-square, AIC, and BIC are calculated as for Poisson regression. Note that some software products use a different formula for R-square, namely 1–LLfit/LLmin instead of (LLfitLLmin)/(LLmax–LLmin).  

Predictions

The values predicted by the negative binomial regression model for the 12 original sample values is shown in Figure 5.

Figure 5 – Negative Binomial Regression output (part 4)

The formulas used are the same as those used for Poisson regression (based on the negative binomial regression coefficients and covariance matrix).

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack provides the following worksheet functions.  These pertain to a Negative Binomial regression model based on X data in Rx (with k columns), Y data in Ry, coefficients in Rc, and frequency data in Rt. If Rt is omitted it defaults to a column of ones. Rc is a column array with k+2 rows, the first corresponds to the intercept and the last to the alpha value.

NegBinomLL(Rx, Ry, Rc, Rt, lalpha): returns the LL fitted value. If lalpha = FALSE (default), then last element in Rc is the alpha value, while if lalpha = TRUE, then the last element in Rc is the natural log of alpha.

NegBinomLL1(Ry, alpha): returns the LLmax value for the stated alpha value.

NegBinomCov(Rx, Ry, Rc, Rt): returns a k+2 × k+2 array containing the coefficient covariance matrix.

NegBinomGrad(Rx, Ry, Rc, Rt): returns a k+2 × 1 array containing the gradient of the regression coefficients

NegBinomParam(Rx, Ry, Rc, Rt, alpha, col1): if col1 = TRUE, returns a k+2 × 6 coefficient parameter array similar to range J4:O9 of Figure 2, while if col1 = FALSE (default), returns a k+2 × 5 coefficient parameter array without the first column similar to range K4:O9 of Figure 2.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Hilbe, J. M. (2007) Negative binomial regression. Cambridge University Press
https://nzdr.ru/data/media/biblio/kolxoz/M/MV/MVsa/Hilbe%20J.M.%20Negative%20Binomial%20Regression%20(CUP,%202007)(ISBN%209780521857727)(O)(263s)_MVsa_.pdf?ysclid=lkq9gjqlwg287891004

Hilbe, J. M. (2014) Modeling count data. Cambridge University Press
https://www.cambridge.org/core/books/modeling-count-data/BFEB3985905CA70523D9F98DA8E64D08

Hintze, J. L. (2007) Negative binomial regression. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Negative_Binomial_Regression.pdf

Zwilling, M. (2013) Negative binomial regression
https://www.researchgate.net/publication/270040724_Negative_Binomial_Regression

Leave a Comment