Data Analysis Tool Introduction
The Real Statistics Resource Pack provides the Negative Binomial Regression data analysis tool to automate the process described in Constructing a Negative Binomial Regression Model using Solver.
To use the tool for Example 1 of that webpage, press Ctrl-m and select Negative Binomial Regression from the Reg tab on the menu that appears. Next fill in the dialog box that appears as shown in Figure 1 (with reference to the input data in Figure 1 of Constructing a Negative Binomial Regression Model using Solver).
Figure 1 – Negative Binomial Regression dialog box
After clicking on the OK button, the output shown in Figures 2, 3, 4, and 5 appears. Note that these figures are similar to the output from Poisson Regression, although we explain some differences below. We next each of the four output figures one at a time.
Coefficients
We start by referring to the output on the upper left-hand side of Figure 2.
Figure 2 – Negative Binomial Regression output (part 1)
The values in range J4:J9 are estimated by Solver to minimize the LL value in cell R14. In fact, cell R14 contains the formula
=NegBinomLL(D2:G13,B2:B13,J4:J9,C2:C13
The values in range K4:O9 of Figure 2 are calculated using the array formula
=NegBinomParam(D2:G13,B2:B13,J4:J9,C2:C13)
The values are the same as those produced in Figure 4 of Constructing a Negative Binomial Regression Model using Solver.
Covariance Matrix
The covariance matrix shown in range U4:Z9 of Figure 3 is calculated using the array formula
=NegBinomCov(D2:G13,B2:B13,J4:J9,C2:C13)
Note that the entries corresponding to regression coefficients are the same as in Figure 5. The variance for alpha is shown in cell Z9. The square root of this value is .00814, as shown in cell K9 of Figure 2.
Figure 3 – Negative Binomial Regression output (part 2)
The values in range AA4:AA9 are calculated by the formula
=MMULT(U4:Z9,NegBinomGrad(D2:G13,B2:B13,J4:J9,C2:C13))
Note that these values are all relatively small, reflecting that Solver converged to a solution.
Residuals
The residuals for each of the 12 data entries are shown in Figure 4. The entries are similar to those used for Poisson Regression.
Figure 4 – Negative Binomial Regression output (part 3)
Two types of residuals
The different types of residuals are as defined in Figure 1 of Poisson Regression Residuals and Goodness of Fit. The main difference is that we need to replace the variance μ for Poisson regression by μ(1 + αμ). Pearson’s residual is now calculated as
where ri is the raw residual. Thus, cell AF4 contains the formula
=AE4/SQRT(AD4*(1+AD4*J$9))
The deviance residual is calculated by
Taking care of the case where yi might be zero, we see that the formula in cell AG4 is
=SIGN(AE4)*SQRT(2*(IF(AC4=0,0,AC4*LN(AC4/AD4))-(AC4+1/J$9)*LN((1+J$9*AC4)/(1+J$9*AD4))))
In fact, if yi = 0 then
Hat matrix
The other difference is that the hat matrix is calculated using W instead of V where μ is replaced by μ/(1+αμ). In Figure 4, the elements in column AJ take the form
Thus, cell AJ4 contains the formula =AD4/(1+AD4*J$9) and range AK4:AK15 contains the array formula
=DIAG(MMULT(MMULT(AJ4:AJ15*DESIGN(D2:G13), MINVERSE(MMULT(TRANSPOSE(DESIGN(D2:G13)), AJ4:AJ15*DESIGN(D2:G13)))),TRANSPOSE(DESIGN(D2:G13))))
Goodness of Fit
As for Poisson regression, the residuals can be used to determine the overall fit of the negative binomial regression model.
Pearson’s statistic
The Pearson goodness of fit statistic (cell R4 of Figure 2) is equal to the sum of the squares of the Pearson residuals, i.e.
This can be calculated in Excel by the formula =SUMSQ(AF4:AF15). As for Poisson regression, we can test the goodness of fit, based on P ∼ χ2(n–k) when the null hypothesis that the regression model is a good fit is valid. As we can see in cell S4, p-value = CHISQ.DIST.RT(11.0715,12-5) = 0.1355 > .05, and so we have no reason to reject the goodness of fit of the negative binomial regression model for Example 1.
Deviance
The deviance goodness of fit statistic (cell R5 of Figure 2) is equal to the sum of squares of the deviance residuals, i.e.
We can calculate its value via the formula =SUMSQ(AG4:AG15) and again use a chi-square test to evaluate the model’s goodness of fit, as shown in cell S5.
R-square
The other values on the right side of Figure 2 are calculated as for Poisson regression. LLmax is calculated as for LLfit where μi is replaced by yi. Note too that
LLmin is calculated as for LLfit but where the intercept is the only regressor. The data analysis tool uses Solver to determine this value by finding the intercept coefficient and alpha value that maximize LL. This is done by first placing the formula
=NegBinomLL(D2:D13,B2:B13,J13:J15,C2:C13,TRUE)
in cell R13, and using only one of the regressors, namely the Age variable. Then when Solver is employed the coefficient of this variable is set to zero (via the Add constraint capability of Solver).
R-square, AIC, and BIC are calculated as for Poisson regression. Note that some software products use a different formula for R-square, namely 1–LLfit/LLmin instead of (LLfit–LLmin)/(LLmax–LLmin).
Predictions
The values predicted by the negative binomial regression model for the 12 original sample values is shown in Figure 5.
Figure 5 – Negative Binomial Regression output (part 4)
The formulas used are the same as those used for Poisson regression (based on the negative binomial regression coefficients and covariance matrix).
Worksheet Functions
Real Statistics Functions: The Real Statistics Resource Pack provides the following worksheet functions. These pertain to a Negative Binomial regression model based on X data in Rx (with k columns), Y data in Ry, coefficients in Rc, and frequency data in Rt. If Rt is omitted it defaults to a column of ones. Rc is a column array with k+2 rows, the first corresponds to the intercept and the last to the alpha value.
NegBinomLL(Rx, Ry, Rc, Rt, lalpha): returns the LL fitted value. If lalpha = FALSE (default), then last element in Rc is the alpha value, while if lalpha = TRUE, then the last element in Rc is the natural log of alpha.
NegBinomLL1(Ry, alpha): returns the LLmax value for the stated alpha value.
NegBinomCov(Rx, Ry, Rc, Rt): returns a k+2 × k+2 array containing the coefficient covariance matrix.
NegBinomGrad(Rx, Ry, Rc, Rt): returns a k+2 × 1 array containing the gradient of the regression coefficients
NegBinomParam(Rx, Ry, Rc, Rt, alpha, col1): if col1 = TRUE, returns a k+2 × 6 coefficient parameter array similar to range J4:O9 of Figure 2, while if col1 = FALSE (default), returns a k+2 × 5 coefficient parameter array without the first column similar to range K4:O9 of Figure 2.
Examples Workbook
Click here to download the Excel workbook with the examples described on this webpage.
References
Hilbe, J. M. (2007) Negative binomial regression. Cambridge University Press
https://nzdr.ru/data/media/biblio/kolxoz/M/MV/MVsa/Hilbe%20J.M.%20Negative%20Binomial%20Regression%20(CUP,%202007)(ISBN%209780521857727)(O)(263s)_MVsa_.pdf?ysclid=lkq9gjqlwg287891004
Hilbe, J. M. (2014) Modeling count data. Cambridge University Press
https://www.cambridge.org/core/books/modeling-count-data/BFEB3985905CA70523D9F98DA8E64D08
Hintze, J. L. (2007) Negative binomial regression. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Negative_Binomial_Regression.pdf
Zwilling, M. (2013) Negative binomial regression
https://www.researchgate.net/publication/270040724_Negative_Binomial_Regression