Negative Binomial Regression: Additional Insights

Objective

We now consider a negative binomial regression model where there is only one regressor (in addition to the intercept). Furthermore, exposure is equal to one. Our goal is to obtain additional insight about this way of modeling count data.

Example

Example 1: In a survey of 1308 people in which each person was asked how many homicide victims they know (1990 General Social Survey, National Opinion Research Center), the results obtained are summarized in Figure 1. Here, the only regressor is race (black or white).  The question under study was “does race help explain how many homicide victims a person knows?”

Figure 1 – Homicide data

For example, of the 159 blacks surveyed, 119 didn’t know any homicide victims but 12 of them knew two homicide victims.

We now insert the array formulas =Freq2RAW(A2:B8) in range G2:G1150 and =Freq2RAW(D2:E8) in range H2:H160. Finally, using the dummy coding White = 0 and Black = 1, we combine columns G and H to form the data in range J1:K1309. The left side of Figure 2 shows the results (with most rows not displayed).

Figure 2 – Homicide data reformatted

The right side of Figure 2 shows some key statistics for the White and Black samples. Note that since the variance for either race is about twice the mean, we should expect over-dispersion from a Poisson regression model.
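The reshaping and the mean/variance comparison can be sketched outside Excel as follows. This is only an illustration in Python: the frequency counts are an assumption standing in for Figure 1 (they are consistent with the sample sizes, the two counts for blacks, and the means quoted on this page), and freq_to_raw is a hypothetical helper that mimics what Freq2RAW does.

```python
from statistics import mean, variance

# ASSUMPTION: frequency tables standing in for Figure 1
# (number of victims known -> number of respondents).
white_freq = {0: 1070, 1: 60, 2: 14, 3: 4, 4: 0, 5: 0, 6: 1}
black_freq = {0: 119, 1: 16, 2: 12, 3: 7, 4: 3, 5: 2, 6: 0}


def freq_to_raw(freq):
    """Hypothetical helper: expand a frequency table into one value per respondent."""
    return [value for value, count in freq.items() for _ in range(count)]


white = freq_to_raw(white_freq)
black = freq_to_raw(black_freq)

# Stack into (victims, Black) rows using the dummy coding White = 0, Black = 1
rows = [(y, 0) for y in white] + [(y, 1) for y in black]

white_mean, white_var = mean(white), variance(white)
black_mean, black_var = mean(black), variance(black)

# A variance/mean ratio well above 1 suggests over-dispersion relative to Poisson
print(f"White: n={len(white)} mean={white_mean:.6f} var/mean={white_var / white_mean:.2f}")
print(f"Black: n={len(black)} mean={black_mean:.6f} var/mean={black_var / black_mean:.2f}")
```

Under a Poisson model the variance equals the mean, so a ratio near 2 for both groups is the signal that motivates the negative binomial model.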

Regression Models

We now use the Real Statistics’ Poisson Regression and Negative Binomial Regression data analysis tools as described in Poisson Regression using Solver and Negative Binomial Regression Tool (Solver) to produce the models shown in Figure 3.

Figure 3 – Regression models

We see that in this example, the regression coefficients are the same for the two models. We also note that for either regression, the Black coefficient is highly significant, which means that there is a significant difference between the number of homicide victims known by blacks and by whites. In fact, since EXP(1.733145) = 5.658419, the expected number of homicide victims known by a black person is almost 6 times that of a white person. Note too that the ratio of the sample means, =O3/N3, is also 5.658419.

Note that the predicted number of known victims for either model is EXP(-2.38321 + 1.733145 * Black). For blacks, where Black = 1, the predicted number is EXP(-2.38321 + 1.733145) = .522013, while for whites, where Black = 0, the predicted number is EXP(-2.38321) = .092254. These are the same as the mean values shown in cells O3 and N3 of Figure 2.
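As a quick check of this arithmetic, here is a minimal Python sketch that uses the coefficients quoted above. The names INTERCEPT, BLACK_COEF and predicted_mean are illustrative only and are not part of the Real Statistics tools.

```python
import math

# Coefficients quoted in the text (the same for both regression models)
INTERCEPT = -2.38321
BLACK_COEF = 1.733145


def predicted_mean(black):
    """Predicted number of known victims for Black = 0 (white) or Black = 1 (black)."""
    return math.exp(INTERCEPT + BLACK_COEF * black)


white_mean = predicted_mean(0)
black_mean = predicted_mean(1)

# The ratio of the two predicted means equals exp(BLACK_COEF)
rate_ratio = black_mean / white_mean

print(f"white: {white_mean:.6f}  black: {black_mean:.6f}  ratio: {rate_ratio:.6f}")
```

With a single binary regressor, the model has one fitted mean per group, which is why the predictions match the two sample means.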

Note that the standard error for the Black coefficient increases from .147 in the Poisson regression model to .238 in the negative binomial regression model. This indicates that accounting for over-dispersion results in an increase in the estimated uncertainty.

Note too that the variance for a negative binomial regression model is mean*(1+alpha*mean), which for this example is .134 for whites and 1.87 for blacks. These values are fairly close to values from N4 and O4 in Figure 2, and better than estimates from Poisson regression.
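The variance calculation can be reproduced with a short Python sketch, using the value α = 4.942862 reported for this model and the two predicted means. The function name nb_variance is illustrative only.

```python
# Dispersion parameter and predicted means quoted in the text
ALPHA = 4.942862
MEANS = {"white": 0.092254, "black": 0.522013}


def nb_variance(mu, alpha):
    """Variance implied by the negative binomial model: mu * (1 + alpha * mu)."""
    return mu * (1 + alpha * mu)


nb_var = {group: nb_variance(mu, ALPHA) for group, mu in MEANS.items()}

for group, mu in MEANS.items():
    # Under Poisson regression the implied variance is simply the mean
    print(f"{group}: Poisson variance = {mu:.3f}, NB variance = {nb_var[group]:.3f}")
```

The Poisson model forces the variance to equal the mean, so it has no way of reflecting the extra spread seen in either sample.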

Probabilities

Example 2: What is the probability that a black person knows 4 homicide victims? What is the probability that a black person knows 4 or more homicide victims?

As described in Negative Binomial Regression Predictions, the specific negative binomial distribution is based on the parameters

ν = 1/α    p = ν/( ν+μ)

For a black person, Black = 1, and so, as we observed above, μ = .522013. Since α = 4.942862, we have ν = 1/4.942862 = .202312. It follows that p = .202312/(.202312+.522013) = .279311, and so the probability that a black person knows 4 homicide victims is

NEGBINOM.DIST(x, ν, p, FALSE) = NEGBINOM.DIST(4, .202312, .279311, FALSE)

Unfortunately, since ν = .202312 < 1, Excel’s NEGBINOM.DIST function can’t be used. Instead, as discussed in Negative Binomial Regression Predictions, we use the following formula

=EXP(GAMMALN(x+ν)-GAMMALN(x+1)-GAMMALN(ν)+ν*LN(p)+x*LN(1-p))

which yields a probability of .014897.
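The same log-gamma formula can be sketched in Python, where math.lgamma plays the role of GAMMALN. The function name nb_pmf is illustrative only, and the parameter values are the ones quoted above.

```python
import math


def nb_pmf(x, nu, p):
    """Negative binomial probability of exactly x events, valid for non-integer nu > 0."""
    log_pmf = (
        math.lgamma(x + nu)
        - math.lgamma(x + 1)
        - math.lgamma(nu)
        + nu * math.log(p)
        + x * math.log(1 - p)
    )
    return math.exp(log_pmf)


alpha = 4.942862      # dispersion parameter quoted in the text
mu = 0.522013         # predicted mean for Black = 1

nu = 1 / alpha        # approximately .202312
p = nu / (nu + mu)    # approximately .279311

prob_4 = nb_pmf(4, nu, p)
print(f"nu = {nu:.6f}, p = {p:.6f}, P(X = 4) = {prob_4:.6f}")
```

Working on the log scale avoids overflow in the gamma function for large x, which is the usual reason for writing the formula with GAMMALN rather than GAMMA.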

The probability that a person knows x or fewer homicide victims can be calculated by

=BETA.DIST(p, ν, x+1,TRUE)

Thus, the probability that a black person knows 4 or more homicide victims is .040166, as calculated by

=1-BETA.DIST(.279311, .202312, 3+1,TRUE)
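The upper-tail probability can be cross-checked in Python without the beta distribution. Note that this sketch swaps in a different technique: it sums the individual probabilities for 0 through 3 rather than calling BETA.DIST. The names nb_pmf and nb_cdf are illustrative only.

```python
import math

NU = 0.202312   # 1/alpha, as quoted in the text
P = 0.279311    # nu / (nu + mu) for Black = 1


def nb_pmf(x, nu, p):
    """Negative binomial probability of exactly x events (log-gamma form)."""
    return math.exp(
        math.lgamma(x + nu) - math.lgamma(x + 1) - math.lgamma(nu)
        + nu * math.log(p) + x * math.log(1 - p)
    )


def nb_cdf(x, nu, p):
    """Probability of x or fewer events, by direct summation of the pmf."""
    return sum(nb_pmf(k, nu, p) for k in range(x + 1))


# P(X >= 4) = 1 - P(X <= 3)
upper_tail = 1 - nb_cdf(3, NU, P)
print(f"P(X >= 4) = {upper_tail:.6f}")
```

Direct summation is fine for small x such as this; for large x the BETA.DIST form avoids adding up many terms.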

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Ford, C. (2024) Getting started with negative binomial regression modeling
https://library.virginia.edu/data/articles/getting-started-with-negative-binomial-regression-modeling

Agresti, A. (2007) An introduction to categorical data analysis, 2nd ed. Wiley
https://mregresion.wordpress.com/wp-content/uploads/2012/08/agresti-introduction-to-categorical-data.pdf
