Zero-Inflated Poisson Regression

Introduction

The Poisson regression model assumes that the observed count data follow a Poisson distribution. One problem is that the assumption that the mean equals the variance seldom holds. Often this is because the data contain many more zeros than is consistent with a Poisson distribution. We address this situation here.

Basic Concepts

Suppose that a proportion π of the data elements are extra zeros, over and above the zeros predicted by a Poisson distribution with mean μ. Then

$$P(y = 0) = \pi + (1 - \pi)e^{-\mu}$$

$$P(y = j) = (1 - \pi)\frac{\mu^j e^{-\mu}}{j!}, \quad j > 0$$

The mean and variance of this mixture model are

$$E[y] = (1 - \pi)\mu$$

$$\mathrm{Var}(y) = (1 - \pi)\mu(1 + \pi\mu)$$
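As a quick numerical check, the following sketch builds the zero-inflated Poisson probabilities and compares the mean and variance computed from them with the closed forms (1 − π)μ and (1 − π)μ(1 + πμ). The function name `zip_pmf` and the values π = 0.3 and μ = 2 are illustrative assumptions, not taken from the text.

```python
import math

def zip_pmf(j, pi, mu):
    """P(y = j) for a zero-inflated Poisson: extra-zero proportion pi, Poisson mean mu."""
    poisson = math.exp(-mu) * mu**j / math.factorial(j)
    if j == 0:
        return pi + (1 - pi) * poisson   # extra zeros plus ordinary Poisson zeros
    return (1 - pi) * poisson            # non-zero counts come only from the Poisson part

pi, mu = 0.3, 2.0  # illustrative values (assumption)

# Sum far enough into the tail that the truncated probability mass is negligible.
support = range(100)
total = sum(zip_pmf(j, pi, mu) for j in support)
mean = sum(j * zip_pmf(j, pi, mu) for j in support)
var = sum((j - mean) ** 2 * zip_pmf(j, pi, mu) for j in support)

print(total)                                # probabilities sum to (almost exactly) 1
print(mean, (1 - pi) * mu)                  # numerical mean vs closed form
print(var, (1 - pi) * mu * (1 + pi * mu))   # numerical variance vs closed form
```

Since the variance exceeds the mean whenever π > 0 and μ > 0, the mixture accommodates the overdispersion that the plain Poisson model cannot.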

Thus, for data element i, we have

$$P(y_i = 0) = \pi_i + (1 - \pi_i)e^{-\mu_i}$$

and for j > 0, we have

$$P(y_i = j) = (1 - \pi_i)\frac{\mu_i^j e^{-\mu_i}}{j!}$$

where

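$$\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})$$

$$\pi_i = \frac{\lambda_i}{1 + \lambda_i}$$

$$\lambda_i = \exp(\gamma_0 + \gamma_1 z_{i1} + \cdots + \gamma_m z_{im})$$

based on a mixture of Poisson regression and logistic regression models with coefficients β1, …, βk and γ1, …, γm, respectively, together with intercepts β0 and γ0. The regressors xij for the Poisson model don’t have to be the same as those for the logistic model (represented by zij).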
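To make the two linear predictors concrete, here is a minimal sketch that computes μ_i and π_i for one data element from a log link (Poisson part) and a logit link (logistic part). The function name `zip_parameters`, the intercept-first ordering of the coefficient lists, and all numeric values are assumptions made for illustration.

```python
import math

def zip_parameters(x, z, beta, gamma):
    """Return (mu_i, pi_i) for one data element.

    x, z  : regressor values for the Poisson and logistic parts
    beta  : [beta_0, beta_1, ..., beta_k], intercept first (assumed layout)
    gamma : [gamma_0, gamma_1, ..., gamma_m], intercept first (assumed layout)
    """
    mu = math.exp(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))
    lam = math.exp(gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z)))
    pi = lam / (1 + lam)  # logistic transform keeps pi strictly between 0 and 1
    return mu, pi

# Hypothetical coefficients: one Poisson regressor, two logistic regressors.
beta = [0.5, 0.2]
gamma = [-1.0, 0.8, -0.3]
mu, pi = zip_parameters(x=[1.5], z=[1.0, 2.0], beta=beta, gamma=gamma)
print(mu, pi)
```

Note that `x` and `z` have different lengths here, reflecting that the two parts of the model need not share regressors.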

Log-likelihood

We first note that

$$1 - \pi_i = \frac{1}{1 + \lambda_i}$$

$$P(y_i = 0) = \frac{\lambda_i + e^{-\mu_i}}{1 + \lambda_i}$$

Since the likelihood function for a sample is

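$$L = \prod_{y_i = 0} \left[\pi_i + (1 - \pi_i)e^{-\mu_i}\right] \prod_{y_i > 0} (1 - \pi_i)\frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$$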

using the above equalities, it follows that the log-likelihood is

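$$LL = \sum_{y_i = 0} \ln\left(\lambda_i + e^{-\mu_i}\right) + \sum_{y_i > 0} \left(y_i \ln \mu_i - \mu_i - \ln y_i!\right) - \sum_{i=1}^{n} \ln\left(1 + \lambda_i\right)$$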
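The log-likelihood can be sketched in code and cross-checked against the sum of ln P(y_i) computed directly from the mixture probabilities. The function names and the small data set below are hypothetical; the inputs are the per-element values μ_i and λ_i = π_i / (1 − π_i).

```python
import math

def zip_log_likelihood(y, mu, lam):
    """LL from counts y_i, Poisson means mu_i and lambda_i = pi_i / (1 - pi_i)."""
    ll = 0.0
    for yi, mi, li in zip(y, mu, lam):
        if yi == 0:
            ll += math.log(li + math.exp(-mi))
        else:
            ll += yi * math.log(mi) - mi - math.lgamma(yi + 1)  # lgamma(y + 1) = ln(y!)
        ll -= math.log(1 + li)  # this term applies to every data element
    return ll

def direct_log_likelihood(y, mu, lam):
    """Same quantity, computed as the sum of ln P(y_i) from the mixture probabilities."""
    ll = 0.0
    for yi, mi, li in zip(y, mu, lam):
        pi = li / (1 + li)
        poisson = math.exp(-mi) * mi**yi / math.factorial(yi)
        p = pi + (1 - pi) * poisson if yi == 0 else (1 - pi) * poisson
        ll += math.log(p)
    return ll

# Hypothetical data: counts with per-element mu_i and lambda_i.
y = [0, 0, 3, 1, 0, 5]
mu = [1.2, 0.8, 2.5, 1.0, 3.0, 4.1]
lam = [0.5, 0.9, 0.2, 0.4, 0.1, 0.3]
print(zip_log_likelihood(y, mu, lam), direct_log_likelihood(y, mu, lam))
```

The two printed values should agree up to floating-point rounding, which confirms that the simplified form is just a rearrangement of the likelihood.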

Topics

We use Solver to find the Poisson and logistic coefficients that maximize LL.
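For readers who want to see the maximization outside of Solver, the sketch below fits an intercept-only model (μ = exp(b) and λ = exp(g) for every data element) with a simple coordinate pattern search. This is a stand-in chosen because it needs only the standard library; it is not the algorithm Solver uses, and the data, function names, and starting values are assumptions.

```python
import math

def log_likelihood(params, y):
    """ZIP log-likelihood for an intercept-only model: mu = exp(b), lambda = exp(g)."""
    b, g = params
    mu, lam = math.exp(b), math.exp(g)
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += math.log(lam + math.exp(-mu))
        else:
            ll += yi * b - mu - math.lgamma(yi + 1)  # ln(mu) = b
        ll -= math.log(1 + lam)
    return ll

def maximize(f, start, step=1.0, tol=1e-7):
    """Coordinate pattern search: try +/- step on each parameter, halve the step when nothing improves."""
    best = list(start)
    best_val = f(best)
    while step > tol:
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                val = f(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return best, best_val

# Hypothetical sample: 10 zeros and 10 positive counts (sample mean 1.3).
y = [0] * 10 + [1, 2, 2, 3, 3, 4, 1, 5, 2, 3]
(b, g), ll = maximize(lambda p: log_likelihood(p, y), start=[0.0, 0.0])
mu = math.exp(b)
pi = math.exp(g) / (1 + math.exp(g))
print(mu, pi, ll)
```

For an intercept-only model the maximum likelihood estimates satisfy (1 − π)μ = sample mean, so comparing `(1 - pi) * mu` with 1.3 is a convenient sanity check on the fit. With regressors, the same search would run over all β and γ coefficients, although a gradient-based optimizer would usually be a better choice.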

