Property 1: The maximum of the log-likelihood statistic LL for Poisson regression occurs when the following k equations hold:

Σi (yi − μi) xij = 0    for j = 1, …, k

where μi = exp(Σj βj xij).
Proof: As usual, our goal is to treat the xij and yi values as fixed and determine the unknown coefficients βj that maximize LL. We do this by setting the partial derivatives of LL with respect to each βj to zero.
First, note that since μi = exp(Σj βj xij), we have ∂μi/∂βj = μi xij. Since LL = Σi (yi ln μi − μi − ln yi!), it follows that

∂LL/∂βj = Σi (yi/μi − 1) ∂μi/∂βj = Σi (yi − μi) xij

Setting each of these partial derivatives to zero yields the equations of Property 1.
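The first-order conditions can be checked numerically. The following Python sketch (a hypothetical synthetic dataset; numpy assumed) fits a Poisson regression by Newton's method and then verifies that the score vector Xᵀ(y − μ) is essentially zero at the maximum:

```python
import numpy as np

# Hypothetical small dataset: design matrix X (with intercept column)
# and Poisson-distributed counts y.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Maximize LL by Newton-Raphson (the iteration of Property 2).
beta = np.zeros(k)
for _ in range(25):
    mu = np.exp(X @ beta)                # predicted means mu_i
    W = np.diag(mu)                      # Poisson variance on the diagonal
    beta += np.linalg.solve(X.T @ W @ X, X.T @ (y - mu))

# Property 1: at the maximum, sum_i (y_i - mu_i) x_ij = 0 for every j.
score = X.T @ (y - np.exp(X @ beta))
print(np.max(np.abs(score)))             # near machine precision
```

The printed value confirms that every component of the score vector vanishes at the fitted coefficients, as the k equations require.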
Property 2: Let B = [bj] be the k × 1 column vector of Poisson regression coefficients, let Y = [yi] be the n × 1 column vector of observed outcomes of the dependent variable, let X be the n × k design matrix (see Definition 3 of Least Squares Method for Multiple Regression), let P = [pi] be the n × 1 column vector of predicted values of the dependent variable, and let V = [vij] be the n × n diagonal matrix with vii = pi on the main diagonal and zeros elsewhere (for a Poisson variable the variance equals the mean). Then if B0 is an initial guess of B and for all m we define the following iteration

Bm+1 = Bm + (XᵀVmX)−1Xᵀ(Y − Pm)

where Pm and Vm are P and V evaluated at Bm,
then for sufficiently large m, Bm ≈ B, and so Bm is a reasonable estimate of the coefficient vector.
Proof: The proof follows from Newton's method for maximizing LL. By Property 1, the gradient of LL with respect to B is Xᵀ(Y − P), and differentiating again shows that the Hessian is −XᵀVX. Newton's update

Bm+1 = Bm − (Hessian)−1(gradient) = Bm + (XᵀVmX)−1Xᵀ(Y − Pm)

is therefore exactly the iteration of Property 2.
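A minimal Python sketch of the iteration, assuming vii = pi on the diagonal of V and a hypothetical synthetic dataset (numpy assumed):

```python
import numpy as np

# Property 2 iteration for Poisson regression:
#   B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y - P_m),  with P_m = exp(X B_m)
rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # n x k design matrix
Y = rng.poisson(np.exp(X @ np.array([0.2, 0.4])))       # observed counts

B = np.zeros(2)                                          # initial guess B_0
for m in range(20):
    P = np.exp(X @ B)                                    # predicted values P_m
    V = np.diag(P)                                       # v_ii = p_i
    step = np.linalg.solve(X.T @ V @ X, X.T @ (Y - P))
    B += step
    if np.linalg.norm(step) < 1e-10:                     # B ≈ B_m: converged
        break

print(B)
```

The update step shrinks rapidly from one iteration to the next, so only a handful of iterations are needed before Bm stabilizes at the coefficient estimate.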