Logistic Regression using Newton’s Method Detailed

Maximizing the log-likelihood statistic

Property 1: The maximum of the log-likelihood statistic occurs when

$$\sum_{i=1}^n (y_i - p_i)\,x_{ij} = 0 \quad \text{for all } j$$

Proof: Let

$$\ln L = \sum_{i=1}^n \bigl[\, y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \,\bigr]$$

where the yi are considered constants from the sample and the pi are defined as follows:

$$p_i = \frac{1}{1 + e^{-(b_0 + \sum_{j=1}^k b_j x_{ij})}}$$

Here

$$\frac{p_i}{1 - p_i} = e^{\,b_0 + \sum_{j=1}^k b_j x_{ij}}$$

which is the odds ratio (see Definition 3 of Basic Concepts of Logistic Regression). Now let

$$z_i = b_0 + \sum_{j=1}^k b_j x_{ij}$$

To make our notation simpler, we define x_i0 = 1 for all i, and so we have

$$z_i = \sum_{j=0}^k b_j x_{ij}$$

Thus

$$p_i = \frac{1}{1 + e^{-z_i}} \qquad e^{-z_i} = \frac{1 - p_i}{p_i}$$
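These identities are easy to sanity-check numerically. A minimal plain-Python sketch (the value z = 1.3 is an arbitrary choice of the linear predictor, used purely for illustration):

```python
import math

z = 1.3  # an arbitrary value of the linear predictor z_i
p = 1.0 / (1.0 + math.exp(-z))  # p_i = 1 / (1 + e^{-z_i})

# e^{-z_i} = (1 - p_i)/p_i, and e^{z_i} = p_i/(1 - p_i) is the odds ratio
print(math.isclose(math.exp(-z), (1 - p) / p))  # True
print(math.isclose(math.exp(z), p / (1 - p)))   # True
```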

Also, note that

$$\frac{\partial p_i}{\partial b_j} = p_i(1 - p_i)\,x_{ij}$$
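The derivative of the logistic function with respect to z_i is p_i(1 − p_i); multiplying by ∂z_i/∂b_j = x_ij gives the identity above. A quick finite-difference check of the p(1 − p) part (the point z = 0.7 and step h are arbitrary illustrative choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central-difference dp/dz
analytic = sigmoid(z) * (1.0 - sigmoid(z))             # p (1 - p)
print(abs(numeric - analytic) < 1e-9)  # True
```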

The maximum value of ln L occurs where the partial derivatives are equal to 0. We first note that

$$\frac{\partial}{\partial b_j}\bigl[\,y_i \ln p_i\,\bigr] = \frac{y_i}{p_i}\frac{\partial p_i}{\partial b_j} = y_i(1 - p_i)\,x_{ij}$$

$$\frac{\partial}{\partial b_j}\bigl[\,(1 - y_i)\ln(1 - p_i)\,\bigr] = -\frac{1 - y_i}{1 - p_i}\frac{\partial p_i}{\partial b_j} = -(1 - y_i)\,p_i\,x_{ij}$$

Thus

$$\frac{\partial \ln L}{\partial b_j} = \sum_{i=1}^n \bigl[\, y_i(1 - p_i) - (1 - y_i)p_i \,\bigr] x_{ij}$$

$$= \sum_{i=1}^n (y_i - p_i)\,x_{ij}$$

The maximum of ln L occurs when

$$\sum_{i=1}^n (y_i - p_i)\,x_{ij} = 0$$

for all j, completing the proof.
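Property 1's gradient formula can be verified numerically against finite differences of the log-likelihood. A sketch, assuming NumPy; the data set and coefficient vector below are made up purely for illustration:

```python
import numpy as np

def log_likelihood(B, X, y):
    # ln L = sum_i [ y_i ln p_i + (1 - y_i) ln(1 - p_i) ]
    p = 1.0 / (1.0 + np.exp(-X @ B))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def score(B, X, y):
    # Property 1: d(ln L)/d(b_j) = sum_i (y_i - p_i) x_ij, i.e. X^T (y - p)
    p = 1.0 / (1.0 + np.exp(-X @ B))
    return X.T @ (y - p)

# illustrative data: the first column of X is x_i0 = 1
X = np.column_stack([np.ones(5), np.array([0.5, 1.5, 2.0, 3.0, 4.5])])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
B = np.array([0.1, -0.2])

g = score(B, X, y)
h = 1e-6
for j in range(2):
    e = np.zeros(2)
    e[j] = h
    num = (log_likelihood(B + e, X, y) - log_likelihood(B - e, X, y)) / (2 * h)
    print(abs(num - g[j]) < 1e-4)  # True: finite difference matches the formula
```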

Newton’s method for logistic regression

To find the values of the coefficients b_j we need to solve the equations of Property 1.

We do this iteratively using Newton’s method (see Definition 2 and Property 2 of Newton’s Method), as described in the following property.

Property 2: Let B = [bj] be the (k+1) × 1 column vector of logistic regression coefficients, let Y = [yi] be the n × 1 column vector of observed outcomes of the dependent variable, let X be the n × (k+1) design matrix, let P = [pi] be the n × 1 column vector of predicted probabilities of success, and let V be the n × n diagonal matrix whose main-diagonal entries are vi = pi(1–pi). Then if B0 is an initial guess of B and for all m we define the following iteration

$$B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y - P_m)$$

then for m sufficiently large, Bm+1 ≈ Bm, and so Bm is a reasonable estimate of the coefficient vector.

Proof: Define

$$f_j(B) = \sum_{i=1}^n (y_i - p_i)\,x_{ij} \quad \text{for } j = 0, 1, \ldots, k$$

where x_i0 = 1. We now calculate the partial derivatives of the f_j:

$$\frac{\partial f_j}{\partial b_l} = -\sum_{i=1}^n \frac{\partial p_i}{\partial b_l}\,x_{ij} = -\sum_{i=1}^n p_i(1 - p_i)\,x_{il}\,x_{ij}$$

Let vi = pi(1–pi) and, using the terminology of Definition 2 of Newton’s Method, define

$$F(B) = \bigl[f_j(B)\bigr] \qquad J(B) = \left[\frac{\partial f_j}{\partial b_l}\right] = \left[-\sum_{i=1}^n v_i\,x_{il}\,x_{ij}\right]$$

Now

$$F(B) = X^T (Y - P)$$

where X is the design matrix (see Definition 3 of Multiple Regression Least Squares), Y is the column vector with elements yi, and P is the column vector with elements pi. Let V be the diagonal matrix with the elements vi on the main diagonal. Then

$$J(B) = -X^T V X$$

We can now use Newton’s method to find B, namely define the n × 1 column vectors Pm, the (k+1) × 1 column vectors Bm, the n × n diagonal matrices Vm, and the (k+1) × (k+1) square matrices Jm as follows, based on the values of P, F, V, and J described above.

$$P_m = P(B_m) \qquad V_m = V(B_m)$$

$$J_m = J(B_m) = -X^T V_m X$$

$$B_{m+1} = B_m - J_m^{-1} F(B_m) = B_m + (X^T V_m X)^{-1} X^T (Y - P_m)$$

Then for sufficiently large m, F(Bm) ≈ 0, which is equivalent to the statement of the property.
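The iteration above can be sketched in code. A minimal NumPy implementation, assuming the design matrix X already contains the column of 1s; the function name `newton_logistic`, the tolerance, and the iteration cap are illustrative choices, not part of the derivation:

```python
import numpy as np

def newton_logistic(X, y, tol=1e-8, max_iter=25):
    """Iterate B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y - P_m) until B_{m+1} ≈ B_m."""
    B = np.zeros(X.shape[1])                  # B_0 = 0 as the initial guess
    for _ in range(max_iter):
        P = 1.0 / (1.0 + np.exp(-X @ B))      # p_i at the current B_m
        V = P * (1.0 - P)                     # diagonal entries v_i = p_i(1 - p_i)
        # solve (X^T V_m X) step = X^T (Y - P_m) rather than forming the inverse
        step = np.linalg.solve(X.T @ (V[:, None] * X), X.T @ (y - P))
        B = B + step
        if np.max(np.abs(step)) < tol:        # B_{m+1} ≈ B_m, so stop
            break
    return B
```

At convergence the score X^T(Y − P) is essentially zero, matching Property 1. Note that when the two classes are perfectly separable the maximum-likelihood coefficients are infinite and the iteration diverges, so a cap on the number of iterations is prudent.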

