Significance Testing of the Logistic Regression Coefficients

Wald statistic

Definition 1: For any logistic regression coefficient b whose standard error is seb, the Wald statistic is given by the formula

Wald = b/seb

Since the Wald statistic is approximately normal, by Property 3 of Chi-Square Distribution, Wald2 is approximately chi-square, and, in fact, Wald2 ~ χ2(df). Here df = k – k0 where k = the number of parameters (i.e. the number of coefficients) in the full model and k0 = the number of parameters in a reduced model (i.e. the model with some variables removed). In testing whether coordinate b = 0, it follows that df = k – (k–1) = 1 , and so the test is Wald2 ~ χ2(1).

Key Property

Property 1: The covariance matrix S for the coefficient matrix B is given by the matrix formula

Logistic regression covariance matrix

where X is the n × (k+1) design matrix (as described in Definition 1 of Multiple Regression using Matrices)

Design matrix

and V = [vij] is the r × r diagonal matrix whose diagonal elements are vii = ni pi (1–pi), where ni = the number of observations in group i and pi = the probability of success predicted by the model for elements in group i. Groups correspond to the rows of matrix X and consist of the various combinations of values of the independent variables.

Note that S = (XTW)-1 where W is X with each element in the ith row of X multiplied by vii.

The standard errors of the logistic regression coefficients consist of the square root of the entries on the diagonal of the covariance matrix in Property 1.

Example

Example 1 (Coefficients): We now turn our attention to the coefficient table given in range E18:L20 of Figure 6 of Finding Logistic Regression Coefficients using Solver (repeated in Figure 1 below).

Logistic Regression Solver output

Figure 1 – Output from Logistic Regression tool

Using Property 1 we calculate the covariance matrix S (range V6:W7) for the coefficient matrix B via the formula

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)), MMULT(DIAGONAL(J6:J15*(1-J6:J15)*H6:H15),DESIGN(E6:E15))))

Actually, for computational reasons it is better to use the following equivalent array formula:

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)),J6:J15*(1-J6:J15)*H6:H15* DESIGN(E6:E15)))

The formulas used to calculate the values for the Rems coefficient (row 20) are given in Figure 2.

Logistic regression coefficient formulas

Figure 2 – Formulas for Logistic Regression coefficients

Conclusion

Note that Wald represents the Wald2 statistic and that lower and upper represent the 100-α/2 confidence interval of exp(b). Since 1 = exp(0) is not in the confidence interval (.991743, .993871), the Rem coefficient b is significantly different from 0 and should therefore be retained in the model.

% Correct

The % Correct statistic (cell N16 of Figure 1) is another way to gauge the fit of the model to the observed data. The statistic says that 76.8% of the observed cases are predicted accurately by the model. This statistic is calculated as follows:

For any observed values of the independent variables, when the predicted value of p is greater than or equal to .5 (viewed as predicting success) then the % correct is equal to the value of the observed number of successes divided by the total number of observations (for those values of the independent variables). When p < .5 (viewed as predicting failure) then the % correct is equal to the value of the observed number of successes divided by the total number of observations. These values are weighted by the number of observations of that type and then summed to provide the % correct statistic for all the data.

For example, for the case where Rem = 450, p-Pred = .774 (cell J10), which predicts success (i.e. survival). Thus the % Correct for Rem = 450 is 85/108 = 78.7% (cell N10). The weighted sum (cell N16) of all these cells is then calculated by the formula =SUMPRODUCT(N6:N15,H6:H15)/H16.

Finally, note that we can adjust the cutoff for success/failure from .5, as described above, to some other value between 0 and 1.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

Christensen, R. (2013) Logistic regression: predicting counts.
http://stat.unm.edu/~fletcher/SUPER/chap21.pdf

Wikipedia (2012) Logistic regression
https://en.wikipedia.org/wiki/Logistic_regression

Agresti, A. (2013) Categorical data analysis, 3rd Ed. Wiley.
https://mybiostats.files.wordpress.com/2015/03/3rd-ed-alan_agresti_categorical_data_analysis.pdf

78 thoughts on “Significance Testing of the Logistic Regression Coefficients”

  1. Dear Charles,

    Thank you so much for your effort to show how to do significance testing of logistic reg. coeff. in Excel. This has helped me so much!!!

    Reply
  2. Hi!

    I would like to thank you for brilliant Idea to show mashine learning concepts in Excel. Math formulas may be difficult for interpretation but Excel formulas are very clear for people with financial background. You are genius!

    Reply
  3. Greetings,

    1. Should the variable r in property 1 be an n, as in the design matrix’s indeces (or vice-versa)?

    2. You mention the design matrix being “described in Definition 3 of Least Squares Method for Multiple Regression” but I couldn’t find Definition 3 or any mention of a design matrix on that page. Did I miss it?

    Thanks for the great post!

    Reply
    • Hello Quinn,
      Yes, you are correct on both issues.
      1. It should be n.
      2. The reference is incorrect.
      I have just corrected the webpage. Thank you very much for bringing these errors to my attention.
      I appreciate your help in improving the quality of the website.
      Charles

      Reply
  4. Hi Charles, I appreciate the resources of this page but had a question with the logistic regression model I created using your tool. Each time I enter my inputs it appears that my ROC curve is coming out as a 45 degree angle. Additionally, my probability of success rate predictions are very high as well with none below .6. That being said, I have an unbalanced success rate with a ratio of about 4:1. I tried manipulating the classification cutoff but I am unsure of what to make it due to the irregularity of the ROC curve. I was hoping you could provide some insight into these abnormalities.

    Reply
    • Pat,
      That the ROC is coming out at a 45-degree angle indicates that the model is probably not telling you anything useful for predictions.
      If you email me an Excel file with your data and logistic regression results, I will try to give you a more complete answer.
      Charles

      Reply
  5. Hi Charles,

    Appreciate the extensive work you’ve carried out to put this all together.

    Question – are you able to elaborate on why the distribution of coefficients is considered normal? I ask because significant analysis that follows to determine risk factor appropriateness using p-values necessitates this.

    Thanks,
    Sukh

    Reply
  6. Charles,
    Regarding how to calculate the %Correct when the predicted values of p <0.5, it is stated that "When p =0.5? Is the total number of observations the same as the total number of observations for those values for the independent variables or the grand total? Please clarify. In either way, I was not able to reproduce the same number as shown in Figure 1.

    -Sun

    Reply
    • Hello Sun,
      Can you email me the Excel spreadsheet with the results you got for Figure 1? I can then use this to see whether there is an error in Figure 1 or in your spreadsheet.
      Charles

      Reply
    • Hello Sun,
      I am sorry, but I don’t understand your comment.
      Are you saying that you are unable to reproduce the same numbers shown in Figure 1 when p = .5?
      Charles

      Reply
  7. Charles,
    There is a typo in the statement under Figure 1. Instead of stating “the correlation matrix S”, it should be stated as “the covariance matrix S”.

    Thanks,
    -Sun

    Reply
  8. Hi Charles, first of all i want to say thanks to you for your fantastic tool.

    I’ve estimated a logistic regression model with one additional independent variable, so i have 3 coeffiecients; 2 + intercept.

    The thing is that this model is better than the one with 1 independent variable, in terms of R-square and RMSE, but its coefficients seem to be statistically insignificant since they have p-values that are larger than 0,05. How can i interpret that?

    Thanks again, and greetings from Chile.

    Reply
    • Hi Mauro and greetings from Italy.
      The more independent variables in the model, the higher the R-Square value. This is true whether or not any of the independent variables are significant or not.
      Charles

      Reply
  9. Im trying to do this with just excel, I have managed to create the design results as indicated with “one” column but I can’t get the SE for Coeff B to calculate, when I do SQRT(v7+w8) Instead of DIAG command) I get 0.334 which is the SE of the intercept not the coefficient, what am I doing wrong?
    thanks

    Reply
    • Hi Chris,
      DIAG is an array function and so its output can cover a range with more than one cell.
      Since I don’t exactly understand what you did, it is difficult for me to comment further, but it is probably a good sign that you got the se for the intercept. It is likely that you need to use additional formulas to calculate the se of the other coefficients.
      Charles

      Reply
      • So adding values from V7 and W8 and then SQRT them gave me SE of the intercept

        What does the DIAG function actually do with the 2×2 matrix V7-W8 that when you take the SQRT of this array function it gives you the SE of the Coefficient?,
        I thought the DIAG function gave you a single column of the diagonal values that are V7 and W8 and ignored the rest of the matrix.
        thanks for your help

        Reply
        • Hi Chris,
          Yes, that is correct. If you place the array formula =DIAG(V7:W8) in Z7:Z8 and press Ctrl-Shft-Enter then cell Z7 will contain the value in V7 and cell Z8 will contain the value in W8. if instead you place the array formula =SQRT(DIAG(V7:W8)) in Z7:Z8 and press Ctrl-Shft-Enter then cell Z7 will contain the square root of the value in V7 and cell Z8 will contain the square root of the value in W8.
          Charles

          Reply
  10. Dear Charles,

    You wrote: “Wald^2 ~ χ2(df) where df = k – k0 and k = the number of parameters (i.e. the number of coefficients) in the model (the full model) and k0 = the number of parameters in a reduced model”

    Wald ~ N(0,1) so I would expect Wald^2 ~ χ2(1), since there is only one square of a normal variable.

    Shouldn’t DF=1?

    Thanks,
    O

    Reply
    • Dear Oren,
      I don’t have a definitive answer for you, but the issue is related to the fact that we are estimating the residual variance from the sample variance. This is why we use a t distribution with a df not necessarily equal to 1 (instead of a normal distribution) to test a regression coefficient in linear regression. See the following webpage for more details: https://stats.stackexchange.com/questions/60074/wald-test-for-logistic-regression
      Charles

      Reply
      • Thanks Charles 🙂

        Yes it is not clear …per the link you mention, it should be Z in logistic regression.
        I can see that R (at least the package I tried) is using Z, while SPSS is using Chi-square with df not necessarily equal to 1

        from the Link:
        “in logistic (and poisson) regression, the variance of the residuals is related to the mean…
        E(Y)=np and the variance is Var(Y)=np(1−p) so the variance and the mean are related.
        In logistic and poisson regression but not in regression with gaussian errors, we know the expected variance and don’t have to estimate it separately.
        …If you perform a logistic regression, the Wald statistics will be the z-value.”

        Thanks Oren

        Reply
  11. Hi Charles,

    What would the groups of inputs look like when there are multiple? One variable (rems in your example) makes sense but how does it change if you were to add, say, age to the equation? Would every combination of (rems, age) in the dataset get its own group? What about discrete variables such as if age were measured to the day and no two people had the exact same age?

    Reply
  12. Hi Charles,

    Thanks so much for you work! It’s very helpful.

    I was just wondering about the design function (DESIGN(E6:E15)) for the correlation matrix (which you addressed in a previous comment). I went to https://real-statistics.com/multiple-regression/least-squares-method-multiple-regression/, but was unable to find anything addressing it. The function doesn’t exist in Excel 2013. Is it an Add-In, because I couldn’t find it there either.

    Is there a work around?

    Thanks!

    Emily

    Reply
    • Emily,
      DESIGN is not a standard Excel function, but it is a function supplied in the Real Statistics addin. If you have installed the Real Statistics addin, you should be able to use the function just like any standard Excel function.
      If X is a numeric array, then DESIGN(X) is simply the array X with a column of ones appended to it as the first column.
      Charles

      Reply
  13. Dear Sir,
    I am still confused, if I use real statistics, it will instantly found the overall output value. then what if without using real stats but using manual steps. I am confused in finding the standard error. I follow the formula based on real statistics, but can not be used in manual steps. such as DIAG, DESIGN. the word is not found in the excel formula. thank you

    Reply
  14. Hi,
    Please, how can we determine the magnitude of variation from Wald test of G*E analysis using Genstat software.
    Thank you

    Reply
  15. Hi Charles,

    I have use Multinomial Logistic Regression data analysis tool. The exp(b) of coeff int the report is > lower and <upper, but the p-value of coeff is 0.8455, so this coeff retained in the model or not?

    Thanks
    Anson

    Reply
      • Charles,

        Thanks for your quick reply.

        I have 50 columns of Independent Variables and each column have about 30000 data, only one column for the dependent variables.

        So how to calculate each the coeff is significantly different from zero and should therefore be retained in the model?

        Anson

        Reply
        • Anson,
          If p-value < alpha then the coefficient is significantly different from zero. Equivalently, if 1 is not in the confidence interval then the coefficient is significantly different from zero. Charles

          Reply
          • Charles,

            Thanks for reply.

            I have a question for using
            MLogitPred(R0, R1, r, iter) function.

            I have 20 col x 1001 row raw data with heading. Column A: ref no.
            Columns B to E: independent variables
            Columns F to T : dependent variables

            I have change the value “r” to predict and the result as below:
            E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,13), Result=0.04
            E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,0), Result=#VALUE
            E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,4), Result=#VALUE

            So is that mean I use 13 in the r is right?

            Anson

          • Anson,
            You saw your data is in raw format, but you also say that columns F to T contain the dependent variables. Both of these can’t be true. If the data is in raw format then there would only be one column for the dependent variables. I will therefor assume that your data is in summary format. Since columns F to T contain the dependent variables and there are 15 columns from F to T, this would mean that your dependent variables are numbered 0, 1, 2, …, 14. Thus r should be 14.
            Charles

      • Dear Sir,technically speaking,does it mean that,the predictors with p>0.05 are useless and we don have something to report,what if in cases when all predictors are insignificant?
        Very kindly

        Reply
        • Leo,
          Assuming that alpha = .05 is the correct significance level, then variables that have p > .05 are not making a significant contribution. But things are never this simple since for example when you have three such variables removing two of them may result in the third being significant.
          Charles

          Reply
  16. Hi,

    Thanks for creating this great tool and website.
    I ran a logistic regression using the tool on on Mac using the data set as described in the video (https://www.youtube.com/watch?v=EKRjDurXau0) . But I am not getting the p-value table (as can be seen in the screenshot in the webpage above) for all the coefficients to determine the significance of each independent variable. What can be done ?

    Reply
      • Hi Charles,

        Appreciate the quick response. My problem got solved.

        I had large no. of rows so I was not able to locate the table.

        thanks

        Regards
        Shashank

        Reply
  17. Hi Charles,
    I’m still having trouble understanding the meaning of the p value and statistical significance in logistic regression. I want to know how significant are the coefficients. What does it exactly mean that it is statistically significant? From my basic understanding if the p value is below the cutoff point, i.e<0.05 then the variable is statistically significant right? and also, Since there isn't a normal distribution in logistic regression how reliable is the p value?

    cheers,
    Matt

    Reply
  18. Hello

    A question about the Wald test. You can determine the Wald test for linear regression? The information I find is used for logistic regression. I want to rule out if I can use it for a linear regression. Gtacias

    Reply
  19. How can i reduce the p value of my intercept coefficient if the p value of all my other variables are satisfactorily low in logistic regtession .

    Reply
    • Sankit,
      Besides changing your data (e.g. via a transformation), I don’t know how to do this. I also don’t know why you would want to do this.
      Charles

      Reply
  20. Dear Charles,

    Alas I am a babe in the logit world and I hope you will be kind enough to point me in the right direction. I have access to a large dataset on student scores that have been previously standardised along the lines of mean 25, s.d. 5. I wish to perform some significance testing between certain groups of students and have struck on the idea that I could/should convert these scores to logit’s using the probability of achieving each ‘raw’ score – essentially treating them as z-scores – and then use these logits in place of the raw standardised scores and conduct z-tests. I am (if it isn’t already painfully obvious) too statistically underskilled to know whether I am committing an egregious blunder with such a plan, but the reference to Wald in your article makes me think that I probably am… Any suggestions on how I might better understand the issue would be very welcome.

    Reply
    • Ead,
      It is not clear to me what advantage (if any) you get by converting the scores to logit’s. Why can’t you simply use the raw scores?
      Charles

      Reply
  21. Dear Charles
    Thank you for your help. I have to run the variables temperature treatments on three groups of 10 plants. i would like to use Anova one-way for variance analysis.
    I would like to know if it is the right analysis when i use Anova repeated measures

    Reply
  22. Hi Charles,

    Fantastic website. Thank you, it’s been very helpful.

    I have a binomial logistic regression with 10 independent variables. HL test and R2 indicate the model is a poor fit. I’d like to try to improve the fit by removing variables that have low Wald scores and add in variable interactions. I saw the commentary on creating interactions for multi variable linear regression, but I am not sure if I can copy exactly, or if I need to make further adjustments due to logistic function.

    1.) Is there an easy way to add in interactions?
    2.) Is there a single best test to use to decide if the model is a good fit? — high AUC, use one of the R2’s, or the HL test?

    Thanks again!
    Marty

    Reply
    • Marty,

      1. You can add in interaction of independent variables in exactly the same way as you do for multiple linear regression. I have simply implemented this via x1 * x2, which is easy to do in Excel.

      2. There isn’t a simple answer to this question, although I wouldn’t rely too heavily on the HL value. High AUC and R2 are likely to be better indicators.

      Charles

      Reply
      • Thanks Charles!

        One more question, is the de facto R2 “floor” of a binomial logistic regression .50?

        If I didn’t use a model and just “guessed”, it seems like I’d have a 50/50 chance of predicting the actual outcome.

        Therefore, if my model yields an R2 of .56, does that mean that the model only offers an .06 improvement of what I would have been able to achieve using guesswork alone?

        Thanks again,
        Marty

        Reply
        • Marty,

          Your remarks are true for the % Correct statistic, but not for the R2 statistic.

          R2 is calculated in a completely different way, and your remarks are not true for R2.

          Charles

          Reply
  23. Hello Charles
    Thank you very much for the answer.
    I think I can understand a bit better how you did the covariance matrix.

    However, all my independent variables are continuous (no repetitions), so that, could be possible to form the groups in function of the probability (as for Hosmer lemeshow statistics)?.
    That means to group the cases that are predicted with probability <0.1, <0.2…..<1

    Thaks in advance

    Reply
    • No problem. Each summary data row will be equivalent to one raw data row. The reported Hosmer Lemeshow value won’t be quite since it is typically based on 10 summarized observations, but that is probably not so important.

      Charles

      Reply
  24. Hello Charles
    Could you define what is group i in “property 1”?
    I would love to know which parameters did you choose to build the covariance matrix.
    In my logistic regression model I only have 2 variables so I will do the covariance matrix by using covar functions.

    Thanks

    Reply
    • In this context each group consists of any combinations of values of the independent variables. So if you have independent variables Gender and Age and the raw data is

      M 30 1
      F 31 0
      M 30 0
      M 32 1
      F 32 0
      F 31 1
      F 30 1
      M 32 1

      There are 8 sample elements (rows), but some of them can be grouped together, namely the ones where the gender and age are the same. This yields the following summary data (a sort of frequency table). The summary contains 5 groups The 3rd column is a count of all the cases that have a 0 as the dependent variable and the 4th column is a count of all the cases that have a 1 as the dependent variable.

      M 30 1 1
      F 31 1 1
      M 32 0 2
      F 32 1 0
      F 30 0 1

      This is a silly example, but I hope it helps answer your question.

      Charles

      Reply
  25. Dear sir,

    I have done logistic regression for 20 independent variables for which all of them are categorical (0 and 1) also 1 binomial response variable. However, the significant test using p-value do not seems right with the variables.
    Does using all categorical variable as independent variable effects the result?

    Reply

Leave a Comment