When the dependent variable is categorical, the relationship between the dependent variable and the independent variables can often be represented by a logistic regression model. Such a model can then be used to predict the value of the dependent variable from the values of the independent variables.
We review here binary logistic regression models where the dependent variable only takes one of two values. In Multinomial and Ordinal Logistic Regression we look at multinomial and ordinal logistic regression models where the dependent variable can take two or more values.
We also review a model similar to logistic regression called probit regression.
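To make the difference between the two models concrete, here is a minimal sketch in Python (not Excel) of how each one turns a linear combination of the independent variables into a probability. The coefficients b0 and b1 are hypothetical values chosen only for illustration. In practice each model would be fitted separately and would yield its own coefficients.

```python
import math

def logistic_probability(z):
    """Logistic (logit) model: p = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_probability(z):
    """Probit model: p = Phi(z), the standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients (assumption): intercept b0 and slope b1 for one x
b0, b1 = -1.5, 0.8

for x in (0.0, 1.0, 2.0, 3.0):
    z = b0 + b1 * x  # linear predictor
    print(x, round(logistic_probability(z), 4), round(probit_probability(z), 4))
```

Both functions map any real number to a value between 0 and 1, and both return 0.5 when the linear predictor is zero. They differ mainly in how quickly the probability approaches 0 or 1 in the tails.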
Topics
- Basic Concepts
- Finding Coefficients using Excel’s Solver
- Significance Testing of Logistic Regression Coefficients
- Testing Fit of the Logistic Regression Model
- Finding Coefficients using Newton’s Method
- Handling Categorical Coding
- Comparing Logistic Regression Models
- Hosmer-Lemeshow Test
- Classification Table
- ROC Curve
- Real Statistics Logistic Regression Functions
- Additional Real Statistics Capabilities
- Logistic Regression Power and Sample Size
- Probit Regression
Hello Dr., Congrats on a wonderful app and thanks for your generosity in sharing it. I am teaching myself stats, so I don’t know much. But I am trying to do a logistic regression. I have a Y variable and 6 X variables. All are binary variables (0, 1). When I click OK on the logistic regression function, I get a run-time error: it aborts and says Type mismatch. Can you help me find out what I am doing wrong?
Hi Paul,
What do you see when you insert the formula =VER() in any cell?
If you email me a file with your Excel spreadsheet, I will try to figure out what is causing the error.
Charles
Hi Charles,
I hope you are well. Thank you for the wonderful excel plug-in that you have designed. I find it really helpful. I was hoping you could help me with a query.
I’m trying to perform a logistic regression to identify the need for revision surgery in a group of patients that have had one surgery already. I have 4 independent variables of which 2 variables are numerical. The third is a categorical variable with 5 classes from 1 through 5. The fourth variable is a score derived by adding points assigned for various abnormalities identified on an X-ray.
My questions are as follows:
1. Is it OK to treat the 4th variable as numerical?
2. Can I use the categorical variable in the logistic regression equation by coding it with the numbers 1 through 5, or does it have to be binary?
Hello Bhushan,
1. It sounds like the 4th variable is numerical, but I would have to understand better how you add points for the abnormalities.
2. It sounds like your output is a categorical (dependent) variable that takes the value 1 if the patient needs revision surgery and 0 if not. You can use categorical independent variables that are not binary. You can also use independent variables that take Likert values (1 to 5), treating such ordinal variables simply as numerical variables.
Charles
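One common way to enter a non-binary categorical variable into a regression is dummy coding, where a variable with k classes becomes k-1 columns of 0s and 1s. The following Python sketch illustrates the idea under that assumption. The function name `dummy_code`, the column names, and the data are hypothetical, and this is not necessarily how the Real Statistics tool codes the variable internally.

```python
def dummy_code(values, reference):
    """Return (column_names, rows): one 0/1 column per non-reference class.

    The reference class is represented by a row of all zeros.
    """
    levels = sorted(set(values) - {reference})
    names = [f"class_{level}" for level in levels]
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return names, rows

# Hypothetical data: a categorical variable with classes 1 through 5
classes = [1, 3, 5, 2, 4, 1]
names, rows = dummy_code(classes, reference=1)

print(names)
for value, row in zip(classes, rows):
    print(value, row)
```

Treating the variable as numerical instead, as suggested above for Likert-type values, keeps it as a single column but assumes the classes are ordered and roughly evenly spaced.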
Would there be the possibility of implementing an ordered and multinomial probit model?
Antony,
Yes. I will add this to the list of possible future enhancements.
Charles
Thank you so much.
Dr Zaiontz,
First off, I really do appreciate your development and free download on XREALSTATS. It has been wonderful!
I have a need to perform a pooled logistic regression (i.e., the independent variable values are time dependent). Will XREALSTATS handle this scenario and how would the variables be loaded into Excel?
Thanks,
Russ
Hello Russ,
If by pooled, you mean that you can ignore the assumption of independence of the observations (due to the time), then you can use ordinary logistic regression. If, as I expect, you are looking for panel analysis for logistic regression, then currently Real Statistics does not support this scenario.
Charles
This is point blank the best explanation and introduction to the subject!
Thank you. I try my best.
Charles
Hi Dr. Zaiontz,
I had a question about sample size calculation for a clinical trial. My hypothesis is: “Colchicine lowers the recurrence rate of pericarditis in patients with SLE.”
The Y-variable is binary. There are multiple independent X-variables like age, bmi, sex, placebo or colchicine…
I am planning on using logistic regression to analyze, but I’m having issues finding the right formula for the sample size calculation.
Do you have a recommendation?
Thank you
Hi Sophie,
See https://www.real-statistics.com/logistic-regression/logistic-regression-sample-size/
Charles
Hello, I have a quick question regarding the logistic regression outputs.
Does the logistic regression tool automatically standardize my numeric observations, or would I first have to transform my data using the STANDARDIZE function and then run the logistic regression?
Thanks,
Wadi Luca Watfa
Hello Wadi,
No, the Logistic Regression data analysis tool doesn’t automatically standardize numeric data.
Charles
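For readers who do want standardized inputs, the calculation has to be done before running the regression. Excel's STANDARDIZE(x, mean, standard_dev) returns (x - mean) / standard_dev. The Python sketch below applies the same formula to one column. The sample data are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption; the population version could be used instead.

```python
import statistics

def standardize_column(values):
    """Convert a column of numbers to z-scores: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(value - mean) / sd for value in values]

# Hypothetical numeric independent variable, e.g. age in years
ages = [34, 51, 47, 62, 29]
z_scores = standardize_column(ages)

for age, z in zip(ages, z_scores):
    print(age, round(z, 3))
```

Standardizing changes the scale of the coefficients (each one then refers to a change of one standard deviation) but not the predicted probabilities of the fitted model.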
Hi Dr. Zaiontz,
After running the model on the training data (the data that includes the dependent variable), how can I use the results to predict probabilities for the unlabeled data?
Thank you in advance
Hi Ray,
Use the LogitPRED or LogitPREDC function as described at
https://www.real-statistics.com/logistic-regression/real-statistics-functions-logistic-regression/
Charles
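The underlying calculation can be sketched in a few lines of Python. This is only an illustration of how fitted logistic regression coefficients are applied to new rows, not the implementation of LogitPRED or LogitPREDC. The function name `predict_probability`, the coefficient values, and the data rows are all hypothetical.

```python
import math

def predict_probability(coefficients, x_values):
    """Apply fitted coefficients [b0, b1, ..., bk] to one row [x1, ..., xk]."""
    if len(coefficients) != len(x_values) + 1:
        raise ValueError("expected one coefficient per x value plus an intercept")
    # Linear predictor: b0 + b1*x1 + ... + bk*xk
    z = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], x_values))
    # The logistic function converts the linear predictor to a probability
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients taken from a model fitted on the training data
coefficients = [-2.0, 0.5, 1.2]

# Hypothetical unlabeled rows: values of the two independent variables
unlabeled = [[1.0, 0.0], [3.0, 1.0]]
probabilities = [predict_probability(coefficients, row) for row in unlabeled]

for row, p in zip(unlabeled, probabilities):
    print(row, round(p, 4))
```

Each result is the estimated probability that the dependent variable equals 1 for that row. A cutoff such as 0.5 can then be applied if a 0/1 classification is needed.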
OK, thank you, sir.
Hello Charles,
I really need your help. Could you explain when to use “Logistic and Probit Regression” and when to use “Multinomial Logistic Regression”?
Specifically, I have one dataset with a single quantitative independent variable and one dependent variable (values 0 and 1), for simple logistic regression.
I also have a dataset with quantitative, binary, nominal, and ordinal independent variables and one dependent variable (values 0 and 1), for multiple logistic regression.
Could you please let me know which approach I should use for each of them, and how? I can’t find documentation that covers my problem. My English is not very good; I’m sorry about that.
Hello Yen Nguyen,
In either of these cases, since you have one dependent variable that only takes the values 0 or 1, logistic regression or Probit regression could be used.
Charles
Dear Dr. Zaiontz, I hope that you and your family are OK. Excuse me, how can I see the Hosmer statistic in the Logistic Regression tool? Version 7.6 had this statistic, but the current one does not.
Thanks a lot.
Hello Gerardo,
We are all fine. I hope the same is true for you and your family. Covid has caused a lot of disruption and stress, but fortunately we are all fine.
I removed the Hosmer statistic from the logistic regression tool for two reasons:
1. The Hosmer statistic is not really that useful
2. The version of the statistic that was included in previous releases was only correct in very limited cases.
Charles
Thanks a lot.
Hello Charles,
I am unable to add the Real Statistics plug-in. I get an error while doing so.
I am using 365. Can you please help?
Thanks
Ankit
Ankit,
What sort of error message are you getting? Are you using Windows or the Mac?
Charles
Doctor, good afternoon. I hope you are very well, along with everyone at home. Doctor, please, how can I calculate the sample size for diagnostic tests?
Do you have a recommended page?
Thank you very much
Hi Gerardo,
See https://www.real-statistics.com/logistic-regression/logistic-regression-sample-size/
Also, G*Power can be helpful.
Charles
I wanted to do a binary logistic regression; however, I can only see an option for logistic and probit regression. Can I use this test?
Ben,
Yes, that is the correct option.
Charles