Receiver Operating Characteristic (ROC) Curve

The ROC curve is a plot of the True Positive Rate (TPR) against the False Positive Rate (FPR), with FPR on the horizontal axis, for all possible cutoff values from 0 to 1. See Logistic Regression Classification Table for further information.
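
As a rough illustration of these definitions, the Python sketch below computes FPR and TPR at a single cutoff from data in summary format. The function name rates_at_cutoff, the sample rows, and the convention that an observation is classified as a success when its p-Pred is at least the cutoff are assumptions made for illustration; they are not taken from the Real Statistics worksheet.

```python
from typing import List, Tuple

# Each row is (p_pred, failures, successes). The values are hypothetical
# summary-format data, not the data shown in Figure 1.
Row = Tuple[float, int, int]


def rates_at_cutoff(rows: List[Row], cutoff: float) -> Tuple[float, float]:
    """Return (FPR, TPR) when rows with p_pred >= cutoff are classified as success."""
    total_fail = sum(f for _, f, _ in rows)
    total_succ = sum(s for _, _, s in rows)
    # Failures classified as success are false positives;
    # successes classified as success are true positives.
    false_pos = sum(f for p, f, _ in rows if p >= cutoff)
    true_pos = sum(s for p, _, s in rows if p >= cutoff)
    return false_pos / total_fail, true_pos / total_succ


rows = [(0.1, 4, 0), (0.4, 3, 1), (0.7, 1, 3), (0.9, 0, 4)]
for c in (0.0, 0.5, 1.0):
    print(c, rates_at_cutoff(rows, c))
```

Sweeping the cutoff from 0 to 1 and collecting the resulting (FPR, TPR) pairs traces out the curve.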

Example

Example 1: Create the ROC curve for Example 1 of Comparing Logistic Regression Models.

The first portion of the analysis from Comparing Logistic Regression Models is shown here in Figure 1.

Figure 1 – Logistic regression data analysis

We begin by taking the observed numbers of successes and failures in Logistic Regression summary format together with the calculated p-Pred values (i.e. columns H, I, and L from Figure 1) and sorting them by the p-Pred values. This can be done using the Real Statistics SelectCols worksheet function as follows (referring to Figure 1):

=SelectCols(H6:L17,"5,2,1",3)

The result is shown in columns AI, AJ, and AK of Figure 2.

Figure 2 – ROC Table and Curve

Next, create the cumulative values for Failure and Success (columns AL and AM) and then the values of FPR and TPR for each row (columns AN and AO). For example, the entries for row 8 are calculated via the formulas shown in Figure 3:

Figure 3 – Selected formulas from Figure 2

The ROC curve can then be created by highlighting the range AN6:AO18, selecting Insert > Charts|Scatter, and adding the chart and axis titles. The result is shown on the right side of Figure 2. The actual ROC curve is a step function passing through the points shown in the figure.
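
The table construction described above can also be sketched outside Excel. The Python example below is a minimal sketch, not the worksheet's actual formulas (the contents of Figure 3 are not reproduced here). It assumes that the rows are sorted in ascending order of p-Pred, that the curve starts at (1, 1) for a cutoff below every p-Pred value, and that FPR and TPR are one minus the cumulative proportion of failures and successes. The function name roc_points and the data are hypothetical.

```python
from itertools import accumulate


def roc_points(rows):
    """rows: iterable of (p_pred, failures, successes).

    Returns the (FPR, TPR) points of the ROC table, starting at (1, 1).
    """
    rows = sorted(rows)  # ascending p_pred (assumed sort order)
    total_fail = sum(f for _, f, _ in rows)
    total_succ = sum(s for _, _, s in rows)
    cum_fail = accumulate(f for _, f, _ in rows)  # cumulative failures
    cum_succ = accumulate(s for _, _, s in rows)  # cumulative successes
    # Cutoff below every p_pred: everything is classified as a success.
    points = [(1.0, 1.0)]
    for cf, cs in zip(cum_fail, cum_succ):
        # Cutoff just above this row's p_pred: rows up to and including
        # this one are now classified as failures.
        points.append((1 - cf / total_fail, 1 - cs / total_succ))
    return points


# Hypothetical summary-format data, deliberately unsorted.
rows = [(0.7, 1, 3), (0.1, 4, 0), (0.9, 0, 4), (0.4, 3, 1)]
points = roc_points(rows)
for fpr, tpr in points:
    print(fpr, tpr)
```

Plotting the first element of each pair on the horizontal axis and the second on the vertical axis corresponds to the scatter chart built from the FPR and TPR columns.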

Observation

The higher the ROC curve, the better the fit. The area under the curve (AUC) summarizes this in a single number: the closer the AUC is to 1 (the maximum value), the better the fit. A value close to .5 indicates that the model discriminates between success and failure no better than chance.
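
One standard way to read the AUC (when it is computed with the trapezoidal rule) is as the probability that a randomly chosen success receives a higher predicted probability than a randomly chosen failure, with ties counted as one half. The Python sketch below illustrates this interpretation on made-up scores; the function name auc_pairwise is hypothetical and this is not how the Real Statistics tool calculates the AUC.

```python
def auc_pairwise(success_scores, failure_scores):
    """Fraction of (success, failure) pairs in which the success has the
    higher score; tied pairs count as one half."""
    wins = 0.0
    for s in success_scores:
        for f in failure_scores:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5
    return wins / (len(success_scores) * len(failure_scores))


# Perfect separation: every success outscores every failure.
print(auc_pairwise([0.8, 0.9], [0.1, 0.3]))
# No separation: successes and failures have identical scores.
print(auc_pairwise([0.2, 0.8], [0.2, 0.8]))
```

The first call illustrates the maximum value of 1 and the second the chance level of .5.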

For Example 1, the AUC is simply the sum of the areas of the rectangles under the step function. The formula for calculating the area of the rectangle corresponding to the p-Pred value in row 8 (i.e. the formula in cell AP8) is shown in Figure 3. The formula for calculating the AUC for Example 1 (cell AP19) is =SUM(AP6:AP18). The calculated value of .845587 indicates a fairly good fit.
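
The rectangle sum can be sketched in Python as follows. This is a sketch under stated assumptions rather than a copy of the worksheet formula: the width of each rectangle is taken to be the drop in FPR from the previous row, and the height is taken to be the TPR of the current row. The function name auc_rectangles and the points are hypothetical and do not reproduce the data of Example 1.

```python
def auc_rectangles(points):
    """points: ROC points (FPR, TPR) ordered from (1, 1) down to (0, 0).

    Sums one rectangle per step: width = decrease in FPR from the previous
    point, height = TPR of the current point (an assumed convention).
    """
    area = 0.0
    for (fpr_prev, _), (fpr_cur, tpr_cur) in zip(points, points[1:]):
        area += (fpr_prev - fpr_cur) * tpr_cur
    return area


# Hypothetical ROC points, not the values from Figure 2.
points = [(1.0, 1.0), (0.5, 1.0), (0.125, 0.875), (0.0, 0.5), (0.0, 0.0)]
auc = auc_rectangles(points)
print(auc)  # 0.890625
```

When a single p-Pred row contains both successes and failures, the curve moves diagonally between two points, and a trapezoidal sum would give a somewhat larger value than this rectangle sum.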

Data Analysis Tool

The Real Statistics Logistic Regression data analysis tool automatically creates the ROC curve as described above.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Wikipedia (2013) Receiver operating characteristic
https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Kaggle (2023) Classification table – confusion matrix. Logistic regression using R
https://www.kaggle.com/code/benroshan/part-8-logistic-regression-using-r

MedCalc (2023) ROC curves
https://www.medcalc.org/manual/roc-curves.php

50 thoughts on “Receiver Operating Characteristic (ROC) Curve”

  1. Would it be possible to include machine learning models such as naive Bayes and support vector machines?

  2. Sir, I have data that I randomly split into 70% and 30%. How should the ROC curve be constructed? That is, should the curve be constructed on 100% of the data, or on the 30% of the data using the probabilities obtained from the model developed on the 70%?
    Can you suggest the methodology generally followed, sir?

    • Hello Sai,
      The answers to these questions are fairly similar to the answers I have provided to your previous questions. You can generate the ROC for all three possibilities.
      Charles

      • Sir, I got an AUC value of 0.823 for the 100% sample, whereas for the predicted probabilities on the 30% data, based on the model from the 70% data, it was 0.679. The accuracy on both the 70% data and the 30% data was over 80%. Can I report the accuracies of the 70% data and the 30% data, as there can be a shrinkage error of 2%, together with the AUC of the overall data?

        • You generate the ROC in the usual way. ROC only depends on the specific data you use. You need to decide which data set is most relevant; essentially this is based on why you want to generate the ROC in the first place.
          Charles

          • Charles Sir, thank you for your valuable and timely suggestions with which I could complete my project. Thank you once again sir.

  3. Hello professor,
    just to let you know, the ROC curve data is not displayed using the Binary regression tool. I believe this is due to an issue with the SelectCols function as it doesn’t seem to produce any output (using Office 365 on Mac, version 16.35).

    BR

    • Hello Filippo,
      Are you using Rel 7.0 or Rel 7.0.1? The latter was released yesterday and was intended to address this issue. I tested it on two binary logistic regression examples sent to me by other users who had a similar problem.
      If you are still having this problem when using the newer release, please email me an Excel file with your data and results and I will investigate the problem further.
      Charles

      • Thank you for your prompt reply – the issue has been solved with the latest update.

        I also happened to come across your bio and learnt you live in Trento – just like me. Hope you stay safe in these difficult times!

        BR

  4. Hi Charles,
    I have generated a ROC curve and AUC in GraphPad Prism as I have continuous data (rather than binary) in both a control and treatment group. From the output I then calculated Youden’s index to ascertain the cutoff point that differentiates between the two states, this being 0.56. I now need to back-calculate what value (in terms of concentration) the index refers to, to give a meaningful result. I have read elsewhere that a logistic regression equation can be used to do this, but I am not sure which variables to include or whether this is the correct approach. Would you be able to shed some light on this, please?

    The data that GraphPad has generated includes AUC, standard error, 95% confidence interval and p-value, in addition to sensitivity, specificity and likelihood.

    Here is the link by Ivm that refers to a regression equ. https://communities.sas.com/t5/SAS-Statistical-Procedures/ROC-in-SAS-obtaining-a-cut-off-value/td-p/161354

    Any assistance you are able to provide would be appreciated, thanks

    • Hi Nikki,
      I am not that familiar with this topic. The following article may be helpful though:
      www4.stat.ncsu.edu/~bondell/OptimCut.pdf
      Charles

      • Hello,

        This homepage helps me a lot! Thanks a lot!
        But I have a question about the interpretation.
        Suppose I have an AUC value of 0.79. What does it mean? Let’s say I have a risk assessment instrument which measures the risk of being “red”. With a cutoff of 10, my instrument would say that someone shows the feature “red”.

        Now my AUC value is 0.79.

        Does it mean that if I randomly choose someone from the “red” population, this person would have a high score on the risk instrument with a probability of 79%?

        Or does it mean that the relative probability of my risk assessment instrument discriminating between “red” and “not red” is 79%?

        Or does it mean that my risk assessment instrument can say with a probability of 79% that someone shows the feature “red”?

  5. Hi
    I have conducted a study where an imaging modality was used to diagnose a disease and that was compared to a gold standard laboratory test. I used a “truth table” (2×2 table) to calculate sensitivity, specificity, PPV and NPV. I am not sure which values to use or how to reorganise my data to create a ROC curve in Excel. Please help.

  6. Thanks for your good tutorial.
    I have a question: I’ve run some classification algorithms in the Weka software. Now I want to draw a ROC curve or PR curve from the Weka results. How can I do it?

    • I am not familiar with weka, but as long as it can output a plain text file you can use Excel’s Data > Get External Data | From Text capability to get the data into Excel.
      Charles

  7. I don’t know how to perform a test set versus training set comparison and its calculation.
    I have a set of data. I want to calculate true positives and false positives. Can someone please tell me how?

  8. Hey Charles,

    Just a quick question. What are success and failure values? I have a set of data from which I calculated sensitivity and specificity. I have different cut-off values. I simply tried scatter plotting sen vs 1-spe but it did not create the ROC curve. Most of the points are clustered together. Am I doing something wrong?

    Thanks!

    • It does sound like you have done something wrong, since the ROC curve should have a characteristic shape, but I would have to see your data to really be able to answer your question.
      Charles

      • I am not sure how to post my data here but roughly, I have two groups (normal and positive). For each group, there are 80 samples. I calculated cutoff values for each ‘marker’, which I extracted from the raw data with my own methods. Then for each marker, I used the cutoff values to categorize each sample into either n (normal) or p (positive). For example, for one marker or cutoff value, I assigned n or p depending on whether the raw data value exceeds the cutoff or not. As you know, from this, I can calculate specificity and sensitivity.

        Then for each marker (since each marker has its own cutoff value) I graphed sen vs 1 – spe.

  9. Hi,

    Congratulations, it’s a great blog!

    My question is: I have 2 classifiers (A and B). Classifier A had better accuracy and B a better AUC. So, which should I use?

    Thank you so much!

    Lizzie

  10. I have 3 methods to diagnose whether a segment of nerve is torn or not. I want to compare their diagnostic accuracy in correctly diagnosing a torn nerve by comparing their ROC curves.

    • Jimmy,
      I don’t currently provide any means of comparing ROC curves in the software. Of course, if the ROC curves are generated by my software, they can be compared visually.
      Charles

  11. It is a good overview of how to perform ROC analysis.
    Is there a way, using Excel, to compare the diagnostic accuracies of three diagnostic methods using 3 ROC curves?

    • I would appreciate it if you could answer my question. What if I am testing existing models, i.e. logit/MDA (hence using the same coefficients, with the variable weights plugged into the model), for which I only have the probability/scoring results? How should I derive the ROC curve from these results? Many thanks.

  12. Hi, this is a good example for me. I have a question that I hope you don’t mind answering.
    Let’s say I have a set of training data. Is ROC used to evaluate the threshold that I got from this training data? If, say, the ROC rate is 0.9, is this model and threshold good enough to test other data?

    Thanks

    • Raseeda,
      An AUC value of .9 is quite high and so I would think that the model would be good enough to test other values.
      Sorry that I haven’t responded sooner. I seem to have overlooked your question until now.
      Charles
