ROC Curve

The Receiver Operating Characteristic (ROC) Curve is a plot of values of the False Positive Rate (FPR) versus the True Positive Rate (TPR) for a specified cutoff value.

Example

Example 1: Create the ROC curve for Example 1 of Classification Table.

We begin by creating the ROC table as shown on the left side of Figure 1 from the input data in range A5:C17.

ROC curve

Figure 1 – ROC Table and Curve

First, we create the cumulative values for Failure and Success (columns D and E) and then the values of FPR and TPR for each row (columns F and G). E.g. the entries for row 9 are calculated via the formulas in Figure 2:

Cell Meaning Formula
D9 Failure Cumulative =D8+B9
E9 Success Cumulative =E8+C9
F9 FPR =1-D9/D$17
G9 TPR =1-E9/E$17
H9 AUC =(F9-F10)*G9

Figure 2 – Selected formulas from Figure 1

The ROC curve can then be created by highlighting the range F7:G17, selecting Insert > Charts|Scatter, and adding the chart and axes titles (as described in Excel Charts). The result is shown on the right side of Figure 1. The actual ROC curve is a step function with the points shown in the figure.

Area Under the Curve

The higher the ROC curve (i.e. the closer to the line y = 1) the better the fit. In fact, the area under the curve (AUC) can be used for this purpose. The closer the AUC is to 1 (the maximum value) the better the fit. Values close to .5 show that the model’s ability to discriminate between success and failure is due to chance.

The AUC is simply the sum of the areas of each of the rectangles in the step function (see Figure 3). The formula for calculating the area for the rectangle corresponding to row 9 in Figure 1 (i.e. the formula in cell H9) is =(F9-F10)*G9 as shown in Figure 2. The formula for calculating the AUC (cell H18) is =SUM(H7:H17). The calculated value of .889515 shows a pretty good fit.

AUC

Figure 3 – Area under the curve

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Hanley J. A., McNeil B. J. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve
https://pubmed.ncbi.nlm.nih.gov/7063747/

Hintze, J. L. (2008) ROC Curves. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/ROC_Curves-Old_Version.pdf

Hintze, J. L. (2022) One ROC curve and cutoff analysis. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One_ROC_Curve_and_Cutoff_Analysis.pdf

IBM (2011) ROC algorithms IBM SPSS Statistics 20 Algorithms
http://www.sussex.ac.uk/its/pdfs/SPSS_Algorithms_20.pdf

59 thoughts on “ROC Curve”

  1. Hi Charles,

    I have a question about using cumulative values in calculating FPR and TPR for the ROC curve.

    In your example of mosquitos, each cutoff value was evaluated in a separate experiment each of which had a different set of subjects, so that the cumulative number of successes + the cumulative number of failures = total number of subjects (mosquitos) in all of the experiments combined. Is it correct to use cumulative values in cases where you are repeatedly testing different cutoff values in the same subjects, say when looking at different cutoff values for a diagnostic test of a disease in 100 people, half of whom have the disease and half of whom do not? In that case the cumulative values would exceed the number of subjects. Since I know who has the disease and who does not, could I dispense with cumulative values and calculate FPR and TPR by simply counting the number of correct and incorrect diagnoses at each cutoff? The latter approach seems intuitively better, but my intuition and statistics don’t mix well.

    Thanks for your patience,

    Joe

    Reply
    • Hello Joe,
      I don’t completely understand your question. In particular, can you give me an example where the cumulative values would exceed the number of subjects? Perhaps what you are suggesting is useful, but is it the correct way to compute the ROC.
      Charles

      Reply
  2. Hi.
    I was trying to calculate AUC using your method but I get very odd results for curves with very few datapoints and realize that your calculations use the square that builds between the points. Is this really intended? I would say that the correct AUC for row 9 should be:
    =(F9-F10)*(G9-G10)/2+G10*(F9-F10)

    Please consider an extremly simple ROC with the following points:
    0, 0
    .5, .5
    1, 1

    This draws a perfect line through the middle of the ROC and should generate an AUC of 0.5 but your calculations make 2 squares, one consisting of 50% of the area and one with 25% of the area and add these 2 together, giving an AUC of .75.

    My calculations give the correct value of .5.

    Consider another example:
    0, 0
    .1, .5
    1, 1

    Your calculations give two squares with the areas of 90% of the total area and 5% of the total area, when in fact the actual area under the curve consist of two, non-square areas, with 67.5% an 2.5% of the area , giving an AUC of .7.

    Are your calculations correct and mine wrong? Why? Am I missing some fundamental understanding of how the AUC is calculated?

    Best regards
    Frans Vincent

    Reply
  3. I had been using the formula for calculating the rows of area under the curve at each cut off as described [=(F9-F10)*G9]. However, with some ROC curves, especially it seems where there are groups of results with identical FPR and/or TPR values and I’m calculating the TPR and FPR for very fine increments of cut offs, the sum of those rows over estimates the AUC compared to ROC curve analysis using R. I have modified the formula in H9 to be =(F9-F10)*Average(G9,G10) and then copied that down the column to give AUC values equivalent to those from R.

    Reply
  4. Thank you. That was really very helpful.
    May I ask please if I can calculate the p-value (level of significance) for the area under the curve. Just like stat software does such as SPSS. And how using excel?
    Many thank again ..

    Reply
  5. Can your program handle missing data on the predictor variable, and if so, how can I enter the missing data? Suppose I use multiple computation to address the issue of missing data, how can I use this data (enter or paste) in determining the ROC using your program?

    Reply
  6. Thanks for this example
    You did a frequency distribution for the Dosage and get the count for lives and dies, and AUC is .889515 which is a pretty good fit.
    How to know the best Dosage?

    Reply
  7. Hi Charles,
    Can you please tell me how to generate ROC curve to validate established diagnostic criteria where outcome is only binary (yes or no) ?
    Does that ROC hv multiple points in curve? If so why?
    Thanks. Dr. Tryambak

    Reply
  8. Thanks for the helpful Excel guide for ROCC analysis, Charles! I have to ask, what is the context in which we are evaluating doses that successfully yield death as a result??? 🙂 It’s my most sinister statistical experience to date.

    Reply
    • Ben,
      I have clearly chosen an unfortunate example, especially in this period, although the “death” of a virus can be viewed as equivalent to the “life” of a patient.
      Charles

      Reply
      • Fair enough. I’ll leave the discussion of whether or not a virus is living for a different forum.

        I’ve modified your sheet and will use as a template for evaluating diagnostics against a gold standard test. An additional consideration for my application is “how fast is each measurement approach”? Thanks again for sharing!

        Reply
  9. Hi Charles,

    I was wondering why you added row 7 to the data. Should you always add a row at the beginning with a TPR of 1 and an FPR of 1? I ask because I noticed other calculators seem to do this as well, but I can’t seem to find an explanation for why this is done.

    Thank you,
    Joe

    Reply
    • Hi Joe,
      This is completely optional. I did this to avoid having to make a special case for the cumulative distribution for the first data element.
      Charles

      Reply
  10. Dear Charles,

    Can I use your ROC data analysis tool to analyze my research for publication? how to cite it?

    Thanks…

    Reply
  11. Dear Charles
    please, can you give more details about the formula
    F9= 1-D9/D$17; what means “D$17”? Is it the sum from D9 to D17 or what?
    Thanks

    Reply
  12. Thank you for all of your great information, Charles!

    When I try to construct my ROC curve, I receive an error message that says only non-negative integers can be used; however, I do not have any negative integers in my data.

    Please advise!

    Thank you!

    Reply
    • Hello Velda,
      Good to see that you are getting value from the Real Statistics website.
      If you email me an Excel file with a spreadsheet containing your data, I will try to figure out why you are getting this error.
      Charles

      Reply
  13. My area under the ROC curve is .798 —– but my ‘Accuracy’ total shows .735 —- so, how does on reconcile these differing results?

    *thank you for offering these tools; what a generous thing to do.

    Reply
  14. Charles,

    Many thanks for the amazing site for the Excel user. I have a ROC plot in a spreadsheet, but I also need to have on the graph the 95% CI band. Which Excel formula should I use to compute the Low Limit and the High Limit of the 95% CI for each criterion?

    Best wishes,

    Paulo

    Reply
  15. HI,Charles
    When I want select input range, I get this message(Input Range mus only non-negative integer data enteries; first non-valid entery is at a cell A2) but all my TPR and FTR are something like this…
    TPR FTR
    0.84 0.02
    0.67 0.03

    Thanks!

    Reply
    • Hello Sol,
      This looks like the output from a classification analysis. If you want to create a ROC then the input takes the form of frequency values, whose values must be non-negative integers.
      Charles

      Reply
  16. I do not understand this example at all. The numbers make no real sense. What does it mean for 2mg that 34 live and 3 die or for 10 123 live and 23 die? Given the data best would be to use a full dosage of 20 because than all die. No computation at all required.

    Reply
    • Hello Klemens,
      Yes, all things being equal, you are correct that a dosage of 20 would be best, but often there are other issues that need to be factored in. E.g. suppose a dosage of 18mg or more costs 100 time more than one of 12 to 16.
      In any case, the objective of the example is to show how to construct the curve. Perhaps a different example would have been more meaningful, but this example illustrates the basic technique.
      Charles

      Reply
      • Hello, Charles,
        I wonder how the experiments were designed: do I think correctly that the 10 dosage levels (rows) represent 10 independent experiments with 37, 70, 99, 119, 146, 155, 84, 47, 34, and 15 mosquitos?
        And why the cutoff level was chosen as 10, not 12?

        Reply
        • Hello Jiri,
          Yes, you are correct that these represent independent experiments.
          You can set the cutoff level to any value that you want. You could choose 12 instead of 10 if you like.
          Charles

          Reply
  17. Hi Charles
    I have downloaded the excel plug in, but it is not working. Whenever i enable the real stats with alt T I, excel hangs and then asks real stats pasword

    Reply
    • Hello Prashant,
      See Password Prompt
      Note too that the key sequence Alt-TI may be different if you are using a non-English version of Excel. In this case, you may need to follow the the following instructions (as shown on the webpage from which you downloaded the Real Statistics file):
      Select File > Help|Options > Add-Ins and click on the Go button at the bottom of the window.
      Charles

      Reply
  18. Hi Charles,

    How you are calculating TPR as 1-E9/E$17? As per my understanding, it should be E9/E$17. Please help.

    Alok

    Reply
  19. Thank you for sharing your examples above, it is very useful indeed. Would be perfect if you could add to the above example how to calculate 95% confidence intervals for AUC.

    Reply
  20. Dear sir
    I have use 2 method (class 1 and class 2) to compute sensitivity, Specificity and accuracy for 7 data set (D1-D7) how can i compute its AUC and how it can be plotted for ROC? Please help me. Your help would be much appreciated. Please see the data table below.

    D1 D2 D3 D4 D5 D6 D7
    Class 1 Sen 95.85 95.56 97.26 96.35 94.56 95.69 96.87
    Spe 94.36 94.39 96.54 95.36 94.25 95.82 94.56
    Acc 96.33 97.12 96.55 96.89 95.66 95.89 96.78
    class 2 Sen 93.76 93.45 94.28 93.56 94.58 93.58 93.42
    Spe 92.91 92.13 93.25 93.28 93.29 94.57 92.99
    Acc 94.56 94.12 93.23 94.12 93.56 93.23 93.89

    Thank you so much.

    Reply

Leave a Comment