ROC and Classification Table Data Analysis Tool

Data Analysis Tool

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack supplies the ROC Curve and Classification Table data analysis tool which provides an easier way to construct the ROC curve and classification table. We show how this is done for Example 1 of Classification Table and ROC Curve.

The tool accepts either of two input formats: one with two columns (e.g. B24:C34 of Figure 1 of Classification Table) and another with three columns, which we illustrate in Figure 1.

ROC summary data

Figure 1 – Data input for ROC (three-column format)

The format is similar to that in Figure 1 of ROC Curve except that only the upper bounds of the intervals are shown in column A. These correspond to the bins in the histograms, except that we now have two frequency columns (B and C) instead of just one.

Analysis

To perform the analysis, press Ctrl-m and double-click on the ROC Curve and Classification Table data analysis tool. Fill in the dialog box that appears as shown in Figure 2.

ROC/Classification dialog box

Figure 2 – ROC Curve and Classification Table dialog box

Note that we choose a cutoff at the 5th row of the data by specifying the upper limit of the failure range, namely 10. Also, note that in Example 1 of ROC Curve we estimated the area under the ROC curve (AUC) via rectangles. This time we estimate AUC by using trapezoids instead.

After clicking on the OK button in the dialog box, the output shown in Figure 3 is produced.

ROC and Classification Table

Figure 3 – Output from the data analysis tool

In addition, the output also includes the ROC curve shown in Figure 1 of ROC Curve.

The classification table is identical to that shown in 1 of Classification Table and the ROC Table except that the AUC values are the same as that shown in Figure 1 of ROC Curve. The AUC values are slightly different since the area under the ROC curve is estimated via trapezoids instead of rectangles.

Observations

If we use the data from Figure 1 of Classification Table as input then we would insert B24:C34 (two-column format) from Figure 1 of Classification Table into the Input Range field in the dialog box in Figure 2 and set the Cutoff to 5 (5th row). The output would be the same as that described above.

The output shown in Figure 3 also includes the 95% confidence interval for the AUC (range M12:M15). Note that the 95% corresponds to an alpha value of .05 in cell M12 since 95% = 1 – .05. You can change the alpha value in M12 of the output and the corresponding confidence interval will change automatically (e.g. inserting .01 in M12 will generate the 99% confidence interval in M14 and M15).

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Hanley, J. A., McNeil, B. J. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve
https://pubmed.ncbi.nlm.nih.gov/7063747/

Hintze, J. L. (2008) ROC Curves. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/ROC_Curves-Old_Version.pdf

Hintze, J. L. (2022) One ROC curve and cutoff analysis. NCSS
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/NCSS/One_ROC_Curve_and_Cutoff_Analysis.pdf

IBM (2011) ROC algorithms IBM SPSS Statistics 20 Algorithms
http://www.sussex.ac.uk/its/pdfs/SPSS_Algorithms_20.pdf

25 thoughts on “ROC and Classification Table Data Analysis Tool”

  1. Hi Charles,
    I have 6 rows, but when plotting ROC and calculating AUC, there is an extra row with value 1,1. This increases the AUC. Do I need to include this value or not?

    Thanks,
    Jagan

    Reply
  2. i am sorry…. but may i
    i have an elementary question!
    you say that “cutoff” is upper limit of failure range… right?
    i can’t fully understand that meaning.

    Reply
    • Yes, the cutoff is the upper limit of the failure range. The cutoff divides the failure range from the success range.
      It is an arbitrary value. You can choose any value that you think makes sense.
      Charles

      Reply
      • thank you so much~~
        there’s another question.
        Fail-Obs Suc-Obs
        Fail-Pred 413 58
        Suc-Pred 114 221

        above 2*2 matrix is made by figure 1 and 2.
        in terms of medical statistics, i think that (fail-pred*fail-obs) is sensitivity, and (suc-pred*suc-obs) is specificity. am i right?

        Reply
  3. Hi Charles,

    Many thanks for the great statistical tool you have created and generously share.
    I am investigating the relationship between injury and various musculoskeletal test outcomes. Set scores (cut-offs) have been proposed by the developers of the tests i.e. scores for minimum standard achieved or not achieved. Based on these scores classification tables have already been constructed. You mention that the Real Stats data analysis tool can create ROC curves and AUC values from classification tables also. Please could you advise how this can be done as oppose to using the format as described in the above example?

    Reply
    • Hello Candice,
      You can’t create the ROC from the classification table. The reference on the webpage is to the webpage titled Classification Table.
      Charles

      Reply
  4. Hello Charles,
    Hope you are well amidst Corono outbreak.
    I need to validate an established diagnostic criteria (say a fixed Bilrubin and Serum Albumin cut off) on a set of say 30 cases.
    Result will be binary; with either diagnostic criteria met or not met in 30 cases (includes patients and not patients).
    Can ROC curve be generated from the above results? If so, how?
    Thanks.Dr. Tryambak (INDIA)

    Reply
    • Hello Dr. Tryambak,
      The approach is exactly as in Figure 1. Column B contains the counts for binary value 0 and column C contains the corresponding counts for binary value 1.
      Charles

      Reply
      • Charles,
        Thanks for your prompt reply.
        However I don’t have a column A.
        I have a fixed criteria (say Bilrubin 5mg and Albumin 3) . No different set of values.
        I am bit confused. Please help me.
        Thanks.
        Tryambak

        Reply
  5. When I highlight data to be used for ROC curve, I get a message that says only non-negative integers can be used, although none of my data contains negative numbers.

    Reply
  6. Please refer to the quote, and let me know whether we should use ‘trapezoids’ instead of rectangles to calculate ROC using the provided tool?

    “The AUC values are slightly different since the area under the ROC curve is estimated via trapezoids instead of rectangles.”

    Reply
  7. And How do you determine cutoff point if you don’t have one? The data I am using, technically we’re looking to point out a cutoff point but I don’t know where to look besides follow the sum of sensitivity+specificity (highest).

    Reply
    • Hi Anne,
      This should ideally be determined based on your domain knowledge, i.e. not on statistical grounds but on what makes sense for your field of study.
      You could, however, use the cutoff that maximizes the Accuracy (or some other statistic). In Excel you can determine this value by using Solver.
      Charles

      Reply
  8. There’s no comment section for AUC Confidence Interval page, so I am leaving a question here. How to calculate p-value for the AUC and ROC?

    Reply

Leave a Comment