ICC for Test/Retest Reliability

We can use the intraclass correlation coefficient (ICC) to measure test/retest reliability (see Split-Half Reliability). This is especially useful for checking consistency in the pilot phase of questionnaire design.

Example 1: How many subjects are required to determine the test/retest reliability of the total score of a psychometric test instrument that measures the level of anxiety in patients with dementia, where we seek to achieve an ICC(1,1) of .60 with power of at least 80%?

Since the number of raters (i.e. tests) is 2, we use the formula =ICC_SIZE(0,.6,2,.8) to obtain a minimum sample size of 15 (see ICC Continued). If you want to achieve 90% power, you will need a sample of size 20, but if you want to detect an ICC of .75 with 90% power a sample as small as 11 subjects is sufficient.

Example 2: Use an ICC(1,1) model to determine the test/retest reliability of the dementia psychometric test instrument based on the scores of the 15 patients on the left side of Figure 1 at Time 1 and Time 2 two weeks later.

Using the same approach as for Example 1 of ICC for Comparisons against a Gold Standard, we obtain an ICC of .746, as shown in cell M9 of Figure 1, with a 95% confidence interval of (.387, .907).
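For readers who want to check an ICC(1,1) value outside the worksheet, here is a minimal Python sketch of the standard one-way ANOVA formula, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of measurements per subject. The function name icc_1_1 is made up for this illustration and is not a Real Statistics function. The sample data are not the Figure 1 scores (which are not reproduced in the text); they are the five made-up (test, retest) pairs quoted in the comments below.

```python
def icc_1_1(scores):
    """ICC(1,1) for a list of per-subject score tuples (one-way random effects model)."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # measurements per subject (2 for test/retest)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject sums of squares from one-way ANOVA
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(scores, subject_means)
                    for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Illustrative data only: the made-up (test, retest) pairs from the comments
pairs = [(4, 0), (5, 2), (5, 2), (2, 5), (7, 5)]
print(round(icc_1_1(pairs), 3))  # -0.062
```

For these five pairs MSB = 4.15 and MSW = 4.7, so the ICC is slightly negative; this happens whenever the within-subject mean square exceeds the between-subject mean square.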


Figure 1 – Test/retest reliability

Example 3: Use an ICC(1,1) model to determine the test/retest reliability of a 15-question questionnaire based on a Likert scale of 1 to 5, where the scores for a subject are given in column B of Figure 2 and the scores for the same subject two weeks later are given in column C.

The ICC of .747 is shown on the right side of Figure 2.


Figure 2 – Test/retest reliability

4 thoughts on “ICC for Test/Retest Reliability”

  1. Actually, the issue arises when there is a negative correlation because test takers with lower scores on the test generally get high scores on the retest, while test takers with higher scores on the test do not improve as much on the retest and/or get lower scores on the retest. In that case the mean of the retest may be higher, but the correlation between retest and test is negative.

  2. Hi Charles! I am back with another question. Using ICC(1,1) for a test-retest scenario, I sometimes get a negative value for certain segmentations of my data set. This seems to follow the direction of the correlation between my test and retest scores. That is, if the mean retest score is lower than the mean test score, and there is a negative correlation, then there tends to be a negative ICC value. Is this to be expected? How should this be interpreted? I am finding mixed info on the Web. Thanks a lot!

    • Hi Mike,
      The values of the ICC and correlation do seem to be similar. I did come up with a made-up example where the signs are different:
      (4,0), (5,2), (5,2), (2,5), (7,5).
      Charles

      • Thanks a lot for this reply! Later I saw that the directions are not always the same. It seems that if low test scores tend to result in higher retest scores while high test scores result in lower retest scores, you get a situation with a very low, possibly negative, ICC and a negative correlation. But I have seen that sometimes the correlation is positive while the ICC is negative. With my data, I have not yet seen a case where the ICC is positive and r is negative. That seems logical to me though. So I guess this is just one situation where you can get a negative ICC.
