Key Concepts
Item analysis is a technique that evaluates the effectiveness of items in tests. Two principal measures used in item analysis are item difficulty and item discrimination.
Item Difficulty: The difficulty of an item (i.e. a question) in a test is the proportion of the sample taking the test that answers that question correctly. This metric takes a value between 0 and 1. High values indicate that the item is easy, while low values indicate that the item is difficult.
Item Discrimination is a measure of how well an item (i.e. a question) distinguishes those with more skill (based on whatever is being measured by the test) from those with less skill.
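As a minimal sketch of this definition outside Excel, the following Python function computes the difficulty of one item from 0/1 scores. The function name `item_difficulty` and the sample responses are illustrative assumptions, not part of the original example.

```python
def item_difficulty(item_scores):
    """Proportion of test takers who answered the item correctly.

    item_scores: list of 1 (correct) and 0 (incorrect) values.
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return sum(item_scores) / len(item_scores)


# Hypothetical responses from 8 test takers (not the data in Figure 1)
responses = [1, 0, 1, 1, 0, 1, 1, 1]
print(item_difficulty(responses))  # 0.75
```

A value of 0.75 here means that three quarters of the sample answered correctly, so the item is fairly easy.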
Discrimination Index
The principal measure of item discrimination is the discrimination index. This is measured by selecting two groups, high-skill and low-skill, based on the total test score. E.g. you can assign the high-skilled group to be those subjects whose score on the entire test is in the top half and the low-skilled group to be those in the bottom half. Alternatively, you can designate the high-skilled group to consist of those subjects whose total score is in the top 33% and the low-skilled group to consist of those in the bottom 33%. The discrimination index is then the proportion of subjects in the high-skilled group who answered the item correctly minus the proportion in the low-skilled group who answered the item correctly.
The discrimination index takes values between -1 and +1. Values close to +1 indicate that the item does a good job of discriminating between high performers and low performers. Values near zero indicate that the question does a poor job of discriminating between high performers and low performers. Finally, values near -1 indicate that the item tends to be answered correctly by those who perform the worst on the overall test and incorrectly by those who perform the best on the overall test.
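The calculation described above can be sketched in Python as follows. This is an illustrative sketch only: the function name, the `fraction` parameter, the rounding of the group size, and the sample data are all assumptions, and ties at the group boundary are simply broken by input order.

```python
def discrimination_index(item_scores, total_scores, fraction=1 / 3):
    """D = (proportion correct in high group) - (proportion correct in low group).

    The groups are the top and bottom `fraction` of subjects ranked by total
    score. Ties at the boundary are broken by input order (a simplification).
    """
    n = len(item_scores)
    if n != len(total_scores):
        raise ValueError("item_scores and total_scores must have the same length")
    k = round(n * fraction)  # assumed rule for the group size
    if k < 1:
        raise ValueError("fraction is too small for this sample size")
    # Subject indices ordered from highest to lowest total score
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high = [item_scores[i] for i in order[:k]]
    low = [item_scores[i] for i in order[-k:]]
    return sum(high) / k - sum(low) / k


# Hypothetical data for 9 subjects (not the data in Figure 1)
totals = [19, 18, 17, 15, 14, 12, 10, 9, 7]
item = [1, 1, 1, 0, 1, 0, 1, 0, 0]
print(round(discrimination_index(item, totals), 3))  # 0.667
```

Here all 3 subjects in the top third answered correctly and 1 of the 3 in the bottom third did, so D = 3/3 - 1/3.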
Point-biserial Correlation
Another measure of item discrimination is the point-biserial correlation between the scores on the entire test and the scores on the single item (where 1 = correct answer and 0 = incorrect answer).
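The point-biserial correlation is the ordinary Pearson correlation applied to a 0/1 variable and a numeric variable. The sketch below computes it in two ways, as a Pearson correlation and via the equivalent formula (M1 - M0)/s × sqrt(p·q), where M1 and M0 are the mean total scores of those who answered correctly and incorrectly, s is the population standard deviation of the total scores, p is the proportion correct and q = 1 - p. The function names and the data are illustrative assumptions, and the sketch assumes the item has at least one correct and one incorrect answer.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation via (M1 - M0) / s * sqrt(p * q)."""
    n = len(item_scores)
    ones = [t for i, t in zip(item_scores, total_scores) if i == 1]
    zeros = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(ones) / n
    q = 1 - p
    mean_all = sum(total_scores) / n
    # Population standard deviation (divide by n, not n - 1)
    s = math.sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    m1 = sum(ones) / len(ones)
    m0 = sum(zeros) / len(zeros)
    return (m1 - m0) / s * math.sqrt(p * q)


# Hypothetical data for 8 subjects (not the data in Figure 1)
item = [1, 1, 0, 1, 0, 0, 1, 1]
totals = [18, 16, 9, 15, 11, 8, 17, 12]
print(pearson(item, totals), point_biserial(item, totals))  # the two values agree
```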
Example
Example 1: A 20-question test is given to 18 students. The table in Figure 1 shows the results for question 1 and for the whole test. Calculate the difficulty Df of question 1, its discrimination index D (using the top third vs. the bottom third), and its point-biserial correlation coefficient p.
Figure 1 – Item Analysis
We calculate the difficulty by Df = SUM(B4:B21)/COUNT(B4:B21) = 11/18 = .611. Since 5 of the top 6 students answered question Q1 correctly and 2 of the bottom 6 got the question right, the discrimination index for Q1 is D = 5/6 – 2/6 = 3/6 = .5. Here, 6 = 18/3 is the size of each group. The point-biserial correlation coefficient is p = CORREL(B4:B21,C4:C21) = .405.
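The arithmetic for Df and D can be checked with a few lines of Python. Only the counts quoted in the example are used here, since the raw data in Figure 1 is not reproduced; the variable names are illustrative, and the correlation is not recomputed because it needs the full data.

```python
# Counts quoted in Example 1 (the raw data of Figure 1 is not reproduced here)
n_students = 18
n_correct = 11               # students who answered Q1 correctly
group_size = n_students // 3  # top third and bottom third: 6 students each
high_correct = 5             # correct answers to Q1 in the top third
low_correct = 2              # correct answers to Q1 in the bottom third

difficulty = n_correct / n_students
discrimination = high_correct / group_size - low_correct / group_size

print(f"Df = {difficulty:.3f}")      # Df = 0.611
print(f"D = {discrimination:.3f}")   # D = 0.500
```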
Observation
In computing the discrimination index the boundary between the high-skilled, medium-skilled, and low-skilled groups is not always so clear. E.g. in Figure 1, the 6th and 7th highest total scores are both 16. So, which one of these should we choose to be in the high-skilled group? In this case, it doesn’t matter since the score for either subject on Q1 is 1, but if one of these had a score of 1 and the other had a score of 0, then we would have to make a decision. For our purposes, we will count the score for Q1 as the average of these, i.e. 0.5. More detail about this matter can be found in Real Statistics Item Analysis Functions.
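One possible reading of this averaging rule is sketched below in Python: subjects tied at the boundary total score share the remaining places in the group, and each such place is counted as the average item score of the tied subjects. This is an assumption made for illustration; the function name and data are hypothetical, and the Real Statistics functions may handle ties differently.

```python
def group_correct_with_ties(item_scores, total_scores, k, top=True):
    """Number of correct answers credited to the top (or bottom) k subjects.

    Subjects strictly beyond the boundary total score count in full. Subjects
    tied at the boundary share the remaining places, each place being counted
    as the average item score of the tied subjects.
    """
    pairs = sorted(zip(total_scores, item_scores), key=lambda p: p[0], reverse=top)
    boundary = pairs[k - 1][0]  # total score of the k-th ranked subject
    if top:
        inside = [item for total, item in pairs if total > boundary]
    else:
        inside = [item for total, item in pairs if total < boundary]
    tied = [item for total, item in pairs if total == boundary]
    places_left = k - len(inside)
    return sum(inside) + places_left * sum(tied) / len(tied)


# Hypothetical data: the 3rd and 4th highest totals are tied at 16
totals = [20, 18, 16, 16, 12, 10]
item = [1, 1, 1, 0, 0, 1]
k = 3
high = group_correct_with_ties(item, totals, k, top=True)   # 2 + 0.5 = 2.5
low = group_correct_with_ties(item, totals, k, top=False)   # 1 + 0.5 = 1.5
print(round((high - low) / k, 3))  # 0.333
```

When the tied subjects all have the same item score, as in Figure 1, this gives the same result as picking any one of them.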
Examples Workbook
Click here to download the Excel workbook with the examples described on this webpage.
I want to understand the differences and/or similarities between item analysis and item response theory.
They are different approaches to the analysis of questions in an exam. Item analysis is quite straightforward, while item response theory is more complicated.
Charles
Hi. I am using the resource tool pax. I can get the calculation for the difficulty index correctly. But when I compare the discriminant index with the formula, I can't get the same answer when the sample size is odd, such as 23. I treat the higher group as 12 students and the lower group as the other 11. Please help me.
Hello Aishah,
The current implementation of the discriminant index doesn’t permit a 12/11 split. The split depends on a percentage, such as 27%. There may be overlap between the two groups, which will be the case if you use 50% with 23 elements. See the link in the last paragraph of this webpage for more details.
If you need to use a 12/11 split, you can do this manually, as described on the website.
Charles
While finding item discrimination using the Edwards formula is there an upper limit for the value of t?
Nancy,
I am not familiar with the Edwards formula. Can you give me a reference?
Charles
Hello Sir, I want to know how I can find the discrimination index of 5-point Likert scale questionnaire items. Please suggest an appropriate method and how it can be calculated, i.e. in Excel or SPSS.
This webpage addresses this issue in general. For a Likert scale, you can use the point-biserial correlation as described on this webpage. You can also use the discrimination index, but since this index uses dichotomous ratings, you need to map ratings such as 4 and 5 to 1 and ratings 1, 2, and 3 to 0 (or something similar). E.g. see
https://stats.stackexchange.com/questions/26890/item-analysis-for-a-likert-type-questionnaire-item-discrimination-point-biser
Charles
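The mapping suggested in the reply above (ratings 4 and 5 become 1, ratings 1 to 3 become 0) could be sketched in Python as follows. The function name, the `threshold` parameter, and the sample ratings are illustrative assumptions; a different cut-off may suit other questionnaires.

```python
def dichotomize(ratings, threshold=4):
    """Map Likert ratings at or above `threshold` to 1 and all others to 0."""
    return [1 if r >= threshold else 0 for r in ratings]


# Hypothetical responses to one 5-point Likert item
ratings = [5, 3, 4, 2, 1, 4]
print(dichotomize(ratings))  # [1, 0, 1, 0, 0, 1]
```

The resulting 0/1 values can then be used in place of correct/incorrect scores when computing the discrimination index.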
Hi, Dr. Please, must one use particular software to do that, or can Excel be used directly?
Magaji,
It depends on what you are referring to. Some things can be done with Excel directly. Others can be done with the Real Statistics software, although you can also do them with Excel directly, but it will take more work.
Charles
Good afternoon, Doctor. Does item discrimination apply only to dichotomous questions? How would it be applied to questions with more possible answers?
Gerardo,
If you are referring to multiple choice tests, for the discrimination index you assign 1 to the right answer and 0 to the wrong answer. For the point-biserial correlation coefficient, you can do the same. You can also calculate a point-biserial correlation coefficient for each incorrect answer.
Charles
Thank you very much, Doctor.
Hi Charles,
Can I please ask, what if the item is not scored as 0 or 1, but out of, say, 4 marks? For example, a question on an exam paper could be marked out of 4, and students could receive 0, 1, 2, 3, or 4 marks.
See https://www.real-statistics.com/reliability/item-analysis/partial-score-item-analysis/
Charles
How would you do this same process in excel to calculate difficulty/item discrimination if you’re using a Likert scale? So instead of correct/incorrect, you have a range of 1-5 for your responses?
Sara,
I don’t know of a standard way of doing this, but here is an approach that might work for you:
Item Difficulty: Take the mean of all the scores
Item Discrimination: Instead of using the point-biserial correlation between the item and total score, use the ordinary (Pearson) correlation between the item and total score (or average score for the whole questionnaire).
Whether this approach is suitable depends on why you want to use such measurements for Likert data.
Charles
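The approach suggested in the reply above can be sketched in Python: the mean rating stands in for item difficulty and the correlation between item rating and total score stands in for item discrimination. The function names and the data are illustrative assumptions, and, as the reply notes, this is not a standard method.

```python
import math


def item_mean(item_ratings):
    """Mean rating of one Likert item (used here in place of item difficulty)."""
    return sum(item_ratings) / len(item_ratings)


def item_total_correlation(item_ratings, total_scores):
    """Pearson correlation between one item's ratings and the total scores."""
    n = len(item_ratings)
    mx, my = sum(item_ratings) / n, sum(total_scores) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(item_ratings, total_scores))
    sxx = sum((a - mx) ** 2 for a in item_ratings)
    syy = sum((b - my) ** 2 for b in total_scores)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings for one item and total questionnaire scores (5 respondents)
item = [4, 2, 5, 3, 1]
totals = [30, 18, 35, 25, 12]
print(item_mean(item))                                  # 3.0
print(round(item_total_correlation(item, totals), 3))  # 0.998
```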
it good to remove the DF in the item
Sorry, but I don’t understand your question.
Charles
If the difficulty is low, let's say at most .250, can we remove the item, or at what point can an item be removed because it is too difficult? In other words, can we remove an item based on the value of Df?
Whether or not you remove it is your decision. There could be good reasons for keeping a difficult question in the test. If the question is difficult and has poor discrimination, then I would remove it.
Charles