[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]
Writing the choices: Keep choices homogeneous in content and grammatical structure.
Two-thirds of their 2002 sources support this rule, but the only empirical source they mention is a study by one of them that found it makes no difference. Perhaps more importantly, their only reasoning is that when the answer options are not all parallel, it can clue one of them as the key. Their example in the 2002 article makes that obvious.
What reason best explains the phenomenon of levitation?
a. Principles of physics
b. Principles of biology
c. Principles of chemistry
d. Metaphysics
Putting aside magnetism and superconductors (i.e., physics), it’s not hard to see how answer D would draw disproportionate attention. Depending on the stimulus, D might actually be the correct answer. But the problem is not the lack of homogeneity! The problem is that just one option sticks out, not that the options are not all alike.
So, clearly, D should be “Principles of metaphysics,” to match the others. But then there’s a redundancy with physics. There is, however, a conventional wisdom among item developers on how to deal with that, one that Haladyna et al. never mention. As I wrote for Rule 22, answer options should all be parallel, all be distinct, or come in pairs (when there is an even number of answer options).
a. Principles of astronomy
b. Principles of astrology
c. Principles of physics
d. Principles of metaphysics
Do any of those uniquely jump out? They are not homogeneous, as two of them are sciences and two of them are not. The same guidance works for grammar, length, voice, content, and so on. Answer options really do not need to be homogeneous.
But here’s the real issue: there is a far, far, far more important rule for crafting distractors. Rule 29, make all distractors plausible, is the most important rule. If that requires violating homogeneity, fine. Do it! That second set of answer options above is only good if each answer option is deeply plausible, and a shortcoming in homogeneity (e.g., creating pairs) is fine so long as it does not hurt plausibility. It is plausibility that matters, not homogeneity.
The deeper issue seems to be that so many of the Haladyna rules are about undermining guessing strategies, in a world in which test takers can simply recognize the best answer or not. The list does not consider the cognitive paths that test takers might take, and almost never acknowledges that the best distractors are the ones that represent the results of mistakes in understanding and/or application that test takers may make along the way. Perhaps they just assume overly simplistic content?
So, no, I don’t buy this rule.
[Haladyna et al.’s exercise started with a pair of 1989 articles and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item-writing advice only makes that far less likely to happen.
Haladyna Lists and Explanations
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.
Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
]