[Each day in October, I analyze one of the 31 item-writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item-authoring guidelines.]
Content: Avoid opinion-based items.
Only about a quarter of their 2002 sources even mentioned this rule, and while a higher share of their 1989 sources mentioned it, over a quarter of those sources argued against it. The 1989 article sets a 70% threshold to establish a consensus, counting only those sources that bother to mention a rule at all, and that threshold seems tailored to rules like this one. Of course, in their opinion, that is a perfectly fine way to claim a consensus: just require that 70% of those who bother to mention a rule do not argue against it.
But this rule seems to cut against the idea, found in other rules, that we should assess critical reasoning. Heck, Haladyna has a book on assessing higher order thinking. In fact, much of higher order thinking in the humanities and social sciences is about evaluating opinion. Expert judgment is a form of opinion, something that experts usually, but not always, agree on because of their common experiences and understanding. Yes, opinions should be grounded, and I taught my students to explain and defend their opinions. Opinions are important.
Unfortunately, the 2002 article does not explain what they mean. But in their books, they offer two example questions: “What is the best comedy film ever made?” and “According to the Film Institute of New York, what is the greatest American comedy film ever made?” They reject the former question as an “unqualified opinion” (i.e., not supported by a “documented source, evidence or presentation cited in a curriculum”), which they think is bad. They accept the latter as a “qualified opinion,” which they think is good. (Never mind that there is no Film Institute of New York.) If the problem is “unqualified” opinions, the rule should say that, but it does not.
I believe this shows that this is a ridiculous rule. Some opinion-based items are fine, even as multiple-choice items, and some are not. Obviously, simply asking test takers to identify the item developer's own opinion is hugely problematic, but the problem there is that the item asks them to read the item developer's mind, not that it asks about an opinion. Asking test takers to identify the opinion of a character in a passage, or of its author, is fine, even highly appropriate. The problem has nothing to do with the fact that these items are about opinions; the problem is asking test takers to read item developers' minds.
[Haladyna et al.’s exercise started with a pair of 1989 articles and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item-writing advice only makes that far less likely to happen.
Haladyna Lists and Explanations
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.
Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
]