[Each day in October, I analyze one of the 31 item-writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]
Style concerns: Minimize the amount of reading in each item.
This rule is mentioned by one-third of their 2002 sources. But it also seems rather redundant with Rule 16 (Avoid window dressing (excessive verbiage)). Obviously, the irony of repeating a rule about being mindful of unnecessary verbiage is a joke, right?
So, is reading time a problem? Let’s put aside “excessive verbiage,” because that is Rule 16. This is about reading time.
Yes, reading load should be managed. Reading load matters: whether or not students face formal time limits, they run out of stamina. But many standardized test items are based upon reading passages that are included in the test. They have to be included because there is no centralized control over the texts that students read in this country, and because we often want the texts to be new to test takers (i.e., so they must lean exclusively on their own reading skills to understand them). Yet these passages are always far shorter than the texts we expect test takers to be able to understand on their own. Certainly this is true on ELA (English Language Arts) exams, but it is also true on science exams and social studies exams. When mathematics is actually applied in the real world, it is done in embedded problems and situations, not just as bare arithmetic or algebra. These excerpts and passages are already shorter than what we expect students to be able to handle in authentic circumstances.
Minimizing reading time surely does allow for more items and thereby improved reliability, as their 2013 book says. But minimizing reading time, as the 2002 article suggests, often simply comes at the expense of the targeted cognition. Sure, if the item is just a basic recall item, keep it short. But if it is a test of reading skills or problem-solving skills (which often call for recognizing extraneous information), minimizing reading time undermines content and construct validity. It elevates an easily quantified property of items (or item sets) above the actual purpose or goal of the assessment or the item.
To be fair, their 2004 book does say, “As brief as possible without compromising the content and cognitive demand we require.” But their 2013 book says that minimizing reading time can improve both reliability and validity, which shows a rather poor understanding of validity, in my view. Yes, the 2013 book does acknowledge the occasional importance of extraneous information, but it says nothing about the importance of passage length. And let’s be clear: this rule is not aimed at just the stem, just the answer options, or just those two. It is about the whole item, which of course includes the stimulus!
Now, if this rule said, “Minimize the amount of reading in each item, without compromising the test content,” it would be better. It would almost be good, provided of course that we could all rely on everyone taking that caveat seriously. But the 2002 list (the only version that includes all the rules in a handy one-page format) does not say that, and nowhere in that article is that caveat offered. None of the other versions offer that critical caveat as part of the rule, though its inclusion would not make this the longest rule. There are at least half a dozen longer rules, including rules made up of multiple sentences.
So, this could be a good rule, but not as it is presented. As presented, it too often suggests undermining validity.
[Haladyna et al.’s exercise started with a pair of 1989 articles and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable), and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item-writing advice only makes that far less likely to happen.
Haladyna Lists and Explanations
Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.
]