[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Content: Keep the content of each item independent from content of other items on the test.

There is no explanation of this rule anywhere in either Haladyna and his colleagues’ 1989 article or their 2002 article. Moreover, fewer than half of their sources for either article mention this rule. Their companion 1989 article does not list a single empirical study on this rule. And yet, they still call this a consensus.

One must turn to their 2004 or 2013 books to find an explanation, and that explanation is not convincing. They offer a pair of items as a counter-example to demonstrate what is wrong. Note, this is THEIR example. They change the names between editions, but I will use the more recent ones (2013).

Who was Kay’s best friend?

a. Wendy

b. Betty

c. *Tilda

Who was quarreling with Tilda?

a. Kay

b. Wendy

c. Betty

They claim this shows what is wrong because if the test taker knows that Tilda is the correct response to the first item, then they will know that Kay cannot be the correct answer to the second item. Because…stories are never about good friends quarreling? Not at all. Of course not. Where would the drama or story arc be in that?

Frankly, this rule makes it more difficult to ask anything but the most trivial questions about literary passages because the themes and characters run through them. No, those items are not independent in topic; after all, they are taken from the same story. This is almost as great a problem when using informational passages.

Now, there is an issue with the independence of items, one that I and one of my closest colleagues disagree on. As a science educator, she wants items that scaffold up to larger or deeper understanding. She thinks of item sets as a single larger item with various components, even when they technically are not. She wants later items to build upon the answers of earlier items, and even wants the structure of the item set to help test takers do that. I really do appreciate what she is trying to do, and as a classroom teacher I might do the same thing. But we are both in large-scale standardized assessment now. We are trying to find that optimal (or least bad) balance that allows test takers an opportunity at every point on an assessment, assesses test takers across the range of an NGSS standard (i.e., performance expectation), and yet does not provide so much scaffolding that we cannot be sure whether the test taker actually has the inferred proficiency.

How independent should items be of each other? Not nearly so much as Haladyna et al. claim. And their example is laughable.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well-formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.

]

Complex Variety: Assessment Development, Education and Occasional Other Topics

Dr. Hoffman