In 1989 and in 2002, Haladyna and his colleagues published compilations of item writing rules from the literature. These two lists, which I call The Haladyna Rules, have been incredibly dominant ever since. Virtually every serious source on test development refers to them as the item writing guidance. In 1989, Haladyna & Downing called their list “a complete and authoritative set of guidelines for writing multiple-choice items” (p. 37), and then they updated and refined it further with Rodriguez in 2002.
The problem is that the list is just not very good. It focuses on presentation and perhaps test takers’ confusion, but really gives rather little guidance about how to make sure that the item actually tests what it is supposed to test. Now, we call this idea that a test (or item) is supposed to test what it is supposed to test, “validity.” There are actually much more complicated ideas and principles for validity, but at the fundamental level, the question is whether the test is doing what it is supposed to do. And the Haladyna rules that so dominate the “Item-Writing Guidelines/Rules/Suggestions/Advice” (1989, p. 40) simply steer clear of this idea, for the most part. (Among those more technical concerns and views about validity is that it is actually how the test is used that matters. But whether a test even can be used appropriately depends deeply on whether it is actually testing what it is supposed to test.)
But that is not the worst thing about them. I will save the worst thing about them for the end of this series.
You see, there are 31 rules in the 2002 version. And there are 31 days in October. (Hello, lady!) So, each day of the month of October, I will analyze one of Haladyna, Downing and Rodriguez’s 31 rules. In order.
Yes, I am saying that rule 31 is the worst thing about the list.
Buckle up.