[Each day in October, I analyze one of the 31 item writing rules from Haladyna, Downing and Rodriguez (2002), the super-dominant list of item authoring guidelines.]

Writing the choices: Use humor if it is compatible with the teacher and the learning environment.

And now, the last Haladyna et al. rule. Or guideline. Whatever you want to call it. All their library research to compile the consensus of textbooks, researchers and other authors ends with this one. Is it a good rule?

Ha! This is a bad rule i) by their own official standards, ii) because they seem to contradict themselves and iii) because their explanation shows how little they understand the challenges of item development.

First, their 2002 article shows that they do not have even a single source that says that humor is a problem in items. Not one! Only 15% of their sources even mention the idea, and none favor their rule. There is no consensus here, and among the small number of sources who address it, they kinda say, “Meh, whatever.”

Second, their fuller statement of their rule contradicts their shorter statement. That is, their Table 1 says, “Use humor if it is compatible with the teacher and the learning environment,” but their Table 2 says “Use humor sparingly.” Do these two statements mean the same thing? Should it be used sparingly, or is it ok? The longer version seems to be generally supportive, and the shorter version seems generally unsupportive. How is this a rule or guideline or advice? Do they mean that it is ok for classroom assessment but not for large-scale standardized assessment? I can imagine that advice, but that is not what they offer. I can’t tell what they are offering.

Third, the worst thing about this rule is that it is teacher-centric rather than test taker- or student-centric. Test takers vary, and in an enormous number of ways. The problem with humor is that not everyone agrees on what is funny. People have different senses of humor, and attempting humor in stressful or important situations is only a good idea if everyone gets the joke. But the less homogeneous the group, the less likely everyone is to agree that something was actually funny. Haladyna et al. seem to understand neither this basic fact about humor nor the basic fact about test taker variation, which is essentially the issue of Fairness.

To try to explain this rule, they offer an example. A bad example. It is supposed to be funny, but instead it’s just confusing. And it is double keyed. So, the problem is not the use of humor; the problem is that it is double keyed! In fact, I seriously question whether the example is actually humorous.

In Phoenix, Arizona, you cannot take a picture of a man with a wooden leg. Why not?

A. Because you have to use a camera to take a picture

B. A wooden leg does not take pictures

C. That’s Phoenix for you

There are so-oooo many problems with this item, including by their own rules. Not only is it double keyed, but option C does not even pretend to answer the question. The answer options are not parallel in grammatical structure, and they vary enormously in length. It’s not at all clear what the targeted cognition even is. There’s a negative in the stem that is not highlighted in any way. Heck, this is an item that would actually benefit from adding “D. All of the Above.”

Is the problem humor? Is the problem that it attempts humor? Of course not! It is a bad item for so many other reasons that have nothing to do with humor.

Is this their best argument? Yeah, I suppose it is. After all, across the four versions of their list of rules (i.e., 1989, 2002, 2004 and 2013), this example is from the final one. They always cautioned against the use of humor, and this was their refined reasoning.

It is the least supported, least defensible and worst rule in their list. Maybe not the dumbest—if only because writing funny is just hard!—but still the worst. Sure, there’s a good reason to avoid trying to use humor, but they do not even wave in that direction.

[Haladyna et al.’s exercise started with a pair of 1989 articles, and continued in a 2004 book and a 2013 book. But the 2002 list is the easiest and cheapest to read (see the linked article, which is freely downloadable) and it is the only version that includes a well formatted one-page version of the rules. Therefore, it is the central version that I am taking apart, rule by rule, pointing out how horrendously bad this list is and how little it helps actual item development. If we are going to have good standardized tests, the items need to be better, and this list’s place as the dominant item writing advice only makes that far less likely to happen.

Haladyna Lists and Explanations

Haladyna, T. M. (2004). Developing and validating multiple-choice test items. Routledge.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.
Haladyna, T. M., & Downing, S. M. (1989). Taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.

]

Complex Variety: Assessment Development, Education and Occasional Other Topics

Dr. Hoffman