The Unacknowledged Tyranny of the Platinum Standard

I just got back from an educational research conference, and as is my wont, I had a lot of conversations about assessment and educational measurement.

On the morning of the last day of the conference, as people were saying their goodbyes, I found myself in conversation with a brilliant young psychometrician still on her first job in industry. I was pushing her to consider examining the application of some sort of multi-dimensional psychometric model when she got a chance to do her next little research project. She was concerned that doing so might mark her as a little weird, as the industry is so heavily invested in unidimensional psychometric models. She pulled in Yu Bao, a professor in James Madison University’s Assessment & Measurement program, who was walking by. Yu agreed with me that there’s a lot of room for a psychometrician to make their name with multi-dimensional models.

I went on one of my typical rants about the mismatch between unidimensional psychometric models and multi-dimensional domain models, and about the platinum standard. That is, the way that psychometricians bring model fit statistics to data review meetings and strongly suggest that items with poor model fit—poor fit, that is, to the misplaced unidimensional model—be removed. (They do this with item difficulty statistics, too, but that is not as damaging to validity claims as this use of model fit statistics from inappropriate models.)
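To make that mechanism concrete, here is a minimal, illustrative simulation in Python (using numpy and scikit-learn). Everything in it—the item counts, the loadings, the trait correlation, the crude misfit index—is my own assumption for the sketch, not anyone’s operational procedure. It generates scores from a genuinely two-dimensional domain, forces a one-factor model onto them, and then flags items by how badly the one-factor model reproduces their covariances.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_examinees = 2000

# The "true" domain model is two-dimensional: two correlated traits.
theta = rng.multivariate_normal([0.0, 0.0],
                                [[1.0, 0.4],
                                 [0.4, 1.0]],
                                size=n_examinees)

# Twelve items: items 0-8 measure trait 1, items 9-11 measure trait 2.
# All twelve are "good" items, well aligned to their part of the domain.
loadings = np.zeros((12, 2))
loadings[:9, 0] = 0.8
loadings[9:, 1] = 0.8

scores = theta @ loadings.T + rng.normal(scale=0.6, size=(n_examinees, 12))

# Force a one-factor (unidimensional) model onto the two-dimensional data.
fa = FactorAnalysis(n_components=1, random_state=0).fit(scores)

# Residuals: the covariance the one-factor model fails to reproduce.
implied = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
observed = np.cov(scores, rowvar=False)
abs_resid = np.abs(observed - implied)
np.fill_diagonal(abs_resid, 0.0)

# A crude item-level misfit index: mean absolute residual per item.
misfit = abs_resid.mean(axis=1)
cutoff = misfit.mean() + misfit.std()
for i, m in enumerate(misfit):
    flag = "  <- flagged for removal" if m > cutoff else ""
    print(f"item {i:2d}: misfit = {m:.3f}{flag}")
```

In this toy setup, items 9 through 11 get flagged, yet nothing is wrong with them: the misfit index is merely reporting that a unidimensional model cannot absorb the shared variance of the second trait. A fit-driven data review would strip out exactly the items covering that part of the domain.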

This young psychometrician pushed back, however. She said that psychometrics uses unidimensional models because they fit the data better.

But that’s not true. That’s not true in practice, and that is not true at research conferences. Just the previous day, a colleague of mine told me about a session that he attended—and walked out of. There, a young psychometrician was explaining the use of factor modeling techniques to something something something—I didn’t attend that session, so I do not know what he was trying to do. He showed that item 31 did not fit his model, and so it was removed. It was not removed because it was poorly aligned to the assessment target or the larger macro-construct; he never even looked at the actual item. It was removed simply because it did not fit the psychometric model he was using.

No consideration for the construct’s theorized model. No consideration of the formal domain model. Only the psychometric model.

That colleague happened to be walking by, so I pulled him into the conversation, too. Yu agreed that this happens sometimes, contrary to what the brilliant young psychometrician had been taught and had expected to see.

My colleague and I know that this happens quite a bit. Psychometricians come with their opaque techniques and intimidatingly precise numbers. Few people outside of psychometrics have the confidence to push back against people armed with something they do not understand, and that precision—and all of its decimal places—is so easily mistaken for accuracy.

The platinum standard is powerful. It shapes our assessments, and not for the better. It leads to the removal of what may be the best aligned items, simply because they do not conform to the demands of simpler psychometric tools and their rather unrealistic assumptions. The platinum standard forces those inappropriate assumptions on the entire field, requiring those who actually focus on the domain model and content alignment to accept the demands of psychometrics.