The Ultimate MisNAEPery: Confirmation Bias

This week’s NAEP results have been deeply, deeply disturbing. They should leave all of us with serious questions about the education research and policy community. We have witnessed a new form of misNAEPery that should cast doubt on things we have long taken for granted as true.

MisNAEPery is the misuse of NAEP data — results from the National Assessment of Educational Progress, known as “The Nation’s Report Card.” Please know that NAEP is a very different kind of standardized test. Students do not specifically prep for it and it has no stakes attached to it. Results are not published for individual students, teachers or schools. In fact, that is not even possible, because it uses something called “matrix sampling.” This means that different students have different questions on their forms, and then all the data is combined into aggregate scores for entire states. ENTIRE STATES! (There is also reporting on 27 particular districts which have volunteered to take part in TUDA (i.e., the Trial Urban District Assessment), but because those districts are smaller than their states, they have larger margins of error.)
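To make the matrix-sampling idea concrete, here is a minimal sketch in Python. The item counts, block sizes and scoring are invented for illustration; NAEP’s actual design uses many more items, balanced block assignment and IRT scaling, not raw percent correct.

```python
import random

# Illustrative numbers only -- not NAEP's actual design.
NUM_ITEMS = 60          # the full content domain
BLOCK_SIZE = 10         # items per block
NUM_STUDENTS = 5000

# Split the item pool into blocks.
blocks = [list(range(i, i + BLOCK_SIZE)) for i in range(0, NUM_ITEMS, BLOCK_SIZE)]

# Hypothetical "true" probability that a student answers each item correctly.
difficulty = [random.uniform(0.3, 0.9) for _ in range(NUM_ITEMS)]

# Each student sees only two randomly chosen blocks -- no one takes the whole test.
results = []  # (item_id, correct) pairs pooled across all students
for _ in range(NUM_STUDENTS):
    for block in random.sample(blocks, 2):
        for item in block:
            results.append((item, random.random() < difficulty[item]))

# Aggregate across students: per-item percent correct, then a domain-wide average.
# No individual score is ever computed -- only the aggregate is meaningful.
per_item = {}
for item, correct in results:
    per_item.setdefault(item, []).append(correct)
estimate = sum(sum(v) / len(v) for v in per_item.values()) / NUM_ITEMS
truth = sum(difficulty) / NUM_ITEMS
print(f"aggregate estimate: {estimate:.3f}  vs. true average: {truth:.3f}")
```

The point of the sketch: no student’s 20 items can support an individual score on the full 60-item domain, but pooling thousands of partial forms yields a stable aggregate for the state.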

This approach allows NAEP to address countless objections to most standardized tests. Freedom from having to compile results for individual schools, teachers or districts allows NAEP to check for and account for issues that other assessments cannot even dream of. Short tests that nonetheless address large content domains, care around item interaction effects and more…it’s the gold standard.

The Question of the Decade

The educational policy and practice question of this young decade is about the impact of the pandemic on students, learning and teaching. The most obvious and contentious aspect of this question is the contribution of school building closures – and the consequent reliance on remote (i.e., Zoom) schooling – to “learning loss” (i.e., the unfortunate name for the idea that students did not learn as much during the pandemic as they would have otherwise, that they did not progress as much as they would have if there had not been a pandemic).

It is odd that this is such a contentious issue, because most everyone has a stake in believing that remote schooling is inferior to in-person schooling. Those who wish to attack teachers unions, educational bureaucracy and even teachers blame them for school building closures and the resulting learning loss. (Of course, they conveniently ignore the fact that other schools that are more responsive to market pressures — such as private and charter schools — also closed their buildings during the pandemic.) Those of us who think that the New 3 R’s (i.e., rigor, relationships and relevance) are vital to success with the old 3 R’s (reading, ‘riting and ‘rithmetic), that teaching is more than just lecturing and is instead about meeting students where they are and meeting their needs…well, we think that time with teachers is valuable. We think that teachers matter. We want to believe that schooling can help students beyond their own natural cognitive development and the impact of various out-of-school factors.

We should all want to see that lost time in school with teachers had a cost. Even if people disagree about whether it was necessary or worth it to pay that cost, virtually everyone expected the new NAEP results to give us a sense of what that cost was.

This is because states differed enormously in how long school buildings were closed. Chalkbeat’s coverage of the new NAEP data shows this, such as Texas schools being open 88.7% of the time and California’s schools being open just 6.9% of the time. (Go read that coverage. It’s surprisingly good. And note that while Matt Barnum wrote the story, the graphics — like the one I have copied below — are by Thomas Wilburn. The originals are interactive.)

The Unexpected

The problem is that NAEP does not show that states whose schools were open to more in-person learning had markedly stronger results. It just doesn’t. For example, California’s results slipped back less in 8th grade reading and math than Texas’s, and exactly the same amount in 4th grade reading and math. New York (14.2% in-person instruction) slipped less than Texas in 8th grade and more than Texas in 4th grade. Florida (96.8% in-person) was also worse than Texas in 8th grade and only better than Texas in 4th grade reading. Again, note that this is not about absolute levels of performance, but rather about how much a state’s students slid back during the pandemic, from one cohort to another. Even just among these four largest states, we do not see the expected results.
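Here is a minimal sketch of the check virtually everyone expected to pass, using the Chalkbeat in-person figures cited above. The score changes in it are HYPOTHETICAL placeholders, not NAEP results; the real cohort-to-cohort changes belong in their place.

```python
# If in-person time mattered, states with more in-person instruction should
# show smaller declines -- i.e., a strong positive correlation between
# in-person share and score change.

from statistics import correlation  # Python 3.10+

in_person = {"TX": 88.7, "CA": 6.9, "NY": 14.2, "FL": 96.8}  # Chalkbeat figures
score_change = {"TX": -5.0, "CA": -4.0, "NY": -6.0, "FL": -5.0}  # HYPOTHETICAL -- substitute real NAEP changes

states = list(in_person)
r = correlation([in_person[s] for s in states],
                [score_change[s] for s in states])
print(f"correlation between in-person share and score change: r = {r:+.2f}")
```

The expected pattern would be a large, positive r. The surprise in the 2022 data is that nothing that clean shows up.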

Taking all the states’ results into account, we do not see the expected patterns. In some cases, we see far weaker versions of them. In others, we do not see anything like what we expected. What literally everyone expected. (And I mean literally literally. 100%. Absolutely everyone. Not a single person predicted what this data shows. Not one.)

Again, go read Chalkbeat’s coverage. And if you want more, there’s EdWeek’s coverage.

The Deeper Problem

No one attacks NAEP as being bad data. It is the gold standard. Those of us who decry the low quality of many state assessments and decry bad analysis of quantitative data point to NAEP’s quality. Those with more confidence in quantitative assessment results look to NAEP as the benchmark.

But suddenly, in light of these shocking results, people are making excuses. Because the 2022 NAEP results do not show what everyone expected, people are…behaving differently.

The real value in data and research is not in finding support for what you already believe. The real value is in helping you figure out what is true. For those with intellectual integrity, it is more important to learn than it is to convince others. It is more important to be right tomorrow than to appear right all along.

NAEP is telling us that we were wrong. All of us. Now, from a Bayesian perspective, the strength of our prior belief should make us less open to countervailing evidence. It should. That is OK. But the strength of NAEP as the highest-quality evidence should make us question any prior belief. That is what NAEP is for. That is how everyone who knows about NAEP views it.
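In odds form, Bayes’ rule makes the tension explicit:

$$ \frac{P(H \mid E)}{P(\neg H \mid E)} \;=\; \frac{P(E \mid H)}{P(E \mid \neg H)} \times \frac{P(H)}{P(\neg H)} $$

With illustrative numbers (not estimates): if my prior odds were 19:1 that closures caused large, state-visible losses, but these NAEP results are, say, 100 times more likely in a world where they did not, the posterior odds become 19:100, roughly 5:1 against my original belief. A strong prior resists weak evidence, but the better the evidence, the more the prior must yield. Everything turns on how much weight NAEP deserves.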

So, I have to ask: if NAEP does not shed valuable light on this question, what is it ever useful for? If this is not the absolute best use case for NAEP data, then what is? And if NAEP is not useful, how is any achievement data ever useful, or any on-demand evaluation of student knowledge, skills and abilities — be it standardized or not?

Or, if NAEP remains credible, what does that imply about the value and nature of teaching and the classroom? What does this say about natural cognitive development, as opposed to intentional learning? What does this say about the potential for additional use of remote schooling and how we might reshape childcare structures in this country?

Integrity in the Future

What is not acceptable is to simply ignore this year’s NAEP results.

I need to re-evaluate my confidence in NAEP more broadly. That’s my next step. I am comfortable saying that I would rather find problems with NAEP than have to devalue teachers and the New 3 R’s. I’ve not really dug into the mechanics and methodology of NAEP in a long time. And I’ve never subjected NAEP items to RTD’s level of item examination. At the same time, I also need to rethink the potential of…oh my god, it hurts to type this…cyber charter schools. Oh, the pain! The pain! But I was wrong about something: either NAEP or the nature of teaching and the importance of teachers.

As I look around this week, I do not see this kind of soul searching. I do not see acknowledgements of the importance of this moment for education researchers, educational policy practitioners, in-school educators and assessment experts.

That worries me.