Where are The Standards in The Next Generation Science Standards?

The Next Generation Science Standards is one of those gargantuan things that is just amazingly impressive. Awesome, even. Science is so many things. It is an approach, and it is what that approach has taught us. There’s this single idea (i.e., science), and then there are the different disciplines within science.

Trying to organize all that mess into a single anything is just incredible. I often look at NGSS as what ECD (Evidence-Centered Design) calls a domain model, and I am flabbergasted that anyone was able to do it.

Wow.

The thing is, NGSS is so commonly misunderstood. It was an effort to organize a domain – or set of domains – both to establish standards and to create supports for educators to help students reach those standards. Clearly, it is influenced by ECD – and it even uses some ECD terminology. But unlike ECD, it is at least as focused on supporting curriculum development and instruction as it is on supporting assessment.

The most common misunderstanding is about which part of NGSS constitutes the actual standards. There are the PEs (Performance Expectations), the CCCs (Crosscutting Concepts), the DCIs (Disciplinary Core Ideas) and the SEPs (Science and Engineering Practices). Now, my favorites are the SEPs, but that does not make them the standards.

What Does NGSS Say?

Luckily, the official NGSS website, nextgenscience.org, has a page called Understanding the Standards. It explains exactly what is going on.

The second paragraph explains that the three dimensions (i.e., CCCs, DCIs and the SEPs) predate NGSS, and that they were “introduced in the National Research Council's A Framework for K-12 Science Education.” Clearly, the standards part of NGSS is not found in those three dimensions that NGSS inherited from The Framework. Rather, the NGSS standards would have to be found in the parts that the NGSS coalition wrote later — albeit inspired by NRC’s Framework.

A little further down the page is a video which explains further. At around the one minute mark, the narrator says, “The standards have been developed as student performance expectations, which are statements of what students should know and be able to do by the end of instruction.” In case the text on the page was not clear, the video is rather definitive. The standards are the Performance Expectations. 

So, What’s the Problem, Then?

The PEs are built from the three dimensions, but the PEs have far greater specificity than anything in any of the three dimensions. For example, HS-PS1-1 (“Use the periodic table as a model to predict the relative properties of elements based on the patterns of electrons in the outermost energy level of atoms.”) does not say anything about protons, atomic number or atomic mass. In fact, none of the PEs mention atomic number or atomic mass in the context of the periodic table of the elements. (Only two PEs even mention the periodic table at all.)

Does this mean that the periodic table is not important? Does this mean that those parts of the periodic table are not important?

No. That is not what that means. 

Instead, it forces us to confront the relationship between learning goals (or outcomes) and learning pathways. It certainly makes us think harder about the differences between instruction and assessment.

No one would ever suggest that atomic number and atomic mass are unimportant concepts. No one would ever suggest that their place in the periodic table is unimportant either. (Frankly, the periodic table of the elements is another one of those awesome works of genius.) If you care about those aspects of the periodic table, no one is arguing with you. You can teach that, and that stuff is important to understand on the way to being able to meet the performance expectation.

But understanding – or demonstrating understanding – of how the periodic table of the elements is built upon atomic number and can be used to look up atomic mass is not “what students should know and be able to do by the end of instruction.” Precursor? Yes. Is it allowed? Of course! 

Is it required? No. No, it is not. 

Oh, It Hurts!

Yes. It hurts. There are things in science that I love (e.g., Do you understand how the periodic table, which Dmitri Mendeleev first arranged by atomic weight and recurring chemical properties, turns out to be organized by atomic number AND electron levels? It’s amazing!) which are not a part of the Performance Expectations. How can that not be in the standards!? I love that stuff!

There are things all across science (particularly things found in the DCIs, but also things in my favorite dimension, the SEPs) that did not make it into the PEs. Important things.

Let me be abundantly clear: Some of my favorite ideas and facts from science are not in the Next Generation Science Standards Performance Expectations.

But let me be equally clear: I do not have the right to say what the standards are. That is up to the standards writing bodies and the state legislatures that endorse them (or edit and then endorse them). The fact that I think that something is important does not mean that I get to make it a part of the standards. The fact that I can make a compelling case for how important it is, and for how many others think that it is important, does not give me that authority, and it does not overwhelm the stronger case that it is not in the standards.

Yeah, that fact hurts, too. Being both technically and morally right does not give me authority over our democratic institutions to decide what is taught in all of our schools and/or should be on our official assessments. That fact hurts me every day.

No, No, You’re Wrong Because I’ve Read…

You’ve read the NGSS Structure: How to Read the Next Generation Science Standards document, linked to on that same page? You've found where it has said stuff like, “[The DCIs are] the most essential ideas in the major science disciplines that all students should understand during 13 years of school.” Yeah. It does say that. But depending on that kind of sentiment misses the nature and intention of NGSS – intention that is made explicitly clear in that same document.

NGSS is clear about what it means by standards. Its authors were quite aware that science is a broad umbrella and not every student will – or even could – learn all of it. NGSS posits that the standards should be the part of science knowledge (and approaches) that all students should learn. They wrote NGSS’s PEs as standards, “to ensure that this set of PEs is achievable at some reasonable level of proficiency by the vast majority of students.” The PEs should be the standard part – the baseline – for all students. 

NGSS is also clear that these PEs (as standards) are a floor, not a ceiling. “A second essential point is that the NGSS performance expectations should not limit the curriculum.” Schools are free to teach more than the standards. Even individual teachers can – and do! – bring in their own favorite ideas, applications and activities. Curriculum writers are free to go beyond the NGSS standards. In many cases, they should.

In fact, the PEs are likely not sufficient to fill an entire curriculum. Indeed, there likely should be much more taught than the PEs. But NGSS says that the PEs are the most important parts, and they have to be taught. The rest? Well, different students in different classrooms, schools, districts and states can be taught from the rest in various combinations of ways.

But the NGSS performance expectations are the standard part. They are the standards.

But NGSS Says That Some States Do Include More

Yes, the authors of NGSS are quite aware that they do not control the states. They are quite aware that states can adapt and modify others’ work before adopting it as the official state standards. “Other states also include the content of the three foundation boxes and connections to be included in ‘the standard [sic].’” The primacy of our democratic institutions to make such decisions is simply a fact, and NGSS acknowledges that fact.

But if you are going to depend on that idea to suggest that NGSS does not get to say what the standards are, then you simply have to accept that each state gets to decide. That still takes you and me out of the equation. If a state has said that the PEs are the standards, then the PEs are the standards, and it doesn’t matter what you or I prefer, or what NGSS’s authors intended. And if a state says that it is more than the PEs, then it doesn’t matter what you or I prefer, or what NGSS’s authors intended. (Yeah, in those cases, this blog post and our disagreement are simply moot.)

What Does This Mean for Assessment?

There is a deep philosophical issue at play here.

It is easy to answer that question when we are talking about classroom assessment. Classroom assessment should assess whatever is taught in that classroom. No question. Classroom assessment should be aligned with instruction – the full breadth of instruction. (Or maybe what the school and/or district has decided should be the full breadth of instruction.)

But our big formal standardized assessment? That poses a different set of issues. Should our assessment aim to measure everything that could be taught, and thereby exist as a standard that measures us against our greatest curricular and/or learning aspirations? When I took geometry in this weird program, the final exam was out of 200 points, but we only needed like 60 points to pass. The exam covered everything, but we just needed to demonstrate knowledge and skills in enough areas to show we deserved to move on. Not everything. Just enough.

That is not how assessment generally works in America. That is why that experience still stands out in my memory, so many decades later. In our assessments, we want every student to get every point. Our aspiration for assessment is that the test takers top out. We award our highest marks only to students who approach 100%. And there is too much science to expect that every student has a chance of doing that. We do not even require students to take all the science classes. Sure, some high school students may take biology, chemistry, physics, AP chemistry, AP physics and astronomy. But even those kids didn’t take AP biology or earth science. Staying at the high school level, how many states or school districts require even four years of science?

How could it be fair to give assessments on content that students have not had the opportunity to learn? How could it be fair to give assessments on content that teachers were not on notice that their “students should…be able to do by the end of instruction”? How is that fair to students, to educators, to districts and/or anyone else who might face consequences for student performance on these assessments?

I can imagine a world that has NGSS-aligned assessments that address all of the Performance Expectations and go on to sample from other content. But I do not know how decisions about which of that other content should be sampled should be made, which raises those challenging opportunity-to-learn and notice-to-teach questions. Even putting those concerns aside, though, we would have to make sure we are doing a truly excellent job on assessing the PEs before we even begin to think about assessing anything else.

 

 

Standards: Instruction vs. Assessment

I am a sucker for alignment. As much as I try to keep the human element in mind, and as much as I love creative and divergent lessons, I am deeply attracted to rational alignment. Good policy that actually supports good practice, and practice that aligns with policy? Man, I love that! I want the right hand to know what the left hand is doing: not the two working at cross purposes, but both hands working to support each other.

Educational standards are an attempt to create alignment. We want all students to be working towards the same learning goals, regardless of what district they live in or what teacher they were assigned to. We want to combat the soft bigotry of low expectations. And we want to bring the best thinking about what is possible and what is advisable to inform what our schools do for all of our students. We write and adopt standards to guide instruction.

We also look to standards to guide assessment. We want our assessments to be aligned to instruction and we accomplish that by aligning assessment to the same standards as instruction.

 
 

The thing is, as much as I like rational alignment, when standards inform instruction they should be understood quite differently than when they inform assessment. Instruction should be guided by standards, but not hamstrung by them. There are many other factors that inform instruction. Assessment, on the other hand, really should be much more constrained by standards.

In the next few posts, I will explore those differences. I will explain how the best instruction is tied to and grounded in the standards, but also builds beyond them. And I will explore how and why standardized assessments must focus more narrowly on more limited conceptions of the standards.

How Important is Reliability?

The 2014 edition of The Standards for Educational and Psychological Testing says that validity is “the most fundamental consideration in developing tests and evaluating tests.” This is the second sentence of the first chapter of that book. The 1999 edition said the same thing, without repeating the word “tests.” The 1985 edition agreed, but back then it was the first sentence.

Validity is the alpha and the omega. It is everything.

So, where does that leave reliability?

The Cliché

Last week, I ran through the cliché explanation of reliability and validity with the metaphor of a target. My rude punchline was that psychometrics, being concerned with metrics (i.e., numbers), has nothing to offer us about validity.

 
Figure A. Reliability and Validity


 
Figure B. Psychometric View of Reliability and Validity


 

Because psychometrics has no way to think about validity, it doesn’t have a target at all. Rather, it just looks at how tightly clustered the hits are.

(I know about internal structure and convergent/discriminant evidence. Those are still about reliability. The latter is about reliability with other measures, but it begs the question of whether the other measures are valid. Yes, correlation with various outcomes might offer something, but that is a topic for its own post.)

Generally, psychometrics has no theory, idea or vision of validity, so it raises reliability to be the more important consideration. But reliability is not the alpha and omega. It is a false god.

The Psychometric Defense

The smartest explanation for the importance of reliability that I have ever heard is that it is the upper bound or upper limit on validity. That is, in the language of the cliché, you cannot hit the bullseye consistently if you cannot be consistent. You cannot hit anything consistently if you cannot be consistent.
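For reference, here is a sketch of the standard classical-test-theory form of that upper-bound argument, assuming uncorrelated measurement errors. Write ρ_XX' for the reliability of a test X, ρ_YY' for the reliability of some criterion Y that the test is supposed to relate to, and T_X, T_Y for their true scores. The observed correlation (often reported as a "validity coefficient") is then capped:

```latex
\rho_{XY} = \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}}
\quad\Longrightarrow\quad
\lvert \rho_{XY} \rvert \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}
```

So a test with reliability 0.81 can correlate no higher than 0.9 with anything. Notice what the bound does and does not say: it caps a correlation with whatever criterion Y was chosen, and it is silent on whether Y is the right target in the first place.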

My basic response to that is that I do not care how consistent you are if you are not near the target.

So, here’s the real question: which is better?

 
Figure C. The Worst?


 
Figure D. The Worst?


 

I acknowledge that they are both pretty damn lousy. But those who prize reliability would prefer Figure D because it is, at least, reliable. I look at Figure D and am quite sure that it does not measure anything that I care about. It’s not noisy; it is just wrong.

Figure C is noisy. There are real problems. It is a lousy and unreliable measure. But at least there is some signal of what I am looking for in that noise. Sure, the confidence intervals are huge, but there is information of value in there.

Putting it in very concrete terms: I do not need another test of socioeconomic status. No one needs another test of socioeconomic status.

The Problem of Prioritizing Reliability as the Upper Bound on Validity

From what I have seen and read, the idea that reliability is the upper bound on validity has morphed into the idea that we increase validity by increasing reliability. And therefore, we can stop worrying about increasing validity because we can just focus on increasing reliability.

There are people who confuse reliability and validity. There are people who say “reliability” when they clearly mean validity, but the difference is simply not important enough to them to realize that they have made a mistake.

When the means becomes the ends, what had been valuable actually becomes the obstacle.

Concerns with Reliability as Obstacle to Validity

There are many causes for the quality problems with our big standardized tests. In my view, the greatest problem is that we are stuck in a vicious cycle in which perceptions by educators and the public of low quality (i.e., lack of validity) limit willingness to spend money for testing and to devote student time to testing. This harms the quality of our tests and…well…repeat.

But this is not the only problem.

Figure E.


Figure F.


Figure G.


The problem is that those who look to reliability as the most important consideration see moving from Figure F to Figure G as an improvement, and see moving from Figure F to Figure E as a decline. Many are unwilling to give up any reliability in order to gain validity.

There are item types that constrain reliability, either because they take so many resources that tests must rely on fewer items or because they cannot be scored as reliably. And those item types are incredibly disfavored. Item types that simply cannot get to the real cognition behind the standards are not disfavored. Instead, we get highly reliable items that too often fall short of the actual targeted cognition.

How Does This Keep Happening?

Psychometricians, with their emphasis on reliability, are high status. They have graduate degrees in measurement, perhaps even PhDs. Content development professionals (CDPs) are merely former teachers, with all the low status that that carries.

This status difference often prevents CDPs from even being at the table, and when they are at the table they are often overruled. When they are not overruled, they are often intimidated into relenting.

And so, psychometric concerns drive assessment development far, far, far more than questions about whether items actually measure what they are purported to measure.

Which clearly, in my view, violates the Standards.

When Does Public Servants’ Ideology Pose a Threat of Tyranny to America?

My whole adult life – my whole professional life – I have read and heard complaints about teachers being too liberal. The values that teachers espouse pose a tyrannical threat to America because liberals pose a tyrannical threat to America. Teaching well established science is unacceptable. Teaching well established history is unacceptable. Teaching literature from subdominant perspectives is unacceptable. Multi-culturalism? Culturally relevant and/or responsive pedagogy? Unacceptable.

American colleges and universities? Too liberal. A threat to freedom of speech. A threat to liberty. 

 Public education has been under assault for…well, it has been under assault my whole lifetime. Certainly my whole professional life. 

Yesterday and today – January 6 and January 7, 2021 – I simply could not stop contrasting those complaints that I have heard and read for too many decades with what I saw on my television and saw online.

I do not mean the seditious insurrectionists who would overthrow democracy. 

I mean the uniformed and armed elements of our government who would overthrow democracy, who stand by for tyranny and are all too willing to crush liberty.

Understanding Tyranny and Liberty

These terms are so often misused that I feel it necessary to explain what I mean by them.

Our Founding Fathers were incredibly focused on tyranny. They signed onto the Declaration of Independence, which describes the tyranny with which they were concerned. Read it.

  • They wrote of violence done against them (“He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people.”). 

  • They wrote of denying elected representation in government (“For imposing Taxes on us without our Consent,” and “For suspending our own Legislatures.”). 

  • They wrote of acting and interim appointments. Really. They did. (“He has made Judges dependent on his Will alone, for the tenure of their offices.”)

  • They wrote of refusing to be a leader of all the people (“Declaring us out of his Protection and waging War against us.”).

  • And yeah, I’ll mention it again, “He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people,” ‘cause the writing is just that good.

It is pretty damn easy to see what tyranny is, if you actually read our founding documents. It is pretty easy to see what our government was set up to guard against. 

Similarly, it is not that difficult to understand what liberty means. It never meant freedom to do anything you want. It meant keeping religion out of government and government out of religion. It meant freedom to assemble. It meant trial by a jury of one’s peers. Read the Bill of Rights (https://www.archives.gov/founding-docs/bill-of-rights-transcript), and not just your favorite one of those amendments. Keep reading beyond them to find out other forms of liberty that were most important to the giants of American history.

They did not think that rules to protect public health were a threat to liberty. 

January 6, 2021

I would like the events of yesterday to be inexplicable. I would like to say that I was surprised. But they are not and I was not. I was infuriated, but not the least bit surprised. 

I certainly was not surprised by the civilians. And, frankly, I was not that surprised by the police.

The police let the seditious insurrectionists in. The police posed for selfies with them. And when some of them were ready to leave, the police gently escorted them out, even holding their hands to help them down the steps of the Capitol.

I know it is cliché to mention it now, but contrast that with actions taken last summer against largely peaceful Black Lives Matter protesters around the country. With protesters against Brett Kavanaugh’s nomination to the Supreme Court. With disabled protesters against repealing the ACA. That’s the same city. The same steps. The same building.

Look around the country. The way the police treated protesters in Ferguson. The way they treated murderous armed militiamen in Kenosha. You know that I could go on and on.

The conspiracy to attack the Capitol to stop the constitutional processes that undergird the peaceful transition of power in the country was organized openly on the Internet. Members of Congress asked about security days ago. The Capitol Police then let the seditious violent insurrectionists into the Capitol.

Were they overwhelmed? How could they possibly have been overwhelmed!? We saw just last summer that our governments’ armed forces know how to put down protesters, how to clear out protesters, how to protect themselves while aggressively suppressing thousands of people.

The only truly surprising thing yesterday was the boldness of the claim that the governments’ armed forces and police were simply overwhelmed and could not stop the violent seditious insurrectionists. Even NPR used the word “unprecedented” this morning.

But it was not unprecedented. We saw this in April in Michigan.

There really are only two possibilities. Either the $400 million/year Capitol Police leadership’s incompetence rises to the level of the Bush Administration’s incompetence that led to 9/11 – and thank god the violent seditious insurrectionists did not intend to kill all the members of Congress who did not support the delusional fantasies of their megalomaniacal authoritarian strongman wannabe – or leadership of the Capitol Police did not want to stop them. And the vast majority of the law enforcement officers who support this President – who never won a majority of the American electorate or had an approval rating above 50% – did not really want to stop them, either.

Law and Order

I choose the more consistent option. I saw the armed and uniformed branches of our governments and their actions. I saw how they treat protesters on the left and domestic terrorists on the right. I saw the violent armed seditious insurrectionists, including those who lionize the literally traitorous sedition to perpetuate slavery in America and lionize the avowed perpetrators of genocide. I saw that the armed and uniformed branches of our government stood by for hours as the violent seditious insurrectionists ransacked the Capitol.

I have read that each of the tiny number of people actually arrested yesterday was armed. How many more arms were there? There were pipe bombs. Even as our members of Congress were still in the building, these violent seditious traitors were allowed to do what they willed, treated with kid gloves and allowed to leave freely.

The importance of respecting “law and order” has been a rhetorical cudgel used by the right since before I was born. It has been used against those protesting for equal rights, for access to the very things that our nation’s Founding Fathers demanded. And yesterday, there were these incredibly weak invocations of “law and order.” This time, they were not accompanied by the violence of our government, even though this time the supposed violators of law and order were, quite literally, violent seditious insurrectionists seeking to stop the peaceful handover of power, as dictated by our laws and our Constitution.

It is quite clear what the greatest threat to liberty within our government and among our public servants is. It is clear who would support tyranny.

It ain’t the teachers, the schools or the universities.

 

 

References 

“Pro-Trump rioters escorted down steps of US Capitol by police,” even holding hands. https://www.youtube.com/watch?v=G9hjfQ3xgIA

Teargassing largely peaceful protesters https://www.npr.org/2020/06/01/867532070/trumps-unannounced-church-visit-angers-church-officials

More than 300 protesters arrested as Kavanaugh demonstrations pack Capitol Hill https://www.cnn.com/2018/10/04/politics/kavanaugh-protests-us-capitol/index.html 

Disability advocates arrested during health care protest at McConnell’s office. https://www.washingtonpost.com/local/public-safety/disability-advocates-arrested-during-health-care-protest-at-mcconnells-office/2017/06/22/f5dd9992-576f-11e7-ba90-f5875b7d1876_story.html

National Guard Troops on steps of Lincoln Memorial https://www.snopes.com/fact-check/national-guard-trump-mob/

Selfies with Capitol Police. https://twitter.com/bubbaprog/status/1346920198461419520?s=20

Google search for: liberal bias in public schools https://www.google.com/search?client=safari&rls=en&q=liberal+bias+in+public+schools&ie=UTF-8&oe=UTF-8

Google search for: covid restrictions tyranny https://www.google.com/search?q=covid+restrictions+tyranny&client=safari&rls=en&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiUztWZiYruAhW2GFkFHRJECMsQ_AUoAnoECAcQBA&biw=1324&bih=792

TheFire.org

Chronicling attacks on public education. https://thenewpress.com/books/wolf-schoolhouse-door

Again, chronicling attacks on public education. https://www.publicaffairsbooks.com/titles/derek-w-black/schoolhouse-burning/9781541788442/

Capitol Police opening the gates for the violent seditious insurrectionists. https://www.snopes.com/fact-check/capitol-police-opened-gates/

Political partisanship in the Secret Service so extreme that many agents cannot be trusted to protect the next president. https://www.bostonglobe.com/2020/12/31/nation/secret-service-is-making-some-staff-changes-presidential-detail-that-will-guard-president-elect-joe-biden/

84% of Police Officers Supported Donald Trump in 2016. https://www.policemag.com/342098/the-2016-police-presidential-poll

Violent Michigan Insurrectionists Overrun the State Capitol. https://www.theguardian.com/us-news/2020/apr/30/michigan-protests-coronavirus-lockdown-armed-capitol

2020 Video of Violent Insurrectionists Overrunning a US Capitol. https://www.youtube.com/watch?v=6_jWONaP-4U

Capitol Rioters Planned for Weeks in Plain Sight. The Police Weren’t Ready. https://www.propublica.org/article/capitol-rioters-planned-for-weeks-in-plain-sight-the-police-werent-ready

Capitol Police told member of Congress that they were ready for January 6. https://twitter.com/kyledcheney/status/1347314075710185475

Pipe bombs found in DC. https://www.cnbc.com/2021/01/06/fbi-says-it-is-investigating-suspicious-devices-in-washington.html

Reliability and Validity: Revisiting the Cliché

There’s a cliché metaphor that is commonly used to explain the nature of, and relationship between, reliability and validity. I think there is more to be learned through this metaphor than is usually presented.

The Cliché

The metaphor used is a target. I have seen archery, darts and javelins. But I am not a good enough artist to show those. However, I did make a target and set of images to develop the metaphor further. (Notice the depth and texture I made? Notice how the light comes from the upper left? I did that. Intentionally. I’m so proud of myself!)

 
An Empty Target.


 

The cliché explanation points out that reliability is the consistency with which one hits the target and validity is how on target one is. So, in Figure 1, we see low reliability and low validity. That is low reliability because the hits are not consistent. Were they clustered together, they would be consistent, and consistency is really the technical statistical (and psychometric) meaning of reliability. Figure 2 also shows low validity, because the hits are not really on target, in that they are not near the bullseye. But in Figure 2, the hits are clustered, so they are reliable. That consistency is reliability.

Figure 1: Low reliability and low validity.


Figure 2. High reliability, but low validity


Figure 3: High reliability and high validity.


Figure 3 shows the dream — high reliability and high validity. Tightly clustered and clustered exactly where we would want them to be clustered. As the Kool-Aid Man would say, “Oh, yeah!”
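To make the metaphor concrete, here is a minimal sketch, assuming each hit is an (x, y) point and the bullseye sits at the origin. The average distance of the hits from their own center stands in for (un)reliability, and the distance of that center from the bullseye stands in for (in)validity. The function names `center`, `spread` and `offset` and all the coordinates are hypothetical, made up to loosely echo Figure 2 and Figure 3; they are not taken from the images.

```python
import math

# Hypothetical illustration: each hit is an (x, y) point; the bullseye is at (0, 0).
Point = tuple[float, float]


def center(hits: list[Point]) -> Point:
    """Average position of the hits, i.e., where the cluster sits."""
    n = len(hits)
    return (sum(x for x, _ in hits) / n, sum(y for _, y in hits) / n)


def spread(hits: list[Point]) -> float:
    """Mean distance of hits from their own center: smaller means more consistent.
    Note that this never needs to know where the bullseye is."""
    cx, cy = center(hits)
    return sum(math.hypot(x - cx, y - cy) for x, y in hits) / len(hits)


def offset(hits: list[Point], bullseye: Point = (0.0, 0.0)) -> float:
    """Distance from the cluster's center to the bullseye: smaller means more on target."""
    cx, cy = center(hits)
    return math.hypot(cx - bullseye[0], cy - bullseye[1])


# Made-up coordinates: tight but off target (like Figure 2),
# and tight and on target (like Figure 3).
figure_2 = [(3.0, 3.0), (3.2, 3.0), (3.0, 3.2), (3.2, 3.2)]
figure_3 = [(-0.1, -0.1), (0.1, -0.1), (-0.1, 0.1), (0.1, 0.1)]

for name, hits in [("Figure 2", figure_2), ("Figure 3", figure_3)]:
    print(f"{name}: spread={spread(hits):.3f}, offset={offset(hits):.3f}")
```

Both made-up patterns come out with the same spread (about 0.14), so a measure of consistency alone cannot tell them apart. Only `offset`, which has to be told where the bullseye is, separates them (about 4.38 versus 0).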

The Optional Addition

Sometimes, the explanation includes a Figure 4.

 

Figure 4. Medium reliability and medium validity?

 

Figure 4 shows that there is a middle ground between Figure 1 and Figure 3: one can have middling reliability and validity. The problem is that, when it is included, it is presented as a tradeoff with Figure 2. That is, the gains in validity (i.e., closer to the bullseye) are offset by the losses in reliability (i.e., less tightly clustered). But I don’t buy that. I think that the gains in validity are clearly and obviously well worth the losses in reliability. Methinks that the difficulty of suggesting that Figure 4 does not present a vastly superior outcome to Figure 2 is why it is often excluded. (This comes from an agenda that I will explore next week.)

Figure 4b. Not quite as good.


 
Figure 4c. High reliability and medium validity.


Certainly, Figure 4 is better than Figure 4b. The latter figure shows hits that simply are not as close to the bullseye, even though they are exactly as scattered as in the former. I understand the claim that perhaps the hits in Figure 4c show equal reliability/validity tradeoffs with Figure 4b. But both seem clearly inferior to Figure 4, to me. (Again, an agenda to explore next week.)

Extending the Metaphor

I think that we can extend the metaphor for two more lessons.

First, I think that the differences between Figure 4 (it’s the same Figure 4 shown above) and Figure 5 are the most important differences in this whole metaphor. In reality, we simply cannot expect perfect reliability. Not even Figure 2 was perfect. It is just a question of how much better we can make the reliability. Small differences are what we can achieve, and where we can improve, almost all of the time.

Figure 4.

Figure 5

Incremental progress is progress, after all. And, in reality, almost all progress that has any chance of sticking is incremental progress. So, if you can go from kinda medium to a little bit better than kinda medium? Take the win!

Last, there is one final lesson we can take from this metaphor. Figure 6 shows the statistical view of reliability and validity. That is, it shows the psychometric view.

Figure 6. The Psychometric View

No, there is no target in the psychometric view. Because we cannot quantify validity, statistics have almost nothing to say about validity. They do not even see a target. But damn aren’t those hits tightly clustered!

Democracy and Schools

I occasionally come back to this theme of the importance of politics and democracy for our schools. And I am back here, again.

I was taught by a grad school professor that politics is how we come to community (or national) decisions that are based on values, rather than on technical criteria, and that different political systems give us different ways to make those decisions for communities. Perhaps I should have understood that this was the essential nature and purpose of politics before then, but that is when I did.

In this country, we use democracy as our political system. Sometimes, we even think of “democracy” as meaning whatever the American political system is. (And I am ignoring the historically ignorant people who claim that we are a republic and not a democracy, because they clearly have not read what the founders meant and wrote about each of those terms.)

So, what is the purpose of our democracy? That is, why is democracy our system for political decision-making? I can think of four potential reasons.

  • Democratic accountability for our leaders. (i.e., the ability to kick out the bad ones)

  • Learning the will of the people. (i.e., doing what the people want)

  • Obtaining the consent of the governed by including them in decision-making. (i.e., decreasing resistance to governance)

  • Obtaining legitimacy for government. (i.e., simply a moral requirement)

My niece is taking a class at college this year on The Future of Our Democracy. I am not hopeful about the future of our democracy. Whatever the purpose of our democracy, I think that it is being undermined.

These purposes require access to the ballot box, and that access is being limited as it has not been in decades. These purposes require that political leaders be honest about their positions, their opponents’ positions and the contents of the bills they support or oppose, but flat-out lies about all of that seem to be at a high. These purposes require a willingness to recognize when you are in the minority and to concede that the majority gets to win (and to rule), and we have clearly lost much of that.

I’ve been worried for a long time about the constitutional crisis we are in, which is undermining our government’s ability to govern. But now the state of our democracy, and its very future, seem quite uncertain to me.

Which takes me to our schools. Too few people vote in school board elections. It is rarely clear what those votes really mean. Local school boards get too little attention nearly all of the time, but disproportionate attention for things that should not matter so much. Changing high school mascots gets more attention than budget cuts or new curricular directions.

I know that we need democratic oversight for our schools. And I often assume that we mostly have it.

But I have to wonder, do we? And what could possibly improve the situation?

Hoist on My Own Petard: Drafting without Sufficient Pre-Writing

I recently was doing a small implementation evaluation report for a long-term client. I read the documentation, spoke to the people, reviewed the documentation and thought about it. I went over my notes and made a list of the problematic implications. Normal stuff.

I thought I was ready to write. It would be fairly quick. No need to be the most formal, it being a long-term client and all. Explain some background, lay out the facts, point to the issues, make some recommendations. I figured that it would be about five pages, single spaced, with a bunch of headings and bullets. It turned out to be six pages.

Here’s the thing: I didn’t really pre-write properly. Sure, I had the ideas, but there is the critical step between the research and actually writing a decent draft, which I more or less skipped.

Way back in the day, I was taught that this step is called “pre-writing,” a term I kinda hate. There is so much work that comes before this step, and calling this step “pre-writing” just ignores all of it. But that is what I was taught, and it is stuck in my head.

I tend to call it “outlining,” because I believe that that is the most useful way to do it. In this case, maybe 15 lines, taking the Intro and Summary for granted. That leaves three top-level headings: Background, Procedure and Issues. I had that in my head, which is where some outlining can happen. But I was being too lazy (or rushing too much) and did not figure out what the big Background Issues were before I started drafting. The Procedures section was fine. But the Issues section was a mess. And the Recommendations missed a really useful one.

The Issues section did not have a great order. The bulleted paragraphs were not distinct enough. They were not close to equal weight. Some issues ended up split across too many bullets. Ugh.

This was my fault. I knew better. I see this mistake often. I coach people to do better, all the time! So, I had to take my own painful advice and follow it.

If you do not do your outlining or pre-writing before you write a draft, it will be a bad draft. And it will be more work to try to fix it than it will be to start over again. If you use that first draft to work out exactly what you want to say, what your argument is and what points you want to make, it is not actually your first draft. That is your pre-writing. It is valuable and important, but it is not actually a draft you should edit.

Finally realizing this, I opened up a new window and started typing anew. I could not just copy paragraphs and move them around. I needed to do a better job breaking up the report. I needed to use the early sections better to set up the later sections, without giving too much away early. The writing in later sections needed to be more self-contained. So, I needed to start a new document from scratch. I had to avoid the temptation of trying to reuse the bad stuff I had already written.

Hoist on my own petard. Making the most common mistake I have to coach my dissertation clients through.

But I ended up with a report that I actually am proud of.

What Is an Ed.D.?

While the issue of the meaning of an Ed.D. is in the news right now, this is actually something I have had to explain many times.

In short, an Ed.D., like a Ph.D., is awarded by a graduate school of education for a set of coursework and a relatively large formal research project, written up in the most formal way. Everything else about the Ed.D., everything that you have heard or read, varies by institution.

Are Ed.D.s merely practitioners’ degrees?

Nope. For example, until recently, the Harvard Graduate School of Education only awarded Ed.D.s, and did not award Ph.D.s. Harvard is very focused on research, sending very few doctorates back to schools and districts as administrators or teachers, though it does send many, many master’s students back to schools, districts and other non-research institutions. Similarly, until recently, Teachers College, Columbia University (the nation’s largest graduate school of education) only awarded Ed.D.s and did not award Ph.D.s.

When I was at Teachers College, I observed this up close. If students there wanted to earn a Ph.D., they needed to officially get their degree from GSAS (i.e., the Graduate School of Arts and Sciences) of Columbia University, a different institution within Columbia University. Columbia has all the schools (e.g., law school, medical school, business school, GSAS). Teachers College students would need to find a professor in a GSAS department (e.g., political science, psychology, economics) to serve on their committee. They would also have to fulfill various graduation requirements of that department, in addition to the Teachers College requirements.

Teachers College offers economics courses, focused on the context of education. It offers political science courses, focused on the context of education. It offers psychology courses, focused on the context of education. It is a very large institution that offers a wide variety of disciplinary courses, with the courses always focused on the context of education. After all, it is Columbia University’s graduate school of education.

There was rarely a need to try to find a GSAS professor and take GSAS coursework — which was not focused on our context. Regardless of career goals, the Ed.D. was a perfectly acceptable degree.

Do Ed.D.s require less work than Ph.D.s?

At Teachers College, the Ph.D. required just 75 credits, plus an appropriate doctoral dissertation, but the Ed.D. required 90 credits, plus an appropriate doctoral dissertation. Those 75 credits could include the requirements of the GSAS department, or perhaps the student might need a few more credits.

So, the Ph.D. does not require more coursework.

The language of the university’s old policies requires Ph.D. students to master a foreign language. However, sufficient research methods coursework counted as a foreign language.

Are Ed.D.s faster than Ph.D.s?

Many institutions that offer MBA (i.e., master of business administration) degrees also offer “executive MBA” programs, designed to better fit the schedules of working professionals. They focus on intensive weekend courses and the occasional week-long full-time experience. These take the place of more traditional courses taken during the day.

Some of the programs appear, to my eyes, to demand less of these busy professionals. Less reading. Less writing. Less thinking time. But some of them appear just as demanding, and perhaps even more so.

Does this make them faster or weaker? It depends on the program, not on the degree. They all get MBA degrees.

Similarly, there are some practitioner-focused “executive” Ed.D. programs. But that is not intrinsic to the Ed.D. and their rigor varies by program.

There are also a wide variety of Ed.D. programs designed to be very fast — as fast as three or four years. Others counsel that a degree take a minimum of four years, but that students should expect more like five to seven years, if not longer. I do not think particularly well of the very fast programs, but I understand why they exist and why they are popular. But again, this is a difference between programs that has little to do with the Ed.D. degree itself.

What about J.D.s, M.D.s and Other.D.s?

J.D.s (i.e., law degrees) and M.D.s (i.e., medical degrees) are not research-based degrees. That is, they do not have a doctoral dissertation requirement. Recipients of these professional degrees do not have to do a big research project in order to graduate.

Many aspiring medical researchers do MD/PhD programs. That’s where the research portion of the graduate work comes in, both in terms of training and in terms of experience.

Other graduate schools also offer doctoral degrees. Among research degrees are theology degrees, design degrees, library science degrees and many, many more. The one that threw me for the biggest loop, when I heard about it, was nursing: the D.N.Sc., or Doctor of Nursing Science. There are also many, many other professional degrees: dentistry, social work, athletic training and many more (mostly somehow connected to medicine).

Who gets to be called “Dr.”?

We know that that title should not be limited to those who have delivered babies, because it seems nonsensical to deny brain surgeons and heart surgeons the title simply because of the part of the body they work on. That really seems to miss the point. (Also, plenty of EMTs have delivered babies, and they are not called “doctor.”)

The question is whether the title should be preserved for medical doctors. I know that I have said many times, “Not that kind of doctor,” almost always to be funny. In many contexts, the assumption is that the doctor in question is a medical doctor. However, the M.D. is simply a less demanding degree than a research doctorate. Perhaps the coursework is harder (depending on the program), but the M.D. lacks the research and dissertation-writing components of the various research doctorates.

While there are reasonable questions to be asked about recipients of honorary doctorates using the title “Dr.”, it strikes me as simply asinine to suggest that those who have earned research doctorates should not be called “Doctor.”

Sure, I can see why some institutions might be questioned, but the degree itself? No, there is no reason to view Ed.D.s on their face as less demanding or rigorous than Ph.D.s, and certainly no reason to call them out specifically.

If you really want to be a condescending ass, question the quality of the individual dissertation by actually reading it. Otherwise, it seems... well, just stupid... to challenge the well-established norm of calling holders of research-based doctorates “Doctor.”

Justifying Arrogance with Humility

Humility is a core of Rigorous Test Development (RTD). This likely appears odd to a lot of people, considering how arrogant we are about RTD, the confidence with which I (in particular) often speak, and the even greater and more frequent confidence that people hear in my voice. My collaborator often speaks with confidence as well, though as a woman, she faces a greater potential penalty for doing so.

And yet, we believe very strongly in the value of humility in the work.

In fact, we believe that our humility in the work is not at odds with our confidence, or even my arrogance. Rather, our humility helps to support our confidence.

First, we preach that professionals should be very mindful of the limits of their expertise. They should know when they are speaking outside of their expertise. Even when they could justify some claims to expertise to people who might not know better, if they themselves know that the topic is outside their expertise, they should speak much more cautiously. For example, we know a lot about assessment, but we know quite a bit less about large-scale performance assessment. We know a lot about scoring, but we know far less about large-scale administration of tests.

Second, we think that it is important to respect the expertise of others. Certainly, if we want them to respect ours, we have to walk that walk ourselves. Respect in collaborative work simply cannot be a one-way street. Furthermore, one should affirmatively look for the expertise that others bring to the table, rather than assuming that they lack any at all. This does not mean deferring to everything they claim or in every area in which they seem confident. This is more critical than that. Evaluate their claims — implied or explicit — of expertise, and think about their bases.

Third (and perhaps this should be first), admit when you do not know something. Some people think that admitting ignorance is a sign of stupidity and incompetence. We believe that the truly confident can, and should, admit this kind of limit. I know that I am capable, and I am confident enough that I do not need to pretend that I know more than I do. My ego is not so fragile that I cannot admit my current limits. And by admitting that limit (both to myself and to you), I give myself an opportunity to rectify it. Perhaps you can teach me, perhaps we can learn together, or maybe I can investigate on my own and report back to you.

Fourth, if it is important to you to be right (as it is to me and my most cherished loved ones, to an annoying degree), affirmatively look out for when you are wrong. I do not mean that you should easily admit fault when you do not believe it. Rather, I mean try to limit that fault to one occurrence. As a value, I would rather be wrong and corrected once than be wrong many, many times. Of course, it is (too?) hard to convince me that I am wrong, but I take such pleasure in learning that it is actually a little moment of joy when I realize it. The learning is great, and knowing that I will be right about it in the future matters a lot to me.

Fifth, try to know the limits of your own perspective and experiences. Obviously, this is key to most of the issues mentioned above. Broader experience enables each of us to better understand our earlier experiences. For example, our work with a second team (e.g., in a different organization) can make visible things that we took for granted in the first context. Different experiences, perhaps with different constraints and priorities, can give us a better understanding of the tradeoffs and value of different kinds of decisions.

Sixth, there is a lot to be gained by seeking out and listening to the experiences of others. We can learn vicariously from their experience. In our own work, we combine insider-outsider perspectives, asking each other endless questions. We each have grown our understanding by learning from what the other has been through.

In these ways and others, we try to temper our confidence or arrogance. No, it does not make us less confident, but instead it makes our confidence less fragile and, in our view, more justified. Our careful efforts at humility are not insincere, nor do they undermine us. Rather, they put us in a position to do better work.

Lessons on Representativeness from the 2020 Election

If there is one issue that I think most needs to be understood about conducting (or consuming) research, it is the representativeness of your data. The 2020 presidential election illustrates this in a number of ways.

First, the pre-election polls were off. This was also true in 2016, when the polls were off by about 3 points. It appears that this year’s polls were off by a bit more than that, but not by a huge amount. However, this error was not evenly distributed across the country. For example, the polls in Minnesota were not off by very much, but the polls in Wisconsin were off by quite a bit. (Note: I am not citing specific numbers because the election results have yet to be certified in any state.)

The major explanations going around for this concern the composition of the samples. Some people cite differences between the sample and the overall population in educational attainment, while others point to a trickier construct, social trust. This latter group argues that weighting samples by education simply does not do enough to correct for the underrepresentation of low-trust individuals in the sample. That is, educational attainment, they suggest, is a poor proxy for social trust.

Regardless of the details, everyone agrees that the group of poll respondents was different from the voting public along a number of factors. And while high-quality pollsters try to correct for this, they didn’t capture something this time.

It is easier to correct for educational attainment, because it is easy to ask about. Educational attainment is a very easily observed variable. However, social trust is much harder to see, harder to ask about, and therefore harder to pin down. It is an unobserved variable. We can correct or control for differences in observed variables, but not unobserved variables.

One reason why large sample sizes are useful is that we hope that, with a large enough group, the unobserved stuff kinda cancels out. However, in this case, that low-trust group might simply be far less likely to respond to pollsters. This undermines the pollsters’ sampling strategy, both in data collection and in weighting the subgroups. Obviously, they need to do a better job of turning the unobserved variable into an observed variable. Somehow.

The lesson: Unobserved variables can entirely undermine the generalizability of your results when your sample is not representative of the relevant population.
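Here is a minimal Python sketch of why weighting on an observed variable cannot fix an unobserved one. Everything in it is hypothetical: four invented population cells defined by education (observed) and social trust (unobserved), each with a made-up population share, level of support for some candidate, and probability of answering a pollster. The `weighted_poll` function does simple post-stratification, reweighting each group of respondents to that group's share of the population; it is an illustration of the idea, not any pollster's actual method.

```python
from collections import defaultdict

# Hypothetical cells: (education, trust) -> (population share,
# support for the candidate, probability of answering the pollster).
# All numbers are invented for illustration.
CELLS = {
    ("college", "high"): (0.25, 0.60, 0.10),
    ("college", "low"): (0.10, 0.40, 0.02),
    ("noncollege", "high"): (0.35, 0.45, 0.06),
    ("noncollege", "low"): (0.30, 0.30, 0.01),
}

def true_support(cells):
    """Support in the whole population."""
    return sum(share * support for share, support, _ in cells.values())

def raw_poll(cells):
    """Unweighted estimate: expected respondents per cell are share * response rate."""
    responded = sum(share * resp for share, _, resp in cells.values())
    return sum(share * resp * support for share, support, resp in cells.values()) / responded

def weighted_poll(cells, key):
    """Post-stratify: weight each group (as defined by `key`) to its population share."""
    pop_share = defaultdict(float)
    resp_mass = defaultdict(float)
    resp_support = defaultdict(float)
    for cell, (share, support, resp) in cells.items():
        group = key(cell)
        pop_share[group] += share
        resp_mass[group] += share * resp
        resp_support[group] += share * resp * support
    return sum(pop_share[g] * resp_support[g] / resp_mass[g] for g in pop_share)

truth = true_support(CELLS)
raw = raw_poll(CELLS)
by_educ = weighted_poll(CELLS, key=lambda cell: cell[0])  # education only (observed)
by_both = weighted_poll(CELLS, key=lambda cell: cell)     # education and trust
print(f"true {truth:.3f}  raw {raw:.3f}  by education {by_educ:.3f}  by both {by_both:.3f}")
```

In this made-up setup, weighting by education moves the estimate toward the truth but still leaves a gap of several points, because low-trust people are underrepresented within each education group. Only weighting by trust as well closes the gap, and that requires trust to be observed.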

Second, it appears that the samples in the pre-election polls were different than they had been in the past. Pollsters reported a greater response rate than in recent years, and the composition of those respondents might have been quite different. For example, if white-collar and office workers were newly able to work from home, and thereby had more time and/or freedom to answer those calls and take part in polls, that might favor one party over another. If less-educated service workers were not affected in the same way by COVID and the various efforts to reduce its spread, old sampling methodologies could yield significantly different samples than in the past, even the recent past.

I have not seen any mention or discussion of pollsters observing this and treating it as a problem to be solved, as opposed to an opportunity to be taken advantage of. Yes, it is great when the response rate goes up, but it might not be all good.

The lesson: Adopting old methodologies and assumptions in new contexts is always tricky. Always be diligent and careful to correct for seemingly positive changes, as they may carry with them less obvious harmful changes.

Third, early returns were rather misleading in many states. This was because the voters who were counted earlier were different — as a group — than those counted later. In some states, early and mail-in votes were counted earlier, and in some states those were counted later. Everyone expected that those early and mail-in voters were more Democratic than the traditional in-person voters, though no one could be certain how different.

In this case, there were two different populations, and no good way to anticipate their differences. Successful efforts by both major parties to increase turnout made comparing the groups even harder, as no one could predict how successful each of those efforts might be.

Only fools (e.g., me, my wife, my best friend) were paying any attention to the vote counts released early in the evening, on election day. Heck, it was still pretty foolish to pay attention later in the evening. We already knew how most states would turn out, and these returns simply could not predict the final result in the close states. Whichever group was counted first simply could not be generalized to the second group. At the very least, we all had to wait for a sufficient portion of each group to be counted to predict anything.

The lesson: We cannot expect to be able to generalize results from one population to some other population that is already known to be different in significant and relevant ways. This is especially true when the extent of those differences is not known.

Fourth, the final results of many states were predictable long before the actual results got there. For example, experts knew that Michigan, Pennsylvania and Georgia would end up going for Joe Biden, even when Donald Trump was far ahead in the count.

How was this possible?

  • History provided clear insight into the proportion of votes that come from some areas of those states, relative to others.

  • Counties and states reported estimates of how many votes they had outstanding.

  • History provided a baseline for expectations about the split of the vote in different areas.

  • Early counting of the in-person votes and the early/mail-in votes provided further information that helped refine those historically-based expectations.

Thus, it did not require magic to predict the future. Instead, one could simply use some basic algebra to predict the count (of what had already happened). Similarly, in short order it was not that difficult to see that counts in other states — though close — would not lead to a different result.

Now, none of this was clear early in the evening. But as more results, even partial ones, came in from across each state and from every group of voters, the eventual results were easy to see. Those with access to the most detailed historical and emerging data could see the trends and where they would end up.

The lesson: Given sufficient data, diligence and patience, one can extrapolate from sampled data to the larger population, so long as the samples are well matched to the larger populations.
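The “basic algebra” described above can be sketched in a few lines of Python. The areas, counts, outstanding-ballot estimates and expected shares are all invented, and the sketch assumes a two-candidate race in which every outstanding ballot goes to one of the two. Each area's projection is just the votes already counted plus its outstanding ballots times the share each candidate is expected to get of them; the `Area` class and `project` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Area:
    name: str
    counted_a: int           # votes already counted for candidate A
    counted_b: int           # votes already counted for candidate B
    outstanding: int         # reported estimate of ballots still to count
    expected_b_share: float  # expected B share of outstanding ballots (history + early returns)

def project(areas):
    """Counted votes plus outstanding ballots split by expected share (two-way race)."""
    final_a = final_b = 0.0
    for area in areas:
        final_a += area.counted_a + area.outstanding * (1 - area.expected_b_share)
        final_b += area.counted_b + area.outstanding * area.expected_b_share
    return final_a, final_b

# Invented numbers for a state where many of one side's ballots are counted late.
AREAS = [
    Area("City", 100_000, 250_000, 300_000, 0.75),
    Area("Suburbs", 300_000, 280_000, 100_000, 0.55),
    Area("Rural", 400_000, 150_000, 50_000, 0.30),
]

counted_a = sum(a.counted_a for a in AREAS)
counted_b = sum(a.counted_b for a in AREAS)
final_a, final_b = project(AREAS)
print(f"counted so far: A {counted_a:,}  B {counted_b:,}")
print(f"projected:      A {final_a:,.0f}  B {final_b:,.0f}")
```

Candidate A leads the current count by 120,000 votes, but once each area's outstanding ballots are allocated, the projection has candidate B ahead by about 20,000, because most of the uncounted ballots sit in B's strongest area.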

None of this is new, but the gaps between predictions, partial results and full results of the 2020 presidential election have received an enormous amount of discussion and attention. This can be a good opportunity to think more carefully about how representativeness, and efforts to achieve representativeness, play out in practice.

Self-Selection, Silence and Representativeness

Unless they are doing program evaluation, researchers want to generalize from their data to some wider population or time frame. Education assessments similarly want to generalize beyond the handful of questions that appear on the test. But it is not just researchers and assessment developers who want to do this. Everyone wants to look at those results and infer that they can generalize to the larger population.

A larger sample size can make such generalizations more defensible, but large sample size is not enough. Heck, sample size isn’t even the point of sample size.

The point is representativeness. If you had a perfectly representative sample along the characteristics of interest, the sample size simply would not matter. Not at all.

Now, we know that we cannot get those perfectly representative samples, because the whole point of our research/assessment is to uncover things about the population that we do not already know. So, sample size becomes important because we use random sampling to approximate a representative sample. That is a trick that we have statistics to better describe and understand, but it only works with larger sample sizes.

Self-selected samples, however, undermine this whole effort. We already know that those who volunteer to participate in the study or survey or poll are different from those who do not. We just do not know how different or in how many ways they are different. People who respond to an emailed invitation to participate in research are different from those who do not. People who answer phone calls from pollsters are different from those who do not.
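A small simulation shows how a large self-selected sample can do worse than a small random one. This is a sketch under invented assumptions: a hypothetical population of 100,000 people, half of whom hold some view, where people who hold the view are three times as likely to volunteer a response (30% versus 10%). The `simulate` function and its parameters are made up for illustration.

```python
import random

def simulate(pop_size=100_000, random_n=1_000, seed=2020):
    """Compare a large self-selected sample with a small simple random sample.

    Hypothetical population: each person holds the view (1) with probability
    0.5; holders volunteer with probability 0.30, non-holders with 0.10."""
    rng = random.Random(seed)
    population = [1 if rng.random() < 0.5 else 0 for _ in range(pop_size)]
    true_rate = sum(population) / pop_size

    # Self-selection: whether someone responds depends on the view itself.
    volunteers = [v for v in population if rng.random() < (0.30 if v else 0.10)]
    self_selected_rate = sum(volunteers) / len(volunteers)

    # Simple random sample: everyone has the same chance of being chosen.
    sample = rng.sample(population, random_n)
    random_rate = sum(sample) / random_n

    return true_rate, len(volunteers), self_selected_rate, random_rate

true_rate, n_volunteers, self_selected_rate, random_rate = simulate()
print(f"true rate:                         {true_rate:.3f}")
print(f"self-selected ({n_volunteers:,} people): {self_selected_rate:.3f}")
print(f"random sample (1,000 people):      {random_rate:.3f}")
```

With these settings, roughly 20,000 people volunteer, yet their answers put support near 75%, while the random sample of just 1,000 lands close to the true rate of about 50%. Sample size does not rescue the self-selected group, because the way people got into it is tied to the very thing being measured.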

The massive New York City public school system asked parents to declare whether or not they wanted their children to return to in-school classes, hybrid classes, or entirely remote learning this fall. It found that most students’ families would send their children back to school buildings.

Actually, that is not true. That is what NYC claimed and what was often reported. But it was not what families said and it was not what happened in the schools. In fact, roughly 1/4 of families said that they would not send their children back to school buildings and the other 3/4 of families said nothing. They did not opt for returning to school buildings. They did not opt for anything. They simply did not answer the survey.

The mayor and others assumed that this silence meant something that they understood. They assumed that the non-respondents were making as intentional a decision as the respondents were. They assumed that the non-respondents were like the respondents.

This was a self-selected respondent pool. There were no efforts at representativeness. There were no efforts to understand the non-respondents. It was the laziest possible way to collect data.

Work to ensure a representative sample is hard. It is expensive. It is work. And it is not fun. This work does not help you to get the answer you want. It is not about the substantive ideas, thinking or theories of whatever you are interested in. It is annoying and repetitive work to follow up with invitees, slowing down data collection, making it more expensive and giving reasons to doubt your results before you have even compiled them.

But response/non-response bias is huge. High integrity researchers take it quite seriously.

Why Democracy is Necessary for Policy-Making

Politics is how groups with differing or competing ideas, values and/or agendas reconcile those differences and direct group action. This can happen in small groups at work or on the national stage. Perhaps in the worst case, the political answer is massive repression of the will of some minority groups, or even of a majority. More ideally, through political processes we come to some compromise, or at least an agreed-upon mechanism to choose between differing preferences.

While there are technical questions in policy and education policy, there are an awful lot of values questions. For example, allocation of scarce resources is almost always a values question. The appropriate balance between desires for excellence (i.e., fostering even greater accomplishments of those we expect to be the most accomplished) and desires for equity is clearly a values question, and one that has been central to my own education and always present in my mind as I work.

This year, we are seeing some of the top selective-admissions public schools in the country rethinking their admissions processes. The nation’s oldest public school (in Boston, MA), and my own high school (in Fairfax County, VA) have announced they are shifting away from their old exam-based models. These announcements come after decades of complaints that those old processes yielded student bodies that were wildly non-representative of the larger student populations in those school systems.

Those who have pressed for change have known that they are arguing from values, yet somehow those who have defended the old status quo have claimed that they were not. They claimed some technical reasons why these schools had to admit only the top students, and that they had some appropriate means to identify those students. But those tests were built on values about what should be included and how to balance reliability and validity in assessment. And insisting that admission should be limited to those who are best at whatever folks think the test measures is also a value.

Senator Mike Lee recently tweeted, “Democracy isn’t the objective; liberty, peace, and prospefity [sic] are. We want the human condition to flourish. Rank democracy can thwart that.” The problem with his thinking is that even if we accept that those are the proper objectives, how we understand or define liberty and how we distribute prosperity are questions of values. And we need some political mechanism to come to answers.

In this country, as in most of the world, we accept that we should have democratic systems to address the issues in the political arena. That is, we should use democratic mechanisms to translate a diverse array of views into direction for our governments and governmental policy, including public schooling. We have democratic oversight of our schools, either through elected school boards or through school boards overseen by elected officials. Our various departments of education are given direction and overseen by our elected officials.

These exam schools get undue attention, for various reasons. I pay them too much attention because I attended one of them. At times, their admissions criteria can be one of the most widely covered issues in public education, and we appear to be in one of those times.

So, I feel the need to say again: nearly everything about admissions criteria and processes at these special schools is based on values. Everyone offering a thought or opinion about what those criteria should be ought to recognize that they are speaking from their values, and consider deeply why their values should be taken more seriously than those of others, particularly if they are counseling against the popular will of the communities that the schools are meant to serve.

Your Data Does Not Prove Anything

On one hand, this post is about humility, but on the other hand this post is simply about intellectual integrity. In this case (as with so many others) they go hand in hand.

We all want our studies and our evidence to be of high quality. We want our data and our results to prove our idea, or at least to disprove someone else’s. We want the satisfaction of a completed argument.

Unfortunately, we do not get that. I mean…never? Yeah, I am willing to say never.

Now, I am no nihilist. It is not that I think that evidence is meaningless or valueless. But rather, it is the accumulation of evidence that lends weight and should have us confidently accepting that something is true, or something is false. No single study can do that.

Though it is not my main point, let me address those who claim that experimental design accomplishes this. It does not, because it is not sufficient. Not only do you need random assignment (i.e., the only requirement of experimental design), but you need it to be blind (i.e., participants do not know which group they are in). Not just blind, but double blind (i.e., those administering the treatment also do not know which group is which). Not just double blind, but a sufficient sample size for unobserved, potentially relevant characteristics to cancel out (i.e., a REALLY big study). Not just a big sample, but actually a representative sample from the population. Does your experiment have that? I’ll bet that it does not, if for no other reason than that those administering the treatment know what they are doing. Sure, placebo pills can be made that look like treatment pills. But educators and therapists know what they are doing. People implementing policy, as opposed to passing out medications, know what is happening. So, experimental design is not sufficient to prove something true or false.
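To make the sample-size point concrete, here is a minimal Python sketch, assuming a single hypothetical unobserved trait drawn from a standard normal distribution. It randomly assigns participants to two equal groups and reports the average gap between the groups' means on that trait, across many simulated studies. The function name `mean_imbalance` and its parameters are illustrative assumptions, not drawn from any real study. It models only chance imbalance under random assignment; it says nothing about blinding or about whether the sample represents the population.

```python
import random
import statistics


def mean_imbalance(n_per_group, trials=200, seed=0):
    """Average absolute gap between two randomly assigned groups on a
    hypothetical unobserved trait (assumed standard normal), over many
    simulated studies."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Each participant carries an unobserved trait value drawn from N(0, 1).
        people = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
        # Random assignment: shuffle, then split into treatment and control.
        rng.shuffle(people)
        treatment = people[:n_per_group]
        control = people[n_per_group:]
        gaps.append(abs(statistics.mean(treatment) - statistics.mean(control)))
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_imbalance(n), 3))
```

As the groups grow, the average gap shrinks (roughly in proportion to one over the square root of the group size). That is why random assignment needs a large sample before unobserved characteristics reliably cancel out, and why a small randomized study can still leave them unbalanced.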

However, my main point is not about experimental design — though it should inform your understanding of its limits, in most cases.

My main point is that even if your data is accurately observed, recorded and understood, you still have not proven causality. The big issue that is so hard to rule out is alternative explanations.

The purpose of experimental design is to rule out alternative explanations. And if you have random assignment, it's double blind, and you have a large sample that is representative of the population, ok. Sure. But violate any of those, and you cannot be sure that the effect you observe is due to the cause you are investigating. You just cannot.

This is often true in the world of assessment, as well. Evidence Centered Design (ECD) focuses on collecting evidence that is consistent with the claims that one might want to make: what would it look like if the test takers could do X? However, even ECD (of which I am a big fan) has no procedure or call to ensure that the evidence could not be supplied for another reason. Even ECD has a blind spot for the ambiguity of evidence.

So, there are two lessons to keep in mind, moving forward. First, do what you can in your research or assessment design to minimize ambiguity of evidence. Second, remember that you cannot eliminate it, and therefore should be humble when making claims about what it shows — without ever using the word “proves.”

More on Literature Reviews

There are three more points that I want to make about literature reviews.

Mind the Gap

Most research attempts to build on and extend the research base in the literature. As such, these projects (including virtually all doctoral research projects) must identify a gap in the literature.

What constitutes a gap? Well, it is where we have not gone before, have not looked before. It could be something entirely new, but usually is not. Instead, it tends to be a new version, a new context, a new kind of data or examination.

It is perfectly fine to look at existing research and wonder if that approach would work in a different context. For example, would this work in a more urban school? Would that return the same results if it was in a lower property school? Can we get better results with a different kind of staffing? All of those are novel enough that they constitute a gap.

Or rather, they would constitute a gap so long as that project has not been done before by someone else.

One purpose of a literature review (not just the write-up, but the actual work of reviewing the literature) is to find out what has already been done. Redoing someone else’s almost identical project is not verboten (see below), but it should be disclosed. And you should know that that is what you are doing.

You see, one thing that makes academic and scholarly research different from action research is that the goal is to help inform the field and take part in the scholarly conversation, rather than simply to inform yourself. If you are just looking for your own answer, you can do action research. Well, you can look to the literature and save yourself a lot of trouble, or you can try the action research yourself. But if you are doing scholarly research, you need to know what has gone on in the conversation and whether someone else has already said what you are aiming to say.

So, the gap is your niche. What are you looking into that no one else has quite looked into like you before? What is your angle? Are you using a different critical lens in your analysis? Are you collecting a different kind of data? Again, these are enough to constitute a gap.

Replication Projects

Not all research has to be based on a gap in the literature. In fact, too much research is based on identifying a gap. We should have far more replication in virtually every field.

Just because I reported a particular result from my study, that does not mean that the case is closed. Yes, you could try to take a different angle on it (see above), but you could also simply try my exact project again. Hopefully, my results are robust enough to take it, but how can we be sure?

The best way to be sure of research results is to try again, and not just by changing things up a little bit. Yes, over time we would want to subject my findings to all kinds of checks by modifying this and that about it. But we also want to make sure that what I found is robust enough for others to find it.

We have a replication crisis in virtually every field of research. This is particularly well documented in psychology, but I believe that it is true for all fields. Certainly in education research, where we have many, many studies that are not widely read and insufficient literature reviews to consider what has come before (in part because of the volume of research out there), we have too little idea about what is robust and what is not.

Unfortunately, most doctoral programs require original research, rather than replication, for qualifying dissertations. However, if your doctoral program allows something very much like a replication, I think that it can be a very good idea.

Understand that replication projects do still require work for the literature review. You cannot simply replicate the original project’s lit review. Rather, you should do an even more complete review of the prior literature and review what has come since and been based on the original study. You should take an even more critical eye on the basis of the original study. And, of course, you should subject the original study to the strictest of scrutiny. That is no less work than a more typical literature review.

Quick Ed.D. Programs

I always worry about Ed.D. programs that are designed to speed their students through the process. Often, they have them collecting data before they’ve done their literature reviews. This is simply not scholarly research.

Your research should be based upon some intersection between the scholarly conversation and your own interests. The scholarly conversation should inform the kinds of questions you ask and how you think about the possibilities of addressing them. If you collect data before you have gotten familiar with what has come before, then you are only doing research based on your interests. That is action research, not scholarly research.

Now, if the field decides that action research is sufficient for an Ed.D., I would not argue against that. I could appreciate a line between Ed.D.s and Ph.D.s that was based on the distinction between action research and scholarly research. But that is not what the field says. Instead, there is some vague idea in some institutions that one is better than the other, while other institutions only offer one of them. For example, my doctoral advisor has an Ed.D. from Harvard’s Graduate School of Education, in part because that is the only doctoral degree they offered while she was there.

If you are considering an Ed.D. program that is designed to get you through quickly (i.e., in just 2-3 years), I strongly suggest diving into the literature on your topic as soon as possible, and certainly sooner than they would have you do so. Having a sense of the literature will help you to refine your topic and your research questions far better and make your project more satisfying along the way.

Reusing Rubrics Well

While rubrics are quite valuable when they are custom tailored to individual assignments, projects and items, they can be equally (though differently) valuable when they are reused over time, across assignments and even across grades (i.e., multiple simultaneous use). The clarifying value of rubrics (i.e., making criteria and performance levels more transparent) can guide both teachers and self-aware learners to focus their efforts on the most important lessons or KSAs (knowledge, skills, abilities) over time. Growth in performance and learning can have greater and clearer meaning when the criteria are clear and consistent.

However, there are potential dangers in multiple use and reuse of rubrics. The more they are used, the greater those dangers, and the more important it becomes to anticipate them and prevent them.

  • When rubrics are reused over time or simultaneously across grades, they may be used for different assignments that produce different work products (to varying degrees). Thus, the traits listed in the rubric might apply to some assignments better than to others.

  • When rubrics are reused over time or simultaneously across grades, they may be used for different assignments that produce different work products (to varying degrees). Thus, the performance descriptions of the traits listed in the rubric might apply to some assignments better than to others.

  • When rubrics are reused over time or simultaneously across grades, students may grow beyond performance levels described in the rubric (i.e., a ceiling effect). While this can help to show proficiency or mastery, it comes at the cost of providing direction to teachers or students about where they might best focus their efforts, next.

  • Alternatively, avoiding the ceiling effect might come at the expense of the earlier uses of the rubric (and/or use at lower grades). That is, students may fall short of all of the performance descriptors in the rubric (i.e., a floor effect), thus failing to earn any credit for the work they have done.

These issues are not limited to classroom use of rubrics. They also can occur when rubrics are used to score standardized assessments. While the benefits of rubrics are a bit different in these contexts, they are no less valuable overall. In fact, rubrics are necessary in scoring standardized assessments. Clearly, communicating to educators about the rubrics used to score student/test taker work products can itself be quite valuable, helping to align instruction and assessment. However, rubrics are necessary to score constructed response items because they are the mechanism for consistent scoring across the entire test population.

Of course, the obvious way to address these issues is to anticipate them and simply write better rubrics in the first place. Duh! Those among us who can see the future clearly have an enormous advantage in doing this well. But even that special group sometimes must use a rubric that was handed to them, fait accompli. Furthermore, making rubrics generalized enough to handle a broader range of assignments and work products can lead to vague rubrics that are difficult to use. Rubrics that are used across longer spans of time (even years) can have unfortunate or inconsistent effects on how those scores are weighted relative to other scores, due to ceiling and floor effects. This can be a particular problem when rubrics are used simultaneously across grades.

One might simply be careful to construct new assignments that match those previously scored with the rubric, but this limits flexibility to the point that it can stymie the kind of continuous improvement that we all want our teachers, schools and tests to engage in, even when such matching is possible. Often, we want (or need) the benefits of a consistent rubric, even as the nature of the students and their work develops over time. When rubrics are used simultaneously across grades, it is often simply necessary for tasks to be different at different grade levels. These sorts of flexibility in rubric use require the rubrics to be a bit vaguer or more generic, so that they can apply to these broader uses. But that lesser specificity makes it harder to use the rubric well or consistently.

This leads to the necessity of supplementing rubrics with additional materials, so that they can be used consistently. This is particularly important when multiple teachers or scorers are using the same rubrics.

  • Exemplars. The most common supplementary material is a group of exemplars. That is, authentic or simulated student/test taker work products that demonstrate what different performance levels look like for this particular task. Teachers and other scorers gain better — and more consistent — understanding of the rubric in this context by examining it in light of those exemplars.

  • Interpretative Guidance. More narrative or explanatory support can help teachers and other scorers to understand the thinking (and/or the appropriate shifts in thinking) required to apply the rubric to different levels of student learning (and/or development) or to a new context. This is especially important when rubrics are reused across longer periods of student learning (e.g., years), in a new context, or simultaneously across grades.

In the absence of supplementary materials, teachers and other scorers are often left to their own devices, and can respond inconsistently. Some may simply accept extraordinarily low or high scores across the board. Some may simply ignore the traits/dimensions that seem less relevant. And some may substitute their own judgment (e.g., new traits) for what strikes them as an inappropriate rubric, entirely undermining the rubric itself. While some of these outcomes might seem acceptable, the fact that (without supplementary materials) they will happen unpredictably violates the fundamental purposes of rubric use.

In classroom use, this can demoralize students and/or leave them without direction. It can do the same for their teachers, who are so often invested in their students’ engagement in their classrooms. In the context of standardized assessment, it can prevent the kind of scoring consistency that is necessary for standardized test scores to have any reliable meaning.