Constitutional Reasoning

I try very hard to understand the views of those I disagree with. I really look to understand the values, reasoning and priorities of my opponents and rivals. I particularly try to think about whether what I might view as a compromise, they might view as offending them in exactly the same ways as the original.

Now, I admit that one big reason I do this is that I was taught from a very early age that the best way to win is to know the other side’s arguments better than they do; I was raised by an attorney. But nonetheless, I look for what they think, and I look for inconsistencies. I look for holes. I look for bullshit at key points.

Now, one would have to be a bit of a legal nerd to know that, no, the Bill of Rights originally only provided protection from the federal government. Individual states were free to violate all of those rights, unless they themselves offered similar protections. (Off the top of my head, I believe that the US Bill of Rights was based on Virginia’s Bill of Rights.) And one would have to be a bit of a legal nerd to know that it was the Civil War Amendments that incorporated the federal Bill of Rights, making them apply to each state, as well.

For this reason and others, the Civil War and Reconstruction are often referred to as our nation’s Second Founding.

It is therefore quite suspect when anyone claims that the laws and norms of the late 19th century are insufficient historical grounding for making sense of the meaning of the Bill of Rights. If the question is what states may or must do, to go back 50-100 years further — when the Bill of Rights did not apply to the states — is willful blindness. It is intellectually insensible (that’s a typo, but it works for me!). It is an exercise in motivated reasoning that is — to use a legal term — risible.

I’ve never understood the idea that some of the Bill of Rights could be incorporated, but not others. I can follow sensible reasoning that says that the Second Amendment provides protection against state governments, and that therefore state National Guards and the like cannot serve that “well regulated militia” function mentioned in the amendment. I can see a sensible line of reasoning that says the amendment protects an individual right to bear arms. But that individual right is not at all for individual protection. Rather, the amendment makes explicitly clear that this right is to bear arms for collective action and community protection. To suggest an individual right to individual protection is…entirely ungrounded in the text or tradition. It is, once again to use a legal term, risible.

That means laughable. It means so lacking in sense and reason as to just be laughable. It should be laughed out of the room, out of the courtroom.

New York State Rifle & Pistol Association Inc. v. Bruen is not the case that will bother me most from this Supreme Court term. I do not think that it is the case that will do the most damage to our society. It is not even the case that offends me the most, even though Kennedy and Dobbs have yet to be announced. But it might be the most intellectually dishonest case from this term. I do not simply mean misguided or confused. I mean flat out dishonest.

Is There Anything More Important Than Trust?

Trust has always been a huge part of my practice. As an educator, as a leader, as someone who thinks about learning and leadership development, trust is a mainline in my thinking.

I probably learned this from my doctoral advisor, Prof. Ellie Drago-Severson. Her genuine trust and presence in the room — as a teacher and as a staff developer — creates trust like nothing I have ever seen. So much of her teaching depends on learners admitting vulnerability and mistakes, and this is only possible if there is trust in the group.

This is not to say that I did not think trust was important before I met Ellie. Rather, my many years of work with her and under her direction raised the importance of trust, in my thinking.

Like Ellie, the technical skills that I lay out and teach really just serve as examples or exemplars of deep values and ideas in practice. These are techniques that show how frameworks of thinking can be used and lay out what that would look like. This means that I am really trying to influence the thinking of those I am working with. I am hoping to plant ideas deeply and nurture them into influence on how they work — on how they think about their work.

Now, some people are more open to this kind of deep learning and some are more resistant. Ellie’s great gift was her ability to move the resistant towards being more open. I know that I originally held a lot of her ideas at arm’s length, but between their brilliance and her own brilliant ability to build trust, I came to appreciate them deeply.

I’ve been thinking particularly about trust and trust-building this week. I am always concerned about these things in my teaching and coaching. This week, though, I am thinking about how trust is built by leaders. Managers and direct supervisors can — and should — work on building trust through their direct relationships with their team members. However, many leaders are not primarily direct supervisors. On larger teams — and in whole organizations — the leaders have rather little direct contact with most of the people they lead. However they might have built trust with those they worked with more closely in the past, they need to find new ways to build trust with a larger group who will never have that kind of sustained direct contact with them.

So, that is what I am thinking about today: How does a leader build trust with people without depending on the direct interpersonal relationship?

 

So Many Cognitive Paths…

I was just looking at a simple math problem with a group of people and we came up with four different ways to solve this multiple choice item. This is not the actual item, but it was very similar to this:

Jeremy goes to the store to buy a jar of peanut butter and a jar of jelly. If he starts with $15.85, the peanut butter costs $2.95 and the jelly costs $3.65, how much money does he have when he leaves the store with his purchases?

This really was a very simple item. Sure, the items (and the starting amount) were expressed in both dollars and cents. But they were all multiples of five cents.

Pretty simple, right? But when we each went through the item ourselves, we naturally and authentically did it in four different ways.

  • One of us was lazy, and just rounded all the numbers and did the math in their head. They then added the two purchases (~$7) and subtracted that from the original ~$16, leaving around ~$9. Only one answer option was around $9, so they picked that.

  • Another one of us did the straightforward two-step math problem on a piece of paper. $15.85 - $2.95 = $12.90, and then $12.90 - $3.65 = $9.25.

  • A third person tried to do the two step math problem, but in their head. They doubted their mental math skills, so they pulled out a piece of paper to double check — and it was good that they did. They ended up with $9.25.

  • The last person stuck with addition. They added the two items to get $6.60, and added that to each answer option until they found an answer option that yielded a total of $15.85.

Four very capable and very educated professionals addressed a single simple math problem, but had four different strategies. And that was for a VERY simple problem.
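For the arithmetic-minded, here is a minimal sketch of those paths in Python. The answer options are hypothetical (the actual item's options are not reproduced here), and the function names are just labels for this illustration. The second and third people share the same arithmetic, so they share one function.

```python
from decimal import Decimal

START = Decimal("15.85")
PEANUT_BUTTER = Decimal("2.95")
JELLY = Decimal("3.65")

# Hypothetical answer options; the real item's options are not given above.
OPTIONS = [Decimal("6.60"), Decimal("9.25"), Decimal("12.90"), Decimal("22.45")]


def estimate():
    """Path 1: round everything to whole dollars, then pick the closest option."""
    rough = round(START) - (round(PEANUT_BUTTER) + round(JELLY))
    return min(OPTIONS, key=lambda option: abs(option - rough))


def subtract_in_two_steps():
    """Paths 2 and 3: subtract each purchase in turn (on paper or in one's head)."""
    after_peanut_butter = START - PEANUT_BUTTER
    return after_peanut_butter - JELLY


def add_back():
    """Path 4: add the total spent to each option until one reaches the start."""
    spent = PEANUT_BUTTER + JELLY
    for option in OPTIONS:
        if spent + option == START:
            return option
    return None  # no option matched


results = [estimate(), subtract_in_two_steps(), add_back()]
```

With these made-up options, every path lands on the same answer, but each one leans on different cognition: estimation, subtraction, or addition.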

Imagine how many different cognitive paths more complex problems might prompt. Imagine how many different ways including more context for a problem could lead test takers to different obstacles, distractions and even mistakes. To help with that, we have put together a set of a few dozen personas to try to think about as hypothetical test takers who might approach an item differently than you.

The Importance of Humility in the Work

Way way back in the day, when my collaborator and I started our Rigorous Test Development project, we came up with six core principles that we thought were essential to the work. Though we might phrase things differently today (e.g., three of those core principles can be found in our mantra, valid items elicit evidence of the targeted cognition for the range of typical test takers), we really cannot argue with our final core principle at all: Test development requires approaching the work with humility.

We have said “approaching the work with humility” and “engaging with humility” and all kinds of minor differences in phrasing, but this idea of humility remains central to the RTD approach. It’s the starting point for so much — for collaboration, for individual learning, for organizational learning. It’s one of our original six principles — born of her observations about the major stumbling blocks to creating higher quality assessments.

Humility in the work begins with recognizing the limits of one’s expertise. That includes:

  • Recognizing that your own expertise DOES have limits.

  • Recognizing the expertise of others.

  • Recognizing the areas of your own expertise — perhaps even fairly fine grained — both relative to each other and relative to the level of expertise of others.

  • Speaking from your expertise and listening when beyond your areas of expertise.

  • Recognizing the difference between interest and expertise.

Now, if one wishes to expand one’s areas of expertise, that’s awesome. But it does not happen simply because one wants it to.

  • Listen when those with expertise you desire speak and think hard about what is behind what they said (values, goals, priorities, knowledge, principles, etc.).

  • Ask questions of those with expertise — both to get them to be more explicit and to signal that you want to better understand what they are saying (e.g., “Does that mean…”).

  • When one ventures to speak beyond one’s present expertise, own it explicitly (e.g., “I’m not sure about this…” or “Maybe…”).

Now, there is CLEARLY a gender component here. Men often feel more free to speak outside of their expertise. Men often disregard women’s actual expertise (i.e., "mansplaining" is actually a very meaningful term). And too often, women have learned to downplay their own expertise or even to be unaware of it. I try hard to combat all three of these — both in myself and in the women I work with. But we are all products of our American culture, so that gender crap is going to be present too often — including in me. This is part of why I encourage people I work with — including women — to interrupt me. I know that some people really have trouble with outspoken women, but I really hope that I am not one of them. I don’t think I am. But I am aware that I am a product of American culture, so there’s always that possibility.

I hope that I model questioning. I hope that I model encouraging others to speak. I hope I model elevating the voices and views of people I am working with. I know that I learn a ton by working with others that have expertise that I lack. For example, I have been working with the Next Generation Science Standards quite a bit this year, and my knowledge of NGSS has deepened (and I am seeing the more subtle issues within NGSS more clearly) by working with people who have far greater NGSS expertise than me. We are building stuff together, and I know that I am incredibly dependent upon them as partners to help me to become more expert here — dependent upon them in ways that they likely don’t appreciate. Yeah, I’m still me. I still talk too much. But I run my thinking by them, because I am looking for correction and redirection. I cannot be confident in what I am saying unless I expose it to them for criticism. I know that I’ve got expertise and skills that they lack, and I also know that they have expertise, skills and experience that **I** lack. Thus, we are learning from each other and approach our work together with that stance/goal in mind.

So, yes, there is a place for arrogance in the work. (If there weren’t, how could I do it?) But even that can be done with humility.

So, one aspect of approaching the work with humility is maintaining an open-minded learning stance — even if one simultaneously holds a critical stance. This can be reconciled by leading with questions and actually listening to other people’s answers. Of course, one must be sure to apply that critical stance to one’s own ideas and contributions.

When people do not approach this work — or, likely, any work — with humility, the loudest and most confident voices dominate, rather than the most knowledgeable. Issues or objections get prioritized over each other, rather than reconciled in some best (or least bad) compromise. Often, the whole remains far less than the sum of its parts.

I cannot say that humility ever came naturally to me. My own entry point into this principle — and my most reliable reminder of this stance — is my interest in learning from others. I am always looking for what I can learn from others and try to invite opportunities to do so. No, that is not the same thing as approaching the work with humility, but it is the easiest facet for me. The rest, I more consciously work on.

Terrorizing School Boards

There has been some hullabaloo around the National School Boards Association’s (NSBA) recent letter to the Biden administration about protecting school boards and school board members from threats of violence. I have found the backlash against this to be so unreasonable as to be clearly bad faith.

The question at hand is whether NSBA’s use of the term “domestic terrorism” was over the line, was inappropriate and/or perhaps entirely misleading.

First, let us just agree that terrorism is not just about Muslim extremists. It is not just about events that happen in far away lands. Terrorism is the use of violence — and even credible threats of violence — to achieve political ends. The idea that violence can be a form of addressing political concerns is not a new one. Henry Kissinger spoke of “war [as] a continuation of political activity by other means,” an idea rightly credited to Carl von Clausewitz. Today, when it is asymmetric warfare, we generally call it terrorism.

While there have been many heated — though still non-violent — debates at school board meetings this year, there have also been many threats of violence. The fact that most disagreements — and even yelling, shouting and interrupting meetings — do not constitute threats of (or actual) violence does not at all undermine the fact that such things have happened.

Some people have made those who agree with them look bad, as most do not resort to violence or threats of violence. But blaming those who cite reality for being inflammatory or dishonest simply for citing reality cannot be taken as good faith objections.

I don’t need to address the question of what to call intentional efforts to disrupt school board meetings to recognize that some are resorting to violence (and threats of violence) to achieve political ends that they have not been able to further at the ballot box.

That is domestic terrorism. And there is nothing wrong with calling it out as such.

The Most Common Mistaken Approach to Item Alignment

The biggest mistake that people make when thinking about item alignment is focusing on the charged task, rather than thinking about the clarity of the observable evidence that an item generates or about the cognitive path that test takers might follow. 

True item alignment is about the quality of the evidence that the item generates. Does the item present strong evidence that the successful test taker does indeed have proficiency with the Targeted Cognition (i.e., the KSAs revealed by a close reading of the standard)? Does the item present strong evidence that the unsuccessful test taker lacks proficiency with the Targeted Cognition?

Mistaken thinking about item alignment tends to reduce those questions to mere questions of relevance – rather than focusing on the clarity of the evidence. That is, some look at what test takers are asked to do or think about (i.e., the task), and consider whether it is relevant to the standard, or how relevant. From this perspective, items that suggest tasks which depend on cognition that is closer to what is described in the standard and/or that are more dependent on that cognition are deemed to be more strongly aligned. However, items that merely make use of cognition that is related to what is described in the standard cannot provide clear affirmative and/or negative evidence of test takers’ proficiency with the Targeted Cognition. Simply put, it is not enough.

This mistaken approach often accepts as aligned items that test takers get wrong for reasons other than the Targeted Cognition. This mistaken approach often accepts as aligned items to which test takers can respond correctly without the Targeted Cognition. It allows for the mere possibility that the test taker used the Targeted Cognition or misapplied the Targeted Cognition to count as alignment, in spite of the ambiguity of such evidence.

(With multiple choice items, quite often, this is exacerbated by focusing only on whether the Key is accurate, whether the distractors are incorrect and/or – at best – whether they capture the kinds of mistakes that test takers might make. However, these mistakes do not have to be mistakes in or misunderstandings of the cognition described in the standard itself. Thus, these items frequently max out at level 3 (i.e., Task Alignment).)

The RTD Alignment Scale addresses this mistake head on.

 

RTD True Mastery Typology

This typology is not about performance levels that one might report from an assessment. Rather, it is a more theoretical framework about the true range of proficiency that people truly possess. Assessment developers and consumers should be careful to be mindful of the kinds of proficiency claims that assessments can support and the various costs (e.g., financial, testing time) of generating sufficient evidence to support  

0: No Proficiency

The student has no ability with or understanding of the standard, KSA or Targeted Cognition.

1: Emerging Proficiency

The student has partial ability with or understanding of the standard, KSA or Targeted Cognition.

2: On Demand Proficiency

The student exhibits ability with or understanding of the standard, KSA or Targeted Cognition when prompted to do so, but does not know when it is appropriate to do so without such prompting. That is, the student does not realize on their own when the standard, KSA or Targeted Cognition would be useful in solving an ill-structured or complex problem.

3: Elective Mastery

The student exhibits ability with or understanding of the standard, KSA or Targeted Cognition on their own, but does not always remember to check whether the standard, KSA or Targeted Cognition is appropriate. That is, the student often has to consciously remember to apply the KSA or Targeted Cognition, but is able to do so at those times.

4: True Mastery

The student habitually and even unconsciously applies their understanding of the standard, KSA or Targeted Cognition as appropriate and as needed, and in conjunction with other KSAs.

 

On Demand Assessment of Mastery

It is difficult for standardized assessment to consider Level 3 (Elective Mastery) or Level 4 (True Mastery) because standardized assessments are generally on demand assessments. They usually signal quite strongly what KSAs are needed to respond to an item. Various forms of constructed response items can make it more appropriate to infer Elective Mastery, based on test results. However, the frequent heightened stakes of standardized assessment – however real or however imagined – may sufficiently focus a student beyond their usual habits that True Mastery cannot be safely inferred. 

The RTD Alignment Scale

The RTD Alignment Scale presents five discrete levels of alignment and is applicable to the full range of item types and classroom activities. Because valid items elicit evidence of the Targeted Cognition for the range of typical test takers, this scale focuses on the quality of the evidence that the item provides. It considers whether test takers who have produced the desired work product have produced strong observable evidence of proficiency with the Targeted Cognition and whether test takers who have fallen short of providing the desired work product have produced strong evidence of a lack of proficiency with the Targeted Cognition. That is, it is very mindful of the twin problems of false positive and false negative evidence.

0: No Alignment

The item is not at all aligned with the Targeted Cognition (i.e., the KSAs revealed by a close reading of the standard). There is no way to use the Targeted Cognition to help solve the item, work from the stem to the key or otherwise provide the desired work product. 

There is no information at all about proficiency with the Targeted Cognition from successful or unsuccessful test takers. 

1: Optional-Use Alignment

The item demonstrates mere Optional-Use Alignment when use of the Targeted Cognition to generate the desired evidence is optional for the test taker. There is a way to use the Targeted Cognition to help solve the item, work from the stem to the key or otherwise provide the desired work product, but it is not the only path. The test taker could also solve the item, work from the stem to the key or otherwise provide the desired work product without using the Targeted Cognition.

Therefore, it cannot be clear that even successful test takers have any proficiency with the standard, and unsuccessful test takers provide no information about a lack of proficiency with the Targeted Cognition. That is, noise from the alternate paths through the item masks any signal that some test takers might generate from the Intended Task. This is an extraordinarily weak form of alignment.

2: Non-Dominant Alignment

The item demonstrates Non-Dominant Alignment when the Targeted Cognition is not the pivotal step of the cognitive path that solves the item, gets from the stem to the key or otherwise provides the desired work product. Instead, misunderstanding or misapplication of other cognition is likely to be the barrier that prevents test takers from successfully completing the item. To demonstrate Non-Dominant Alignment, the task must depend upon other cognition which is not appropriate to take for granted among the range of typical test takers for this test.

Successful test takers do provide some information about their proficiency with the standard, but unsuccessful test takers do not. Thus, this is a weak form of alignment.

3: Task Alignment

The item demonstrates Task Alignment when – although the cognitive path to a successful response does depend appropriately on the Targeted Cognition – the evidence that the item captures is not necessarily that of misunderstandings or misapplications of the Targeted Cognition. Instead, it conflates evidence of other misunderstandings or misapplications with appropriate evidence regarding the Targeted Cognition.

Like Non-Dominantly Aligned items, there is information from successful test takers, but because the item captures evidence of mistakes that are not a result of mistakes with or misunderstanding of the Targeted Cognition, the evidence it provides of lack of proficiency is quite weak.

4: Item Alignment

The item is fully aligned to the Targeted Cognition. 

Successful test takers provide strong observable evidence that they have proficiency with the Targeted Cognition and unsuccessful test takers provide strong observable evidence that they lack proficiency with the Targeted Cognition.
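One way to read the five levels is as a short series of yes/no judgments about an item, asked in order. The sketch below is only an illustration of that reading, not an official RTD tool: the enum, the function and its four parameter names are hypothetical, and each judgment still has to be made by a person who has read the standard closely.

```python
from enum import IntEnum


class Alignment(IntEnum):
    """The five levels of the scale, numbered as in the text above."""
    NO_ALIGNMENT = 0
    OPTIONAL_USE = 1
    NON_DOMINANT = 2
    TASK = 3
    ITEM = 4


def alignment_level(usable, required, pivotal, failures_trace_to_target):
    """Map four reviewer judgments (hypothetical names) to a level.

    usable: can the Targeted Cognition help produce the desired work product?
    required: is there no path to success that avoids the Targeted Cognition?
    pivotal: is the Targeted Cognition the pivotal step of the cognitive path?
    failures_trace_to_target: do unsuccessful responses reflect mistakes with
        the Targeted Cognition rather than with other cognition?
    """
    if not usable:
        return Alignment.NO_ALIGNMENT
    if not required:
        return Alignment.OPTIONAL_USE
    if not pivotal:
        return Alignment.NON_DOMINANT
    if not failures_trace_to_target:
        return Alignment.TASK
    return Alignment.ITEM


# An item with an alternate path around the Targeted Cognition.
example = alignment_level(usable=True, required=False, pivotal=True,
                          failures_trace_to_target=True)
```

The order of the checks matters in this sketch: an item only earns a higher level after it has cleared every weaker failure mode, which mirrors how each level above is defined by the problem that the next level fixes.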

Women Are Fascinating People

There are lots of horrible things about Texas’s anti-abortion law (SB 8).

And there are still more horrible things in the disingenuous ways that the Fifth Circuit Court of Appeals and the Supreme Court of the United States have responded to efforts to adjudicate on its constitutionality. I am not going to get into all of that.

I am not even going to get into what it does to abortion access.

I am simply going to point to a smaller impact that SB 8 will have on millions of women as they live their lives. Not the pregnant ones. Not the ones who know they are pregnant, but on all the *other* women.

As a man, let me tell you that women’s menstrual cycles vary. Some women have longer cycles and some of them have shorter cycles. Some women’s cycles run like clockwork, and some women’s cycles vary. And, as best I can tell, every woman knows this. So, the idea that a woman’s period is “late” is not really a binary. It’s not like being late for a 10am meeting. It might just be a longer cycle that month.

So, how late does it have to be to be, you know, *late*? How late does it have to be to really wonder? How late does it have to be to be worth getting a pregnancy test?

Now, some women pay *very* close attention to their cycle, and some women do not. Obviously, that is their right. I mean, there are plenty of things that I should pay attention to because I really should do something on a regular basis, but I kinda wing it. I kinda pay attention, and I notice eventually — usually pretty soon — but I do not spend a lot of time worrying about it.

SB 8 tells women that they cannot have that kind of attitude about their reproductive systems.

The so-called six-week threshold is not really six weeks of pregnancy. The law defines the start date NOT as the moment of conception, but rather as the last day of the woman’s last period. But conception actually occurs days — weeks, really — later. This means that a woman cannot wait to miss a second period. It means that a woman with a longer cycle (or who sometimes has a longer cycle) must be very vigilant about being late. Must notice. Must get a test.

Because, if she MIGHT want to get an abortion, she is going to need time to find a day she can take the time (from work? from child care?), find an available appointment, put together the cash and maybe even find a friend to be on call (maybe to take her home? maybe in case her body responds poorly? maybe in case she just needs to feel supported?). If she has an appropriate partner or spouse, they need to schedule around their availability, too.

That’s not a six week window in which to schedule. That’s not a four week window. That might not even be a three week window. Two weeks? One week?

What does this mean? This means that women of reproductive age (14-55?) must be ever vigilant about their reproductive systems. Must never forget to be fucking ON IT. Otherwise, they might run out of time. That window can easily get so small that women with more average cycles need to be on it, too. All women need to be on it.

And if they and some friend so much as discuss a willingness to lie about the date of the end of their last period? If they so much as discuss taking her to the airport to get to an abortion services provider? They are all subject to these lawsuits and their $10,000 fines (plus attorney costs), I think.

So, SB 8 forces all of these millions of women of reproductive age to be as obsessed with their reproductive systems as these crazies (who do not understand the difference between religion and science, who do not understand even what “cell differentiation” means) are obsessed with these women’s reproductive systems.

Furthermore, SB 8 entices everyone else to pay very close attention to the women’s reproductive status, because the most interesting ones — the ones worth $10,000 — are the ones who are just newly and almost undetectably pregnant and maybe haven’t even noticed yet themselves. The most interesting thing about these women becomes — for far far far far far far far too many people — whether or not they are pregnant.

Putting hyperbole aside — and I have tried to write this whole post without hyperbole — SB 8 imposes on all women a new value system in which, for themselves and for those around them, their current reproductive status is the most important thing about them.

Not their children. Not their marriages. Not their careers. Not their relationships. Not their good works. Not their faith. Not their own values. Not their experience, character, abilities, contributions, dysfunctions, mistakes, victories, appearance, intelligence, stories, taste, accomplishments, disappointments, hair, vocabulary, favorite movie, that thing she does, sense of humor, style, obsessions, political leanings, influence, or the way she makes people feel.

None of that.

The most important thing about any woman — or girl — of reproductive age in Texas is her reproductive status. That is the impact of this law, right now. I guess that's *not* a small change, now is it?

Under his eye.

Where are The Standards in The Next Generation Science Standards?

The Next Generation Science Standards is one of those gargantuan things that is just amazingly impressive. Awesome even. Science is so many things. It is an approach and it is what that approach has taught us. There’s this single idea (i.e., science), and the different disciplines within science. 

Trying to organize all that mess into a single anything is just incredible. I often come across what ECD (Evidence Centered Design) calls a domain model and am flabbergasted that anyone was able to do it.

Wow.

The thing is, NGSS is so commonly misunderstood. It was an effort to organize a domain – or set of domains – both to establish standards and to create supports for educators to help students reach those standards. Clearly, it is influenced by ECD – and it even uses some ECD terminology. But unlike ECD, it is at least as focused on supporting curriculum development and instruction as it is on supporting assessment.

The most common misunderstanding is in which part of NGSS constitutes the actual standards. There are the PEs (Performance Expectations), the CCCs (Cross Cutting Concepts), the DCIs (Disciplinary Core Ideas) and the SEPs (Science and Engineering Practices). Now, my favorites are the SEPs, but that does not make them the standards. 

What Does NGSS Say?

Luckily, the official website of NGSS, <nextgenscience.org>, has a page for Understanding the Standards. It explains exactly what is going on.

The second paragraph explains that the three dimensions (i.e., CCCs, DCIs and the SEPs) predate NGSS, and that they were “introduced in the National Research Council's A Framework for K-12 Science Education.” Clearly, the standards part of NGSS is not found in those three dimensions that NGSS inherited from The Framework. Rather, the NGSS standards would have to be found in the parts that the NGSS coalition wrote later — albeit inspired by NRC’s Framework.

A little further down the page is a video which explains further. At around the one minute mark, the narrator says, “The standards have been developed as student performance expectations, which are statements of what students should know and be able to do by the end of instruction.” In case the text on the page was not clear, the video is rather definitive. The standards are the Performance Expectations. 

So, What’s the Problem, Then?

The PEs are built from the three dimensions, but the PEs have far greater specificity than anything in any of the three dimensions. For example, HS-PS1-1. (“Use the periodic table as a model to predict the relative properties of elements based on the patterns of electrons in the outermost energy level of atoms.”) does not say anything about protons, atomic number or atomic mass. In fact, none of the PEs mention atomic number or atomic mass in the context of the periodic table of the elements. (Only two PEs even mention the periodic table, at all.)

Does this mean that the periodic table is not important? Does this mean that those parts of the periodic table are not important?

No. That is not what that means. 

Instead, it forces us to confront the relationship between learning goals (or outcomes) and learning pathways. It certainly makes us think harder about the differences between instruction and assessment.

No one would ever suggest that atomic number and atomic mass are unimportant concepts. No one would ever suggest that their place in the periodic table is unimportant either. (Frankly, the periodic table of the elements in another one of those awesome works of genius.) If you care about those aspects of the periodic table, no one is arguing with you. You can teach that, and that stuff is important to understand on the way to being able to meet the performance expectation. 

But understanding – or demonstrating understanding – of how the periodic table of the elements is built upon atomic number and can be used to look up atomic mass is not “what students should know and be able to do by the end of instruction.” Precursor? Yes. Is it allowed? Of course! 

Is it required? No. No, it is not. 

Oh, It Hurts!

Yes. It hurts. There are things in science that I love (e.g., Do you understand how Dmitri Mendeleev arranged the periodic table by atomic number AND electron levels? It’s amazing!) which are not a part of the Performance Expectations. How can that not be in the standards!? I love that stuff!

There are things all across science (particularly things found in the DCIs, but also things in my favorite dimension, the SEPs) that did not make it into the PEs. Important things.

Let me be abundantly clear: Some of my favorite ideas and facts from science are not in the Next Generation Science Standards Performance Expectations.

But let me be equally clear: I do not have the right to say what the standards are. That is up to the standards writing bodies and the state legislatures that endorse them (or edit and then endorse them). The fact that I think that something is important does not mean that I get to make it a part of the standards. The fact that I can make a compelling case for how important it is and for how many others think that it is important does not give me that authority, and it does not overwhelm the stronger cases that it is not in the standards.

Yeah, that fact hurts, too. Being both technically and morally right does not give me authority over our democratic institutions to decide what is taught in all of our schools and/or should be on our official assessments. That fact hurts me every day.

No, No, You’re Wrong Because I’ve Read…

You’ve read the NGSS Structure: How to Read the Next Generation Science Standards document, linked to on that same page? You've found where it has said stuff like, “[The DCIs are] the most essential ideas in the major science disciplines that all students should understand during 13 years of school.” Yeah. It does say that. But depending on that kind of sentiment misses the nature and intention of NGSS – intention that is made explicitly clear in that same document.

NGSS is clear about what it means by standards. Its authors were quite aware that science is a broad umbrella and not every student will – or even could – learn all of it. NGSS posits that the standards should be the part of science knowledge (and approaches) that all students should learn. They wrote NGSS’s PEs as standards, “to ensure that this set of PEs is achievable at some reasonable level of proficiency by the vast majority of students.” The PEs should be the standard part – the baseline – for all students. 

NGSS is also clear that these PEs (as standards) are a floor, not a ceiling. “A second essential point is that the NGSS performance expectations should not limit the curriculum.” Schools are free to teach more than the standards. Even individual teachers can – and do! – bring in their own favorite ideas, applications and activities. Curriculum writers are free to go beyond the NGSS standards. In many cases, they should.

In fact, the PEs are likely not sufficient to fill an entire curriculum. In fact, there likely should be much more taught than the PEs. But NGSS says that the PEs are the most important parts, and they have to be taught. The rest? Well, different students in different classrooms, schools, districts and states can be taught from the rest in various combinations of ways. 

But the NGSS performance expectations are the standard part. They are the standards.

But NGSS Says That Some States Do Include More

Yes, the authors of NGSS are quite aware that they do not control the states. They are quite aware that states can adapt and modify others’ work before adopting it as the official state standards. “Other states also include the content of the three foundation boxes and connections to be included in ‘the standard [sic].’” The primacy of our democratic institutions to make such decisions is simply a fact, and NGSS acknowledges that fact.

But if you are going to depend on that idea to suggest that NGSS does not get to say what the standards are, then you simply have to accept that each state gets to decide. That still takes you and me out of the equation. If a state has said that the PEs are the standards, then the PEs are the standards, and it doesn’t matter what you or I prefer, or what NGSS’s authors intended. And if a state says that it is more than the PEs, then it doesn’t matter what you or I prefer, or what NGSS’s authors intended. (Yeah, in those cases, this blog post and our disagreement are simply moot.)

What Does This Mean for Assessment?

There is a deep philosophical issue at play here.

It is easy to answer that question when we are talking about classroom assessment. Classroom assessment should assess whatever is taught in that classroom. No question. Classroom assessment should be aligned with instruction – the full breadth of instruction. (Or maybe what the school and/or district has decided should be the full breadth of instruction.)

But our big formal standardized assessment? That poses a different set of issues. Should our assessment aim to measure everything that could be taught, and thereby exist as this standard against which to measure ourselves against our greatest curricular and/or learning aspirations? When I took geometry at this weird program, the final exam was out of 200 points, but we only needed like 60 points to pass. The exam covered everything, but we just needed to demonstrate knowledge and skills in enough areas to show we deserved to move on. Not everything. Just enough. 

That is not how assessment generally works in America. That is why that experience still stands out in my memory, so many decades later. In our assessments, we want every student to get every point. Our aspiration for assessment is that the test takers top out. We award our highest marks only to students who approach 100%. And there is too much science to expect that every student has a chance of doing that. We do not even require students to take all the science classes. Sure, some high school students may take biology, chemistry, physics, AP chemistry, AP physics and astronomy. But even those kids didn’t take AP biology or earth science. Staying at the high school level, how many states or school districts require even four years of science?

How could it be fair to give assessments on content that students have not had the opportunity to learn? How could it be fair to give assessments on content that teachers were not on notice that their “students should…be able to do by the end of instruction”? How is that fair to students, to educators, to districts and/or anyone else who might face consequences for student performance on these assessments?

I can imagine a world that has NGSS-aligned assessments that address all of the Performance Expectations and go on to sample from other content. But I do not know how decisions about which of that other content should be sampled should be made, which raises those challenging opportunity-to-learn and notice-to-teach questions. Even putting those concerns aside, though, we would have to make sure we are doing a truly excellent job on assessing the PEs before we even begin to think about assessing anything else.

 

 

Standards: Instruction vs. Assessment

I am a sucker for alignment. As much as I try to keep the human element in mind, and as much as I love creative and divergent lessons, I am deeply attracted to rational alignment. Good policy that actually supports good practice, and practice that aligns with policy? Man, I love that! I want the right hand to know what the left hand is doing, not to be working at cross purposes, and even for both hands to be working to support each other.

Educational standards are an attempt to create alignment. We want all students to be working towards the same learning goals, regardless of what district they live in or what teacher they were assigned to. We want to combat the soft bigotry of low expectations. And we want to bring the best thinking about what is possible and what is advisable to inform what our schools do for all of our students. We write and adopt standards to guide instruction.

We also look to standards to guide assessment. We want our assessments to be aligned to instruction and we accomplish that by aligning assessment to the same standards as instruction.

 

The thing is, as much as I like rational alignment, when standards inform instruction they should be understood quite differently than when they inform assessment. Instruction should be guided by standards, but not hamstrung. There are many other factors that inform instruction. On the other hand, assessment really should be much more constrained by standards.

In the next few posts, I will explore those differences. I will explain how the best instruction is tied to and grounded in the standards, but also builds beyond them. And I will explore how and why standardized assessments must focus more narrowly on more limited conceptions of the standards.

How Important is Reliability?

2016’s The Standards for Educational and Psychological Testing say that validity is “the most fundamental consideration in developing tests and evaluating tests.” This is the second sentence of the first chapter of that book. The 1999 edition said the same thing, without repeating the word “tests.” The 1985 edition agreed, but back then it was the first sentence.

Validity is the alpha and the omega. It is everything.

So, where does that leave reliability?

The Cliché

Last week, I ran through the cliché explanation of reliability and validity with the metaphor of a target. My rude punchline was that psychometrics, being concerned with metrics (i.e., numbers), has nothing to offer us about validity.

 
Figure A. Reliability and Validity

 
Figure B. Psychometric View of Reliability and Validity

 

Because psychometrics has no way to think about validity, it doesn’t have a target at all. Rather, it just looks at how tightly clustered the hits are.

(I know about internal structure and convergent/discriminant evidence. Those are still about reliability. The latter is about reliability with other measures, but it begs the question of whether the other measures are valid. Yes, correlation with various outcomes might offer something, but that is a topic for its own post.)

Generally, psychometrics has no theory, idea or vision of validity, so it raises reliability to be the more important consideration. But reliability is not the alpha and omega. It is a false god.

The Psychometric Defense

The smartest explanation for the importance of reliability that I have ever heard is that it is the upper bound or upper limit on validity. That is, in the language of the cliché, you cannot hit the bullseye consistently if you cannot be consistent. You cannot hit anything consistently if you cannot be consistent.
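For what it is worth, classical test theory usually states that upper bound in terms of correlations: the correlation between a test and any criterion cannot exceed the square root of the test's reliability coefficient. Here is a minimal sketch of that bound in Python. The function name max_validity is mine, purely for illustration, and the sketch assumes that both reliability and validity are expressed as correlation coefficients.

```python
import math


def max_validity(reliability: float) -> float:
    """Ceiling on a validity coefficient under classical test theory.

    Assumes `reliability` is a reliability coefficient between 0 and 1
    (e.g., test-retest or parallel forms) and that validity is expressed
    as a correlation with some criterion. The ceiling is sqrt(reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(reliability)


# Illustrative reliabilities only; these are not from any real test.
for r in (0.50, 0.81, 0.95):
    print(f"reliability {r:.2f}: validity can be at most {max_validity(r):.2f}")
```

Note what the bound does not say. It is a ceiling, not a floor. Nothing in it promises that a highly reliable test gets anywhere near that ceiling.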

My basic response to that is that I do not care how consistent you are if you are not near the target.

So, here’s the real question: which is better?

 
Figure C. The Worst?

 
Figure D. The Worst?

 

I acknowledge that they are both pretty damn lousy. But those who prize reliability would prefer Figure D because it is, at least, reliable. I look at Figure D and am quite sure that it does not measure anything that I care about. It’s not noisy; it is just wrong.

Figure C is noisy. There are real problems. It is a lousy and unreliable measure. But at least there is some signal of what I am looking for in that noise. Sure, the confidence intervals are huge, but there is information of value in there.

Putting it in very concrete terms: I do not need another test of socioeconomic status. No one needs another test of socioeconomic status.
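The difference between noisy-but-aimed-right and tight-but-aimed-wrong can be sketched with a small simulation. This is a toy under loud assumptions, not a model of any real test: the construct we care about and the "other" construct are made up and treated as statistically independent (which real socioeconomic status is not), the noise levels are arbitrary, and every name in the code is hypothetical. Reliability here is the correlation between two administrations of the same test. Validity is the correlation with the construct we actually care about.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Compare a noisy on-target test with a tight off-target test."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n)]  # what we want to measure
    other = [rng.gauss(0, 1) for _ in range(n)]   # something else (assumed independent)

    def administer(construct, noise_sd):
        # One administration: the construct plus fresh random error.
        return [value + rng.gauss(0, noise_sd) for value in construct]

    # Test C: aimed at the target, but with lots of noise.
    c_first, c_second = administer(target, 1.5), administer(target, 1.5)
    # Test D: very little noise, but aimed at the other construct.
    d_first, d_second = administer(other, 0.3), administer(other, 0.3)

    return {
        "C": {"reliability": pearson(c_first, c_second),
              "validity": pearson(c_first, target)},
        "D": {"reliability": pearson(d_first, d_second),
              "validity": pearson(d_first, target)},
    }


results = simulate()
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

In this toy setup, Test D wins on reliability and tells us essentially nothing about the target. Test C is a mess, but there is signal in it.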

The Problem of Prioritizing Reliability as the Upper Bound on Validity

From what I have seen and read, the idea that reliability is the upper bound on validity has morphed into the idea that we increase validity by increasing reliability. And therefore, we can stop worrying about increasing validity because we can just focus on increasing reliability.

There are people who confuse reliability and validity. There are people who say “reliability” when they clearly ought to mean validity, but the difference is simply not important enough to them to realize that they have made a mistake.

When the means becomes the ends, what had been valuable actually becomes the obstacle.

Concerns with Reliability as Obstacle to Validity

There are many causes for the quality problems with our big standardized tests. In my view, the greatest problem is that we are stuck in a vicious cycle in which perceptions by educators and the public of low quality (i.e., lack of validity) limit willingness to spend money for testing and to devote student time to testing. This harms the quality of our tests and…well…repeat.

But this is not the only problem.

Figure E.

Figure F.

Figure G.

The problem is that those who look to reliability as the most important consideration see moving from Figure F to Figure G as an improvement, and see moving from Figure F to Figure E as a decline. Many are always unwilling to give up reliability in order to gain validity.

There are item types that constrain reliability, either because they take so many resources that tests must rely on fewer items or because they cannot be scored as reliably. And those item types are incredibly disfavored. Item types that simply cannot get to the real cognition behind standards are not disfavored. Instead, we get highly reliable items that too often fall short of the actual targeted cognition.

How Does This Keep Happening?

Psychometricians, with their emphasis on reliability, are high status. They have graduate degrees in measurement, perhaps even PhDs. Content development professionals (CDPs) are merely former teachers, with all the low status that that carries.

This status difference often prevents CDPs from even being at the table, and when they are at the table they are often overruled. When they are not overruled, they are often intimidated into relenting.

And so, psychometric concerns drive assessment development far far far more than questions about whether items actually measure what they are purported to measure.

Which clearly, in my view, violates the Standards.

When Do Public Servants’ Ideology Pose a Threat of Tyranny to America?

My whole adult life – my whole professional life – I have read and heard complaints about teachers as being too liberal. The values that teachers espouse pose a tyrannical threat to America because liberals pose a tyrannical threat to America. Teaching well established science is unacceptable. Teaching well established history is unacceptable. Teaching literature from subdominant perspectives is unacceptable. Multi-culturalism? Culturally relevant and/or responsive pedagogy? Unacceptable.

American colleges and universities? Too liberal. A threat to freedom of speech. A threat to liberty. 

Public education has been under assault for…well, it has been under assault my whole lifetime. Certainly my whole professional life.

Yesterday and today – January 6 and January 7, 2021 – I simply could not stop contrasting those complaints that I have heard and read for too many decades with what I saw on my television and saw online.

I do not mean the seditious insurrectionists who would overthrow democracy. 

I mean the uniformed and armed elements of our government who would overthrow democracy, who stand by for tyranny and are all too willing to crush liberty.

Understanding Tyranny and Liberty

These terms are so often misused that I feel it necessary to explain what I mean by them.

Our Founding Fathers were incredibly focused on tyranny. They signed onto the Declaration of Independence, which describes the tyranny with which they were concerned. Read it.

  • They wrote of violence done against them (“He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people.”). 

  • They wrote of denying elected representation in government (“For imposing Taxes on us without our Consent,” and “For suspending our own Legislatures.”). 

  • They wrote of acting and interim appointments. Really. They did. (“He has made Judges dependent on his Will alone, for the tenure of their offices.”)

  • They wrote of refusing to be a leader of all the people (“Declaring us out of his Protection and waging War against us.”).

  • And yeah, I’ll mention it again, “He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people,” ‘cause the writing is just that good.

It is pretty damn easy to see what tyranny is, if you actually read our founding documents. It is pretty easy to see what our government was set up to guard against. 

Similarly, it is not that difficult to understand what liberty means. It never meant freedom to do anything you want. It meant keeping religion out of government and government out of religion. It meant freedom to assemble. It meant trial by a jury of one’s peers. Read the Bill of Rights https://www.archives.gov/founding-docs/bill-of-rights-transcript and not just your favorite one of those amendments. Keep reading beyond them to find out other forms of liberty that were most important to the giants of American history.

They did not think that rules to protect public health were a threat to liberty. 

January 6, 2021

I would like the events of yesterday to be inexplicable. I would like to say that I was surprised. But they are not and I was not. I was infuriated, but not the least bit surprised. 

I certainly was not surprised by the civilians. And, frankly, I was not that surprised by the police.

The police let the seditious insurrectionists in. The police posed for selfies with them. And when some of them were ready to leave, the police gently escorted them out, even holding their hands to help them down the steps of the Capitol.

I know it is cliché to mention it now, but contrast that with actions taken last summer against largely peaceful Black Lives Matter protesters around the country. With protesters against Brett Kavanaugh’s nomination to the Supreme Court. With disabled protesters against repealing the ACA. That’s the same city. The same steps. The same building.

Look around the country. The way the police treated protesters in Ferguson. The way they treated murderous armed militiamen in Kenosha. You know that I could go on and on.

The conspiracy to attack the Capitol to stop the constitutional processes that undergird the peaceful transition of power in the country was organized openly on the Internet. Members of Congress asked about security days ago. The Capitol Police then let the seditious violent insurrectionists into the Capitol.

Were they overwhelmed? How could they possibly have been overwhelmed!? We saw just last summer that our governments’ armed forces know how to put down protesters, how to clear out protestors, how to protect themselves while aggressively suppressing thousands of people. 

The only truly surprising thing yesterday was the boldness of the claim that the governments’ armed forces and police were simply overwhelmed and could not stop the violent seditious insurrectionists. Even NPR used the word “unprecedented” this morning.

But it was not unprecedented. We saw this in April in Michigan.

There really are only two possibilities. Either the $400 million/year Capitol Police leadership’s incompetence rises to the level of the Bush Administration’s that led to 9/11 – and thank god the violent seditious insurrectionists did not intend to kill all the members of congress who did not support the delusional fantasies of their megalomaniacal authoritarian strongman wannabe – or the leadership of the Capitol Police did not want to stop them. And the vast majority of the law enforcement officers who support this President – who never won a majority of the American electorate or had an approval rating above 50% – did not really want to stop them, either.

Law and Order

I choose the more consistent option. I saw the armed and uniformed branches of our governments and their actions. I saw how they treat protesters on the left and domestic terrorists on the right. I saw the violent armed seditious insurrectionists, including those who lionize the literally traitorous sedition to perpetuate slavery in America and lionize the avowed perpetrators of genocide. I saw that the armed and uniformed branches of our government stood by for hours as the violent seditious insurrectionists ransacked the Capitol.

I have read that each of the tiny number of people actually arrested yesterday was armed. How many more arms were there?  There were pipe bombs. Even as our members of congress were still in the building, these violent seditious traitors were allowed to do what they willed, treated with kid gloves and allowed to leave freely. 

The importance of respecting “law and order” has been a rhetorical cudgel used by the right since before I was born. It has been used against those protesting for equal rights, for access to the very things that our nation’s Founding Fathers demanded. And yesterday, there were these incredibly weak invocations of “law and order.” This time, they were not accompanied by the violence of our government, even though this time the supposed violators of law and order were, quite literally, violent seditious insurrectionists seeking to stop the peaceful handover of power, as dictated by our laws and our Constitution.

It is quite clear what the greatest threat to liberty within our government and among our public servants is. It is clear who would support tyranny.

It ain’t the teachers, the schools or the universities.

 

 

References 

“Pro-Trump rioters escorted down steps of US Capitol by police,” even holding hands. https://www.youtube.com/watch?v=G9hjfQ3xgIA

Teargassing largely peaceful protesters https://www.npr.org/2020/06/01/867532070/trumps-unannounced-church-visit-angers-church-officials

More than 300 protesters arrested as Kavanaugh demonstrations pack Capitol Hill https://www.cnn.com/2018/10/04/politics/kavanaugh-protests-us-capitol/index.html 

Disability advocates arrested during health care protest at McConnell’s office. https://www.washingtonpost.com/local/public-safety/disability-advocates-arrested-during-health-care-protest-at-mcconnells-office/2017/06/22/f5dd9992-576f-11e7-ba90-f5875b7d1876_story.html

National Guard Troops on steps of Lincoln Memorial https://www.snopes.com/fact-check/national-guard-trump-mob/

Selfies with Capitol Police. https://twitter.com/bubbaprog/status/1346920198461419520?s=20

Google search for: liberal bias in public schools https://www.google.com/search?client=safari&rls=en&q=liberal+bias+in+public+schools&ie=UTF-8&oe=UTF-8

Google search for: covid restrictions tyranny https://www.google.com/search?q=covid+restrictions+tyranny&client=safari&rls=en&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiUztWZiYruAhW2GFkFHRJECMsQ_AUoAnoECAcQBA&biw=1324&bih=792

TheFire.org

Chronicling attacks on public education. https://thenewpress.com/books/wolf-schoolhouse-door

Again chronicling attacks on public education. https://www.publicaffairsbooks.com/titles/derek-w-black/schoolhouse-burning/9781541788442/

Capitol Police opening the gates for the violent seditious insurrectionists. https://www.snopes.com/fact-check/capitol-police-opened-gates/

Political partisanship in the Secret Service so extreme that many agents cannot be trusted to protect the next president. https://www.bostonglobe.com/2020/12/31/nation/secret-service-is-making-some-staff-changes-presidential-detail-that-will-guard-president-elect-joe-biden/

84% of Police Officers Supported Donald Trump in 2016. https://www.policemag.com/342098/the-2016-police-presidential-poll

Violent Michigan Insurrectionists Overrun the State Capitol. https://www.theguardian.com/us-news/2020/apr/30/michigan-protests-coronavirus-lockdown-armed-capitol

2020 Video of Violent Insurrectionists Overrunning a US Capitol. https://www.youtube.com/watch?v=6_jWONaP-4U

Capitol Rioters Planned for Weeks in Plain Sight. The Police Weren’t Ready. https://www.propublica.org/article/capitol-rioters-planned-for-weeks-in-plain-sight-the-police-werent-ready

Capitol Police told members of Congress that they were ready for January 6. https://twitter.com/kyledcheney/status/1347314075710185475

Pipe bombs found in DC. https://www.cnbc.com/2021/01/06/fbi-says-it-is-investigating-suspicious-devices-in-washington.html

Reliability and Validity: Revisiting the Cliché

There’s a cliché metaphor that is commonly used to explain the natures of, and the relationship between, reliability and validity. I think that there is more to be learned through this metaphor than is usually presented.

The Cliché

The metaphor used is a target. I have seen archery, darts and javelins. But I am not a good enough artist to show those. However, I did make a target and set of images to develop the metaphor further. (Notice the depth and texture I made? Notice how the light comes from the upper left? I did that. Intentionally. I’m so proud of myself!)

 
An Empty Target.

 

The cliché explanation points out that reliability is the consistency with which one hits the target and validity is how on target one is. So, in Figure 1, we see low reliability and low validity. That is low reliability because the hits are not consistent. Were they clustered together, they would be consistent, and consistency is really the technical statistical (and psychometric) meaning of reliability. Figure 2 also shows low validity, because the hits are not really on target, in that they are not near the bullseye. But in Figure 2, the hits are clustered, so they are reliable. That consistency is reliability.

Figure 1: Low reliability and low validity.

Figure 2. High reliability, but low validity

Figure 3: High reliability and high validity.

Figure 3 shows the dream — high reliability and high validity. Tightly clustered and clustered exactly where we would want them to be clustered. As the Kool-Aid Man would say, “Oh, yeah!”
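For those who prefer numbers to pictures, the cliché can be sketched in a few lines of Python. Everything here is an assumption for illustration: the hit coordinates are simulated rather than taken from my figures, the cluster centers and scatter values are arbitrary, and the names describe_hits and shoot are made up. The spread of the hits around their own center stands in for reliability (smaller is more reliable), and the distance from that center to the bullseye stands in for validity (smaller is more valid).

```python
import math
import random


def describe_hits(hits, bullseye=(0.0, 0.0)):
    """Return (spread, offset) for a list of (x, y) hits.

    spread: root-mean-square distance of hits from their own center.
    offset: distance from the center of the hits to the bullseye.
    """
    n = len(hits)
    center_x = sum(x for x, _ in hits) / n
    center_y = sum(y for _, y in hits) / n
    spread = math.sqrt(
        sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in hits) / n
    )
    offset = math.hypot(center_x - bullseye[0], center_y - bullseye[1])
    return spread, offset


def shoot(center, scatter, n=200, seed=1):
    """Simulate n hits scattered around `center` (made-up data)."""
    rng = random.Random(seed)
    return [
        (rng.gauss(center[0], scatter), rng.gauss(center[1], scatter))
        for _ in range(n)
    ]


# Arbitrary stand-ins for the three figures in the cliché.
figures = {
    "Figure 1": shoot(center=(3.0, 2.0), scatter=3.0),  # scattered and off target
    "Figure 2": shoot(center=(4.0, 3.0), scatter=0.3),  # tight but off target
    "Figure 3": shoot(center=(0.0, 0.0), scatter=0.3),  # tight and on target
}

for name, hits in figures.items():
    spread, offset = describe_hits(hits)
    print(f"{name}: spread={spread:.2f}, offset from bullseye={offset:.2f}")
```

Notice that spread can be computed from the hits alone. The offset can only be computed because the sketch was told where the bullseye is.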

The Optional Addition

Sometimes, the explanation includes a Figure 4.

 
Figure 4. Medium reliability and medium validity?

 

Figure 4 shows that there is a middle ground between Figure 1 and Figure 3. That one can have middling reliability and validity. The problem is that, when it is included, it is presented as a tradeoff with Figure 2. That is, the gains in validity (i.e., closer to the bullseye) are offset by the losses in reliability (i.e., less tightly clustered). But I don’t buy that. I think that the gains in validity are clearly and obviously well worth the losses in reliability. Methinks that the difficulty of suggesting that Figure 4 does not present a vastly superior outcome to Figure 2 is why it is often excluded. (This comes from an agenda that I will explore next week.)

Figure 4b. Not quite as good.

 
Figure 4c. High reliability and medium validity.

Certainly, Figure 4 is better than Figure 4b. The latter figure shows hits that simply are not as close to the bullseye, even though they are exactly as scattered as in the former figure. I understand the claim that perhaps the hits in Figure 4c show equal reliability/validity tradeoffs with Figure 4b. But both seem clearly inferior to Figure 4, to me. (Again, an agenda to explore next week.)

Extending the Metaphor

I think that we can extend the metaphor for two more lessons.

First, I think that the differences between Figure 4 (it’s the same Figure 4 shown above) and Figure 5 are the most important differences in this whole metaphor. In reality, we simply cannot expect perfect reliability. Not even Figure 2 was perfect. In reality, it is just a question of how much better we can make the reliability. In reality, small differences are what we can achieve, where we can improve, almost all of the time.

Figure 4.

 
Figure 5

Incremental progress is progress, after all. And, in reality, almost all progress that has any chance of sticking is incremental progress. So, if you can go from kinda medium to a little bit better than kinda medium? Take the win!

Last, there is one final lesson we can take in this metaphor. Figure 6 shows the statistical view of reliability and validity. It shows the psychometric view.

 
Figure 6. The Psychometric View

 

No, there is no target in the psychometric view. Because we cannot quantify validity, statistics have almost nothing to say about validity. They do not even see a target. But damn aren’t those hits tightly clustered!

Democracy and Schools

I occasionally come back to this theme of the importance of politics and democracy for our schools. And I am back here, again.

I was taught by a grad school professor that politics is how we come to community (or national) decisions that are based on values, rather than on technical criteria. That different political systems give us different ways to make those decisions for communities. Perhaps I should have understood that that was the essential nature and purpose of politics before that, but that is when I did.

In this country, we use democracy as our political system. Sometimes, we even think of “democracy” as meaning whatever the American political system is. (And I am ignoring the historically ignorant people who claim that we are a republic and not a democracy, because they clearly have not read what the founders meant and wrote about each of those terms.)

So, what is the purpose of our democracy? That is, why is democracy our system for political decision-making? I can think of four potential reasons.

  • Democratic accountability for our leaders. (i.e., the ability to kick out the bad ones)

  • Learning the will of the people. (i.e., doing what the people want)

  • Obtaining the consent of the governed by including them in decision-making. (i.e., decreasing resistance to governance)

  • Obtaining legitimacy for government. (i.e., simply a moral requirement)

My niece is taking a class at college this year on The Future of Our Democracy. I am not hopeful about the future of our democracy. Whatever the purpose of our democracy, I think that it is being undermined.

These purposes require access to the ballot box, and that is being limited as it has not been in decades. These purposes require that political leaders are honest about their positions, their opponents’ positions and the contents of the bills they support or oppose, but flat out lies about all of that seem to be at a high. These purposes require a willingness to recognize when you are in the minority and to concede to the majority that they get to win (and to rule), and we have clearly lost much of that.

I’ve been worried for a long time about this constitutional crisis that we are in that is undermining our government’s ability to govern. But the state — the future — of our democracy seems quite uncertain to me.

Which takes me to our schools. Too few people vote in school board elections. It is rarely clear what those votes really mean. Local school boards get too little attention just about all the time, but disproportionate attention for things that should not matter so much. Changing high school mascots gets more attention than budget cuts or new curricular directions.

I know that we need democratic oversight for our schools. And I often assume that we mostly have it.

But I have to wonder, do we? And what could possibly improve the situation?