Hoist on My Own Petard: Drafting without Sufficient Pre-Writing

December 21, 2020

I recently was doing a small implementation evaluation report for a long term client. I read the documentation, spoke to the people, reviewed the documentation again and thought about it. I went over my notes and made a list of the problematic implications. Normal stuff.

I thought I was ready to write. It would be fairly quick. No need to be the most formal, it being a long term client and all. Explain some background, lay out the facts, point to the issues, make some recommendations. I figured that it would be about 5 pages, single spaced with a bunch of headings and bullets. It turned out to be six pages.

Here’s the thing: I didn’t really pre-write properly. Sure, I had the ideas, but there is the critical step between the research and actually writing a decent draft, which I more or less skipped.

Way back in the day, I was taught that this step is called “pre-writing,” a term I kinda hate. There is so much work that comes before this step, and calling this step “pre-writing” just ignores all of it. But that is what I was taught, and it is stuck in my head.

I tend to call it “outlining,” because I believe that that is the most useful way to do it. In this case, maybe 15 lines, taking Intro and Summary for granted. That leaves three top level headings, Background, Procedure and Issues. I had that in my head — which is where some outlining can happen. But I was being too lazy (or rushing too much) and did not figure out what the big Background Issues were before I started drafting. The Procedures section was fine. But the Issues section was a mess. And the Recommendations missed a really useful one.

The Issues section did not have a great order. The bulleted paragraphs were not distinct enough. They were not close to equal weight. Some issues ended up split across too many bullets. Ugh.

This was my fault. I knew better. I see this mistake often. I coach people to do better, all the time! So, I had to take my own painful advice and follow it.

If you do not do your outlining or prewriting before you write a draft, it will be a bad draft. And it will be more work to try to fix it than it will be to start over again. If you use that first draft to work out exactly what you want to say, what your argument is and what points you want to make, it is not actually your first draft. That is your pre-writing. It is valuable and important, but it is not actually a draft you should edit.

Finally realizing this, I opened up a new window and started typing anew. I could not just copy paragraphs and move them around. I needed to do a better job breaking up the report. I needed to use the early sections better to set up the later sections, without giving too much away early. The writing in later sections needed to be more self-contained. So, I needed to start a new document from scratch. I had to avoid the temptation of trying to reuse the bad stuff I had already written.

Hoist on my own petard. Making the most common mistake I have to coach my dissertation coaching clients through.

But I ended up with a report that I actually am proud of.

What Is an Ed.D.?

December 14, 2020

While the issue of the meaning of an Ed.D. is in the news right now, this is actually something I have had to explain many times.

In short, an Ed.D. — like a Ph.D. — is awarded by a graduate school of education for a set of coursework and a relatively large formal research project written up in the most formal way. Everything else about the Ed.D., everything that you have heard or read, varies by institution.

Are Ed.D.s merely practitioners’ degrees?

Nope. For example, until recently, the Harvard Graduate School of Education only awarded Ed.D.s, and did not award Ph.D.s. Harvard is very focused on research, sending very few doctorates back to schools and districts as administrators or teachers — though it does send many many masters students back to schools, districts and other non-research institutions. Similarly, until recently, Teachers College, Columbia University — the nation’s largest graduate school of education — only awarded Ed.D.s and did not award Ph.D.s.

When I was at Teachers College, I observed this up close. If students there wanted to earn a Ph.D., they needed to officially get their degree from GSAS (i.e., the Graduate School of Arts and Sciences) of Columbia University, a different institution within Columbia University. Columbia has all the schools (i.e., law school, medical school, business school, GSAS, etc.). Teachers College students would need to find a professor in a GSAS department (e.g., political science, psychology, economics) to serve on their committee. They would also have to fulfill various graduation requirements of that department, in addition to the Teachers College requirements.

Teachers College offers economics courses, focused on the context of education. It offers political science courses, focused on the context of education. It offers psychology courses, focused on the context of education. It is a very large institution that offers a wide variety of disciplinary courses, with the courses always focused on the context of education. After all, it is Columbia University’s graduate school of education.

There was rarely a need to try to find a GSAS professor and take GSAS coursework — which was not focused on our context. Regardless of career goals, the Ed.D. was a perfectly acceptable degree.

Do Ed.D.s require less work than Ph.D.s?

At Teachers College, the Ph.D. required just 75 credits, plus an appropriate doctoral dissertation, but the Ed.D. required 90 credits, plus an appropriate doctoral dissertation. This 75 could include the requirements of the GSAS department, or perhaps the student might need a few more credits.

So, the Ph.D. does not require more coursework.

The language of old policies of the university requires Ph.D. students to master a foreign language. However, sufficient research methods coursework counted as a foreign language.

Are Ed.D.s faster than Ph.D.s?

Many institutions that offer MBA (i.e., master of business administration) degrees also offer “executive MBA” programs, designed to better fit the schedules of working professionals. They focus on intensive weekend courses, and the occasional week-long full-time experience. These take the place of more traditional courses taken during the day.

Some of the programs appear — to my eyes — to demand less of these busy professionals. Less reading. Less writing. Less thinking time. But some of them appear just as demanding, and perhaps even more so.

Does this make them faster or weaker? It depends on the program, not on the degree. They all get MBA degrees.

Similarly, there are some practitioner-focused “executive” Ed.D. programs. But that is not intrinsic to the Ed.D. and their rigor varies by program.

There are also a wide variety of Ed.D. programs designed to be very fast — as fast as three or four years. Others counsel that a degree take a minimum of four years, but that students should expect more like five to seven years, if not longer. I do not think particularly well of the very fast programs, but I understand why they exist and why they are popular. But again, this is a difference between programs that has little to do with the Ed.D. degree itself.

What about J.D.s, M.D.s and Other.D.s?

J.D.s (i.e., law degrees) and M.D.s (i.e., medical degrees) are not research-based degrees. That is, they do not have a doctoral dissertation requirement. Recipients of these professional degrees do not have to do a big research project in order to graduate.

Many aspiring medical researchers do MD/PhD programs. That’s where the research portion of the graduate work comes in, both in terms of training and in terms of experience.

Other graduate schools also offer doctoral degrees. Among research degrees are theology degrees, design degrees, library science degrees and many many more. The one that threw me for the biggest loop, when I heard about it, was nursing. D.N.Sc. Doctor of Nursing Science. There are also many many other professional degrees. Dentistry, social work, athletic training and many (mostly somehow connected to medicine) more.

Who gets to be called “Dr.”?

We know that that title should not be limited to those who have delivered babies, because it seems nonsensical to deny brain surgeons and heart surgeons the title, simply because of the part of the body they work on. That really seems to miss the point. (Also, plenty of EMTs have delivered babies, and they are not called “doctor.”)

The question is whether the title should be preserved for medical doctors. I know that I have said many times, “Not that kind of doctor,” – almost always to be funny. In many contexts, the assumption is that the doctor in question is a medical doctor. However, the M.D. is simply a less demanding degree than a research doctorate. Perhaps the coursework is harder — depending on the program – but the M.D. lacks the research and dissertation writing components of the array of research doctorates.

While there are reasonable questions to be asked about recipients of honorary doctorates using the title “Dr.”, it strikes me as simply asinine to suggest that those who have earned research doctorates should not be called “Doctor.”

Sure, I can see why some institutions might be questioned, but the degree itself? No, there is no reason to view Ed.D.s on their face as less demanding or rigorous than Ph.D.s, and certainly no reason to call them out specifically.

If you really want to be a condescending ass, question the quality of the individual dissertation by actually reading it. Otherwise, it seems….well, just stupid….to challenge the well-established norm of calling holders of research based doctorates “Doctor.”

Justifying Arrogance with Humility

November 17, 2020

Humility is a core of Rigorous Test Development (RTD). This likely appears odd to a lot of people, considering how arrogant we are about RTD, the confidence with which I (in particular) often speak, and the even greater and more frequent confidence that people hear in my voice. My collaborator often speaks with confidence as well — though as a woman, she faces greater potential penalty for doing so.

And yet, we believe very strongly in the value of humility in the work.

In fact, we believe that our humility in the work is not at odds with our confidence — or even my arrogance. Rather, our humility helps to support our confidence.

First, we preach that professionals should be very mindful of the limits of their expertise. They should know when they are speaking outside of their expertise. Even when they could justify some claims to expertise to people who might not know better, if they themselves know that the topic is outside their expertise, they should speak much more cautiously. For example, we know a lot about assessment, but we know quite a bit less about large scale performance assessment. We know a lot about scoring, but we know far less about large scale administration of tests.

Second, we think that it is important to respect the expertise of others. Certainly, if we want them to respect ours, we have to walk that walk ourselves. Respect in collaborative work simply cannot be a one-way street. Furthermore, one should affirmatively look for the expertise that others bring to the table, rather than assuming that they lack any at all. This does not mean deferring to everything they claim or in every area in which they seem confident. This is more critical than that. Evaluate their claims — implied or explicit — of expertise, and think about their bases.

Third — and perhaps this should be first — admit when you do not know something. Some people think that admitting ignorance is a sign of stupidity and incompetence. We believe that the truly confident can — and should — admit this kind of limit. I know that I am capable, and I am confident enough that I do not need to pretend that I know more than I do. My ego is not so fragile that I cannot admit my current limits. And by admitting that limit (both to myself and to you), I give myself an opportunity to rectify it. Perhaps you can teach me, perhaps we can learn together, or maybe I can investigate on my own and report back to you.

Fourth, if it is important to you to be right — as it is to me and my most cherished loved ones, to an annoying degree — affirmatively look out for when you are wrong. I do not mean that you should easily admit fault when you do not believe it. Rather, I mean try to limit that fault to one occurrence. As a value, I would rather be wrong and corrected once, than be wrong many many times. Of course, it is (too?) hard to convince me that I am wrong, but I take such pleasure in learning that it is actually a little moment of joy when I realize it. The learning is great, and knowing that I will be right in the future matters a lot to me.

Fifth, try to know the limits of your own perspective and experiences. Obviously, this is key to most of the issues mentioned above. Broader experience enables each of us to better understand our earlier experiences. For example, our work with a second team (e.g., in a different organization) can make visible things that we took for granted in the first context. Different experiences — perhaps with different constraints and priorities — can give us a better understanding of the tradeoffs and value of different kinds of decisions.

Sixth, there is a lot to be gained by seeking out and listening to the experiences of others. We can learn vicariously from their experience. In our own work, we combine insider-outsider perspectives, asking each other endless questions. We each have grown our understanding by learning from what the other has been through.

In these ways and others, we try to temper our confidence or arrogance. No, it does not make us less confident, but instead makes our confidence less fragile and — in our view — more justified. Our careful efforts at humility are not insincere, nor do they undermine us. Rather, they put us in a position to do better work.

Lessons on Representativeness from the 2020 Election

November 13, 2020

If there is one issue that I think most needs to be understood about conducting — or consuming — research, it is understanding the representativeness of your data. The 2020 presidential election illustrates this in a number of ways.

First, the pre-election polls were off. This was also true in 2016, when the polls were off by about 3 points. It appears that this year’s polls were off by a bit more than that, but not off by a huge amount. However, this error was not evenly distributed across the country. For example, the polls in Minnesota were not off by very much, but the polls in Wisconsin were off by quite a bit. (Note: I am not citing specific numbers because the election results have yet to be certified in any state.)

The major explanations going around for this concern the composition of the sample. Some people cite differences between the sample and the overall population in educational attainment, while others point to a trickier construct, social trust. This latter group argues that weighting samples by education simply does not do enough to correct for the underrepresentation of low-trust individuals in the sample. That is, educational attainment, they suggest, is a poor proxy for social trust.

Regardless of the details, everyone agrees that the group of poll respondents was different than the voting public, along a number of factors. And while high quality pollsters try to correct for this, they didn’t capture something, this time.

It is easier to correct for educational attainment, because it is easy to ask about. Educational attainment is a very easily observed variable. However, social trust is much harder to see, harder to ask about, and therefore harder to pin down. It is an unobserved variable. We can correct or control for differences in observed variables, but not unobserved variables.

One reason why large sample sizes are useful is that we hope that with a large enough group, the unobserved stuff kinda cancels out. However, in this case, that low-trust group might simply be far less likely to respond to pollsters. This undermines their sampling strategy, both in data collection and in weighting the subgroups. Obviously, they need to do a better job of turning the unobserved variable into an observed variable. Somehow.

The lesson: Unobserved variables can entirely undermine the generalizability of your results when your sample is not representative of the relevant population.
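
To make that concrete, here is a small sketch of how weighting by an observed variable can fail to repair a gap driven by an unobserved one. Every number in it is invented for illustration (the population shares, the support levels and the response rates are assumptions, not real polling data), and the function names are my own.

```python
# Hypothetical population, split by an observed variable (education)
# and an unobserved one (social trust). All numbers are invented.
# Each cell: (education, trust, population share, support for candidate A, response rate)
CELLS = [
    ("college", "high", 0.30, 0.60, 0.10),
    ("college", "low", 0.10, 0.40, 0.02),
    ("no_college", "high", 0.30, 0.50, 0.08),
    ("no_college", "low", 0.30, 0.30, 0.02),
]


def true_support(cells):
    """Support for candidate A in the whole population."""
    return sum(share * support for _, _, share, support, _ in cells)


def raw_poll(cells):
    """Support among the people who actually respond, with no weighting."""
    responders = sum(share * rate for _, _, share, _, rate in cells)
    supporters = sum(share * rate * support for _, _, share, support, rate in cells)
    return supporters / responders


def education_weighted_poll(cells):
    """Reweight respondents so each education group matches its population share."""
    estimate = 0.0
    for edu in {cell[0] for cell in cells}:
        group = [cell for cell in cells if cell[0] == edu]
        population_share = sum(cell[2] for cell in group)
        estimate += population_share * raw_poll(group)
    return estimate


print("true support:      ", round(true_support(CELLS), 3))
print("raw poll:          ", round(raw_poll(CELLS), 3))
print("education-weighted:", round(education_weighted_poll(CELLS), 3))
```

With these made-up inputs, weighting by education moves the estimate in the right direction, but it stays several points above the true figure, because low-trust people are underrepresented within each education group and the weights cannot see that.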

Second, it appears that the samples in the pre-election polls were different than they had been in the past. Pollsters reported a greater response rate than in recent years, and the composition of those respondents might have been quite different. For example, if white collar and office workers were newly able to work from home, and thereby had more time and/or freedom to answer calls and take part in polls, that might favor one party over another. If lower education service workers were not impacted in the same way by COVID and various efforts to reduce its spread, old sampling methodologies could yield significantly different samples than in the past — even the recent past.

I have not seen any mention or discussion of pollsters observing this and treating it as a problem to be solved, as opposed to an opportunity to be taken advantage of. Yes, it is great when the response rate goes up, but it might not be all good.

The lesson: Adopting old methodologies and assumptions in new contexts is always tricky. Always be diligent and careful to correct for seemingly positive changes, as they may carry with them less obvious harmful changes.

Third, early returns were rather misleading in many states. This was because the voters who were counted earlier were different — as a group — than those counted later. In some states, early and mail-in votes were counted earlier, and in some states those were counted later. Everyone expected that those early and mail-in voters were more Democratic than the traditional in-person voters, though no one could be certain how different.

In this case, there were two different populations, and no good way to anticipate their differences. Successful efforts by both major parties to increase turnout made comparing the groups even harder, as no one could predict how successful each of those efforts might be.

Only fools (e.g., me, my wife, my best friend) were paying any attention to the vote counts released early in the evening, on election day. Heck, it was still pretty foolish to pay attention later in the evening. We already knew how most states would turn out, and these returns simply could not predict the final result in the close states. Whichever group was counted first simply could not be generalized to the second group. At the very least, we all had to wait for a sufficient portion of each group to be counted to predict anything.

The lesson: We cannot expect to be able to generalize results from one population to some other population that is already known to be different in significant and relevant ways. This is especially true when the extent of those differences is not known.

Fourth, the final results of many states were predictable long before the actual results got there. For example, experts knew that Michigan, Pennsylvania and Georgia would end up going for Joe Biden, even when Donald Trump was far ahead in the count.

How was this possible?

History provided clear insight into the proportion of votes that come from some areas of those states, relative to others.
Counties and states reported estimates of how many votes they had outstanding.
History provided a baseline for expectations about the split of the vote in different areas.
Early counting of the in-person votes and the early/mail-in votes provided further information that helped refine those historically-based expectations.

Thus, it did not require magic to predict the future. Instead, one could simply use some basic algebra to predict the count (of what had already happened). Similarly, in short order it was not that difficult to see that counts in other states — though close — would not lead to a different result.
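
That basic algebra can be sketched in a few lines. The areas, vote counts and expected splits below are all hypothetical, chosen only to show how a candidate who trails in the counted vote can still be projected to win once the outstanding votes are accounted for. The `Area` class and `project` function are illustrative names of my own, not anything an election desk actually uses.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    counted_a: int           # votes already counted for candidate A
    counted_b: int           # votes already counted for candidate B
    outstanding: int         # reported estimate of votes not yet counted
    expected_share_a: float  # expected share of outstanding votes for A (assumed)


def project(areas):
    """Add the expected split of outstanding votes to the counted totals."""
    total_a = sum(a.counted_a + a.outstanding * a.expected_share_a for a in areas)
    total_b = sum(a.counted_b + a.outstanding * (1 - a.expected_share_a) for a in areas)
    return total_a, total_b


# Invented numbers: B leads the counted vote by 200,000.
areas = [
    Area("urban", counted_a=400_000, counted_b=300_000, outstanding=500_000, expected_share_a=0.75),
    Area("rural", counted_a=300_000, counted_b=600_000, outstanding=100_000, expected_share_a=0.35),
]

projected_a, projected_b = project(areas)
print("projected A:", round(projected_a))
print("projected B:", round(projected_b))
```

The projection is only as good as the expected shares fed into it, which is why the historical baselines and the early partial counts from each group of voters matter so much.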

Now, none of this was clear early in the evening. But as more results — even partial — came in from across each state and from every group of voters, the eventual results were easy to see. Those with access to the most detailed historical and emerging data could see the trends and where they would end up.

The lesson: Given sufficient data, diligence and patience, one can extrapolate from sampled data to the larger population, so long as the samples are well matched to the larger populations.

None of this is new, but gaps between predictions, partial results and full results of the 2020 presidential election have received an enormous amount of discussion and attention. This can be a good opportunity to think more carefully about how representativeness and efforts to get representativeness play out, in practice.

Self-Selection, Silence and Representativeness

October 27, 2020

Unless they are doing program evaluation, researchers want to generalize from their data to some wider population or time frame. Education assessments similarly want to generalize beyond the handful of questions that appear on the test. But it is not just researchers and assessment developers who want to do this. Everyone wants to look at those results and infer that they can generalize to the larger population.

A larger sample size can make such generalizations more defensible, but large sample size is not enough. Heck, sample size isn’t even the point of sample size.

The point is representativeness. If you had a perfectly representative sample along the characteristics of interest, the sample size simply would not matter. Not at all.

Now, we know that we cannot get those perfectly representative samples because the whole point of our research/assessment is to uncover things about the population that we do not already know. So, sample size becomes important because we use random sampling to approximate a representative sample — a trick that we have statistics to better describe and understand, but that only works with larger sample sizes.
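
As a rough illustration of what those statistics describe, here is the usual textbook approximation for the margin of error of a proportion. It assumes a simple random sample and the normal approximation, so it says nothing at all about samples that are not random. The function name is my own.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


# The margin shrinks with the square root of the sample size.
for n in (100, 1_000, 10_000):
    print(n, round(margin_of_error(0.5, n), 3))
```

Going from 100 to 10,000 respondents cuts the margin by a factor of ten, but only the random-sampling part of the error. Any gap between who is in the sample and who is in the population is untouched by a bigger n.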

Self-selected samples, however, undermine this whole effort. We already know that those who volunteer to participate in the study or survey or poll are different from those who do not. We just do not know how different or in how many ways they are different. People who respond to an emailed invitation to participate in research are different from those who do not. People who answer phone calls from pollsters are different from those who do not.

The massive New York City public school system asked parents to declare whether or not they wanted their children to return to in-school classes, hybrid classes, or entirely remote learning this fall. It found that most students’ families would send their children back to school buildings.

Actually, that is not true. That is what NYC claimed and what was often reported. But it was not what families said and it was not what happened in the schools. In fact, roughly 1/4 of families said that they would not send their children back to school buildings and the other 3/4 of families said nothing. They did not opt for returning to school buildings. They did not opt for anything. They simply did not answer the survey.

The mayor and others assumed that their silence meant something that they understood. They assumed that the non-respondents were making as intentional a decision as the respondents were. They assumed that the non-respondents were like the respondents.

This was a self-selected respondent pool. There were no efforts at representativeness. There were no efforts to understand the non-respondents. It was the laziest possible way to collect data.
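
A quick back-of-the-envelope sketch shows how little the survey pinned down. Taking the rough figures above at face value (about a quarter of families responded, and all of them chose not to return), the only thing the data supports is a range. The function below is an illustrative helper of my own, and it makes no assumption at all about what the silent families wanted.

```python
def worst_case_bounds(response_rate, share_among_respondents):
    """Range for a population share when nothing is known about non-respondents.

    The low end assumes no non-respondent holds the preference;
    the high end assumes every non-respondent does.
    """
    low = response_rate * share_among_respondents
    high = low + (1 - response_rate)
    return low, high


# Roughly 1/4 of families answered, all saying they would keep children home.
low, high = worst_case_bounds(response_rate=0.25, share_among_respondents=1.0)
print(f"Share preferring remote: between {low:.0%} and {high:.0%}")
```

Any narrower claim, including the claim that most families wanted to return to school buildings, comes from an assumption about the non-respondents rather than from the data.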

Work to ensure a representative sample is hard. It is expensive. It is work. And it is not fun. This work does not help you to get the answer you want. It is not about the substantive ideas, thinking or theories of whatever you are interested in. It is annoying and repetitive work to follow up with invitees, slowing down data collection, making it more expensive and giving reasons to doubt your results before you have even compiled them.

But response/non-response bias is huge. High integrity researchers take it quite seriously.

Why Democracy is Necessary for Policy-Making

October 13, 2020

Politics is how groups with differing or competing ideas, values and/or agendas reconcile those differences and direct group action. This can happen in small groups at work or on the national stage. Perhaps in the worst case, the political answer is massive repression of the will of some minority groups, or even of a majority. More ideally, through political processes we come to some compromise, or at least an agreed upon mechanism to choose between differing preferences.

While there are technical questions in policy and education policy, there are an awful lot of values questions. For example, allocation of scarce resources is almost always a values question. The appropriate balance between desires for excellence (i.e., fostering even greater accomplishments of those we expect to be the most accomplished) and desires for equity is clearly a values question, and one that has been central to my own education and always present in my mind as I work.

This year, we are seeing some of the top selective-admissions public schools in the country rethinking their admissions processes. The nation’s oldest public school (in Boston, MA), and my own high school (in Fairfax County, VA) have announced they are shifting away from their old exam-based models. These announcements come after decades of complaints that those old processes yielded student bodies that were wildly non-representative of the large student populations in those school systems.

Those who have pressed for change have known that they are arguing from values, yet somehow those who have defended the old status quo have claimed that they were not. They claim some technical reasons why these schools had to admit only the top students, and that they had some appropriate means to identify those students. But those tests were built on values about what should be included and how to balance reliability and validity in assessment. And insisting that admission should be limited to those who are best at whatever folks think the tests measure is also a value.

Senator Mike Lee recently tweeted, “Democracy isn’t the objective; liberty, peace, and prospefity [sic] are. We want the human condition to flourish. Rank democracy can thwart that.” The problem with his thinking is that even if we accept that those are the proper objectives, how we understand or define liberty and how we distribute prosperity are questions of values. And we need some political mechanism to come to answers.

In this country, as in most of the world, we accept that we should have democratic systems to address the issues in the political arena. That is, we should use democratic mechanisms to translate a diverse array of views into direction for our governments and governmental policy — including public schooling. We have democratic oversight of our schools, either through elected school boards or through school boards overseen by elected officials. Our various departments of education are given direction and overseen by our elected officials.

These exam schools get undue attention, for various reasons. I pay them too much attention because I attended one of them. At times, their admissions criteria can be one of the most widely covered issues in public education, and we appear to be in one of those times.

So, I feel the need to say again, nearly everything about admissions criteria and processes at these special schools is based on values. Everyone offering a thought or opinion about what their criteria should be should recognize that they are speaking from their values, and consider deeply why their values should be taken more seriously than those of others — particularly if they are counseling against the popular will of the communities which these schools are to serve.

Your Data Does Not Prove Anything

October 5, 2020

On the one hand, this post is about humility, but on the other hand this post is simply about intellectual integrity. In this case — as with so many others — they go hand in hand.

We all want our studies and our evidence to be high quality. We want our data and our results to prove our idea, or at least to disprove someone else’s. We want the satisfaction of a completed argument.

Unfortunately, we do not get that. I mean…never? Yeah, I am willing to say never.

Now, I am no nihilist. It is not that I think that evidence is meaningless or valueless. But rather, it is the accumulation of evidence that lends weight and should have us confidently accepting that something is true, or something is false. No single study can do that.

Though it is not my main point, let me address those who claim experimental design accomplishes this. It does not, because it is not sufficient. Not only do you need random assignment (i.e., the only requirement of experimental design), but you need it to be blind (i.e., participants do not know which group they are in). Not just blind, but double blind (i.e., those administering the treatment also do not know which group is which). Not just double blind, but sufficient sample size for the unobserved potentially relevant characteristics to cancel out (i.e., a REALLY big study). Not just a big sample, but actually a representative sample from the population. Does your experiment have that? I’ll bet that it does not, if for no other reason than those administering the treatment know what they are doing. Sure, placebo pills can be made that look like treatment pills. But educators and therapists know what they are doing. People implementing policy — as opposed to passing out medications — know what is happening. So, experimental design is not sufficient to prove something true or false.

However, my main point is not about experimental design — though it should inform your understanding of its limits, in most cases.

My main point is that even if your data is accurately observed, recorded and understood, you still have not proven causality. The big issue that is so hard to rule out is alternative explanations.

The purpose of experimental design is to rule out alternative explanations. And if you have random assignment, it’s double blind, and you have a large sample that is representative of the population, ok. Sure. But violate any of those, and you cannot be sure that the effect you observe is due to the cause you are investigating. You just cannot.

This is often true in the world of assessment, as well. Evidence Centered Design (ECD) focuses on collecting evidence that is consistent with the claims that one might want to make. What would it look like if the test takers could do X? However, even ECD — of which I am a big fan — has no procedure or call to ensure that the evidence could not be supplied for another reason. Even ECD has a blind spot for the ambiguity of evidence.

So, there are two lessons to keep in mind, moving forward. First, do what you can in your research or assessment design to minimize ambiguity of evidence. Second, remember that you cannot eliminate it, and therefore should be humble when making claims about what it shows — without ever using the word “proves.”

More on Literature Reviews

September 28, 2020

There are three more points that I want to make about literature reviews.

Mind the Gap

Most research attempts to build on and extend the research base in the literature. As such, these projects — including virtually all doctoral research projects — must identify a gap in the literature.

What constitutes a gap? Well, it is where we have not gone before, have not looked before. It could be something entirely new, but usually is not. Instead, it tends to be a new version, a new context, a new kind of data or examination.

It is perfectly fine to look at existing research and wonder if that approach would work in a different context. For example, would this work in a more urban school? Would that return the same results if it was in a lower poverty school? Can we get better results with a different kind of staffing? All of those are novel enough that they constitute a gap.

Or rather, they would constitute a gap so long as that project has not been done before by someone else.

One purpose of a literature review — not just the write up, but the actual work of reviewing the literature — is to find out what has already been done. Redoing someone else’s almost identical project is not verboten (see below), but it should be disclosed. And you should know that that is what you are doing.

You see, one thing that makes academic and scholarly research different than action research is that the goal is to help inform the field and take part in the scholarly conversation, rather than simply to inform yourself. If you are just looking for your own answer, you can do action research. Well, you can look to the literature and save yourself a lot of trouble, or you can try the action research yourself. But if you are doing scholarly research, you need to know what has gone on in the conversation and whether someone else has already said what you are aiming to say.

So, the gap is your niche. What are you looking into that no one else has quite looked into like you before? What is your angle? Are you using a different critical lens in your analysis? Are you collecting a different kind of data? Again, these are enough to constitute a gap.

Replication Projects

Not all research has to be based on a gap in the literature. In fact, too much research is based on identifying a gap. We should have far more replication in virtually every field.

Just because I reported particular results from my study, that does not mean that the case is closed. Yes, you could try to take a different angle on it (see above), but you could also simply try my exact project again. Hopefully, my results are robust enough to take it, but how can we be sure?

The best way to be sure of research results is to try again, and not just by changing things up a little bit. Yes, over time we would want to subject my findings to all kinds of checks by modifying this and that about it. But we also want to make sure that what I found is robust enough for others to find it.

We have a replication crisis in virtually every field of research. This is particularly well documented in psychology, but I believe that it is true for all fields. Certainly, in education research — where we have many many studies that are not widely read and insufficient literature reviews to consider what has come before, in part because of the volume of research out there — we have too little idea about what is robust and what is not.

Unfortunately, most doctoral programs require more original research projects than replication for qualifying dissertations. However, if your doctoral program might allow something very much like a replication, I think that it can be a very good idea.

Understand that replication projects do still require work for the literature review. You cannot simply replicate the original project’s lit review. Rather, you should do an even more complete review of the prior literature and review what has come since and been based on the original study. You should take an even more critical eye on the basis of the original study. And, of course, you should subject the original study to the strictest of scrutiny. That is no less work than a more typical literature review.

Ed.D. Quick Doc Programs

I always worry about Ed.D. programs that are designed to speed their students through the process. Often, they have them collecting data before they’ve done their literature reviews. This is simply not scholarly research.

Your research should be based upon some intersection between the scholarly conversation and your own interests. The scholarly conversation should inform the kinds of questions you ask and how you think about the possibilities of addressing them. If you collect data before you have gotten familiar with what has come before, then you are only doing research based on your interests. That is action research, not scholarly research.

Now, if the field decides that action research is sufficient for an Ed.D., I would not argue against that. I could appreciate a line between Ed.D.s and Ph.D.s that was based on the distinction between action research and scholarly research. But that is not what the field says. Instead, there is some vague idea in some institutions that one is better than the other, while other institutions only offer one of them. For example, my doctoral advisor has an Ed.D. from Harvard’s Graduate School of Education, in part because that is the only doctoral degree they offered while she was there.

If you are considering an Ed.D. program that is designed to get you through quickly (i.e., in just 2-3 years), I strongly suggest diving into the literature on your topic as soon as possible, and certainly sooner than they would have you do so. Having a sense of the literature will help you to refine your topic and your research questions far better and make your project more satisfying along the way.

Reusing Rubrics Well

September 22, 2020

While rubrics are quite valuable when they are custom tailored to individual assignments, projects and items, they can be equally — though differently — valuable when they are reused over time, across assignments and even across grades (i.e., multiple simultaneous use). That clarifying value of rubrics (i.e., making criteria and performance levels more transparent) can guide both teachers and self-aware learners to focus their efforts on the most important lessons or KSAs (knowledge, skills, abilities), over time. Growth in performance and learning can have greater and clearer meaning when the criteria are clear and consistent.

However, there are potential dangers in multiple use and reuse of rubrics. The more they are used, the greater those dangers, and the more important it becomes to anticipate them and prevent them.

When rubrics are reused over time or simultaneously across grades, they may be used for different assignments that produce different work products — to varying degrees. Thus, the traits listed in the rubric might apply to some assignments better than to others.
When rubrics are reused over time or simultaneously across grades, they may be used for different assignments that produce different work products — to varying degrees. Thus, the performance descriptions of the traits listed in the rubric might apply to some assignments better than to others.
When rubrics are reused over time or simultaneously across grades, students may grow beyond performance levels described in the rubric (i.e., a ceiling effect). While this can help to show proficiency or mastery, it comes at the cost of providing direction to teachers or students about where they might best focus their efforts, next.
Alternatively, avoiding the ceiling effect might come at the expense of the earlier uses of the rubric (and/or use at lower grades). That is, students may fall short of any of the performance descriptors in the rubrics (i.e., floor effect), thus failing to earn any credit for the work they have done.

These issues are not limited to classroom use of rubrics. They also can occur when rubrics are used to score standardized assessments. While the benefits of rubrics are a bit different in these contexts, they are no less valuable, overall. In fact, rubrics are quite necessary in scoring standardized assessments. Clearly, communication to educators about the rubrics used to score student/test taker work product can itself be quite valuable, helping to align instruction and assessment. However, rubrics are necessary to score constructed response items because they are the mechanism for consistent scoring across the entire test population.

Of course, the obvious way to address these issues is to anticipate them and simply write better rubrics in the first place. Duh! Those among us who can see the future clearly have an enormous advantage in doing this well. But even that special group sometimes must use a rubric that was handed to them, fait accompli. Furthermore, making rubrics generalized enough to handle a broader range of assignments and work product can lead to vague rubrics that are difficult to use. Rubrics that can be used across longer spans of time — even years — can have unfortunate or inconsistent effects on how those scores are weighted relative to other scores, due to ceiling and floor effects. This can particularly be a problem when rubrics are used simultaneously across grades.

One might simply be careful to construct new assignments that match those previously scored with the rubric, but this limits flexibility to the point that it can even stymie the kind of continuous improvement that we all want our teachers, schools and tests to engage in — even when it is possible. Often, we want (or need) the benefits of a consistent rubric, even as the nature of the students and their work are developing over time. When rubrics are used simultaneously across grades, it is often simply necessary for tasks to be different at different grade levels. These sorts of flexibility in rubric use require the rubrics to be a bit vaguer or more generic, so that they might apply to these broader uses. But that lesser specificity makes it harder to use the rubric well or consistently.

This leads to the necessity of supplementing rubrics with additional materials, so that they can be used consistently. This is particularly important when multiple teachers or scorers are using the same rubrics.

Exemplars. The most common supplementary material is a group of exemplars. That is, authentic or simulated student/test taker work products that demonstrate what different performance levels look like for this particular task. Teachers and other scorers gain better — and more consistent — understanding of the rubric in this context by examining it in light of those exemplars.
Interpretative Guidance. More narrative or explanatory support can help teachers and other scorers to understand the thinking and/or appropriate shifts in thinking required to apply the rubric to different levels of student learning (and/or development) or a new context. This is especially important when rubrics are reused across longer periods of student learning (e.g., years) or a new context or simultaneously across grades.

In the absence of supplementary materials, teachers and other scorers are often left to their own devices, and can respond inconsistently. Some may simply accept extraordinarily low or high scores across the board. Some may simply ignore the traits/dimensions that seem less relevant. And some may substitute their own judgment (e.g., new traits) for what strikes them as an inappropriate rubric — entirely undermining the rubric, itself. While some of these outcomes might seem acceptable, the fact that — without supplementary materials — they will happen unpredictably violates the fundamental purposes of rubric use.

In classroom use, this can demoralize students and/or leave them without direction. It can do the same for their teachers, who so often are invested in their students’ engagement in their classrooms. In the context of standardized assessment, it can prevent the kind of scoring consistency that is necessary for standardized test scores to have any reliable meaning.

Why I Love Rubrics

September 14, 2020

My oldest concern about education policy and practice is the meaning of grades. I started to wonder about standardized tests in middle school, but my questions about grading practices go back even further than that. Assessment is my longest running obsession.

I have had teachers that would lower a grade because students’ names were not in the correct corner of the page, or the order of name, period and date were wrong. We know that handwriting quality can impact grades. And, as a new teacher, I learned that many of my colleagues kept grade books, but would simply eyeball them to assign a grade for the term, rather than do the simple work of actually averaging the recorded grades. (This was back in the 1990s, when grade books were all on paper.)

And I never understood why 50% had to be a failing grade, or how anyone could do such good work on a project or paper that it was absolutely perfect (i.e., 100%).

The arbitrary and inconsistent methods used to calculate grades — either for an individual assignment or for a marking period — baffled me, and at times infuriated me. To this day, I do not know what a B+ means. Is it mastery of the content but poor organizational skills? Is it mediocre performance on the content, but hard work, diligence, sweetness and all that extra credit?

Today, I can defend grades better than I could then, but I still have lots of problems with them.

But rubrics address most of my concerns, at least on the assignment level.

Rubrics lay out what is relevant to the grade, in their dimensions/traits.
Rubrics lay out what each level of performance should look like.
Rubrics can give advance notice to test takers of the criteria.
Rubrics help teachers to be more consistent across students.
Rubrics lay waste to the practice of 50% meaning a failing grade.
Rubrics can give students clear direction on where they need to improve their performance.
Rubrics can help teachers to ignore things that they should ignore.
Writing rubrics is a good exercise to help teachers think about the learning goals they have for students.

This is not to say that rubrics are perfect. Poor rubrics create enormous problems. Teachers who ignore their own rubrics when grading (or who use them improperly) undermine the whole idea of rubrics — and the trust that students should have in them.

But rubrics can be flexible. Rubrics can be tailored for individual assignments, or set up as a grading system to be used over time. Explaining rubrics to students can provide scaffolding for students to self-monitor their own progress and to think about where they want to focus their work and attention.

Moreover, rubrics are invaluable for standardized assessment. When standardized tests use constructed response items (e.g., essays, short answer, fill in the blank, show your work), scorers need guidance and structure to ensure that they are consistent through the day and are consistent with each other as they score responses. Even automated scoring of this kind of item is based on training samples generated by human scorers using rubrics.

In fact, we cannot use constructed response items on standardized tests without rubrics. If we did, the scoring could not be consistent, and that would violate the very definition of standardized. I have no doubt that improving the quality of our standardized tests to a truly acceptable level requires more constructed response items, which means that I want a lot more rubrics.

Understanding the Doctoral Literature Review

September 8, 2020

I help a lot of doctoral students to figure out how to write their dissertations. (That’s dissertation coaching.) I have found over and over again that they do not understand the purpose of the literature review.

Academic research can be quite different than other research that many people do. In much of our lives, we are looking for support for our position. Trying to find the evidence and ideas that will help convince others of our position. But that is not what academic research is about, at all. (Or at least, it never should be.) At other times, we are trying to learn, for ourselves. What is out there? What is known? Or, what can I learn from doing this? Again, that is not academic research.

Instead, academic research is about building knowledge. Not one’s own personal knowledge, but rather the knowledge that we — as a field or discipline — have. Academic research is about contributing to that knowledge of the field. It is about real discovery of something new, or deeper or more specific examination of something we don’t quite have nailed down, yet.

This means that the researcher — in this case, the doctoral student — must know what has already been done. That is, they must investigate what has come before to make sure that they are not unintentionally reinventing the wheel. The fact that this researcher does not know what others have done is simply not an excuse. Instead, it is their responsibility to do the library research and investigations to find out what has come before.

I call this dynamic of building on and contributing to the literature scholarship. That is, there is a scholarly conversation going on through journal articles and academic conferences. When young or new researchers — like doctoral students — conduct and write up their research, they should take part in this conversation. The first way that works is to acknowledge what has already been said by others, so that the researcher can build upon it and respond to it. That acknowledgement is the literature review.

This allows researchers to stand on the shoulders of the giants that have come before them. This goes to the old aphorism, if I have seen further than other men, it's because I have stood on the shoulders of giants. Heck, Google Scholar — an invaluable tool for anyone conducting or writing a literature review — adopted the motto, Stand on the Shoulders of Giants!

So, the literature review tells the reader, this is what has come before, this is what we — as a field — already know, and this is what I am building on. This is the scholarly context for my research.

The doctoral dissertation’s literature review has an additional dimension, as well. The doctoral dissertation is a masterpiece, in the oldest sense of the word. That is, it is work done by an aspirant to be reviewed by the guild to determine whether they have the skills to be deemed a master, to be accepted as a fellow master. Therefore, the doctoral dissertation must do all the things, and do them better and more completely than typical work. Being subject to that kind of examination, the doctoral dissertation must make all the skills and knowledge clear.

This means that the doctoral dissertation’s literature review must be more complete and in-depth than the more typical literature reviews one might see in journal articles. Doctoral candidates must make clear to their committees that they know how to take part in the scholarly conversation, that they have the skills to find the scholarly conversation and understand it. They do this by showing off how well they have done and written about this one.

That’s a lot. In my view, it is the most intimidating part of the doctoral process. It is ok to be intimidated by it. But there are ways to make progress. Your program — or dissertation coach — should be able to get you started, help you to strategize and help you to reorient. Then, they should help you to figure out how to write it all up.

I will write more about some smaller issues with literature reviews next week, but there’s one more thing to say here: what the literature review is not.

Doctoral dissertations are not a student’s personal or professional philosophy. They are not a life plan. They are not everything that the student wants to say. They are a very completely and clearly executed piece of research. There are places in the dissertation for the students to opine, philosophize and even rant, but those are quite specific, and none of them are in Chapter II (i.e., the literature review). Chapter II is about what others have researched, written and reported.

Literature reviews do have room for the student/author, but it is really mostly implied. The organization of the review is full of implied judgments. What is mentioned first? How in depth is the explanation of this study or that article? What is reported about this project or that book?

Furthermore, no studies or articles should be excluded simply because the student/author does not like how they came out. The literature review should not be designed simply to support the author’s desired conclusions. They should not cherrypick the literature that they like. Not only is that unethical, it is actually counterproductive. If this project works out as the author expects, it can be more powerful for offering evidence against prevailing ideas in the literature. If there are divisions in the literature, that only serves to justify the need for this study.

And so, doctoral dissertation literature reviews should play it straight, without being slanted to support desired conclusions. They are not about convincing anyone of a result, but rather just to inform the reader about what is known and that the doctoral candidate has truly investigated the state of the scholarly conversation on the topic.

RTD Theory of the Item (Draft)

August 31, 2020

As RTD is an item-focused approach to test development, we had to be sure that we knew what items are and how they function on tests with test takers. We needed to develop RTD because we could not find anything that did this for content development professionals.

Psychometrics does not examine items or look inside of them. Rather, psychometrics treats each item as a black box that produces some small amount of data about each test taker. It takes all that data and analyzes it in a number of sophisticated ways to examine the relationships between items and what the patterns in the data indicate about each test taker. Because psychometrics does not offer tools to examine the contents of items — which is where the cognitive content, construct and KSAs (knowledge, skills and abilities) are found — its view of items has almost nothing to offer about test validity. That is, psychometrics has almost nothing to offer about whether tests actually assess what they are purported to assess — neither on the item level nor the test level.

Obviously, for those who care about validity and item-content domain alignment, that is a huge problem.

The public has a very different view of items than psychometrics offers. It usually takes for granted that tests and items measure what they are purported to measure. It may simultaneously accept that there is some unfairness in standardized tests, but it is generally thought to be fairly minor. The public’s biggest objection to tests appears (to us) to be that some people are just bad test takers — not that tests or items themselves are flawed. It has stronger objections to big standardized tests than to other tests, but the origin and basis for those objections is unclear. At times, we see claims that these tests are racist or classist, but usually without explanation of how that is. That is, there are objections about test scores without explanations of how racism or classism infected them.

Again, a view that does not offer anything useful to improving tests, validity or items.

We had no doubt that items matter, and that there are differences in item quality. That there are good items and bad items. That some items do their jobs better than others. We figured that their job was to examine test takers for particular KSAs (or targeted cognition), and had seen too many items that we believed missed their marks. But we lacked a theory or framework for explaining that. We certainly lacked a framework that could be used to support item development and that could connect the various ideas, principles and practices that contribute to item development.

This led us to think carefully, for years, about what items are. We knew we needed to explain the relationship between an item’s goals and ideal functioning and what actually happens when they go wrong. We knew that we needed to explain how item developers connect targeted cognition and test takers. We knew that we needed to explain how test takers respond to items.

Eventually, we developed the RTD Theory of the Item (TotI). The figure below offers an illustration, but the explanation in our book (in progress), Rigorous Test Development: A Practical and Conceptual Guide for Content Development Professionals, is where you will find the really good stuff.

You can download and read a preview of our Theory of the Item chapter – still a draft, but one we feel actually explains what items really are.

Rigorous Test Development (RTD) Theory of the Item

RTD: The Book

August 24, 2020

We are finally working on the book, Rigorous Test Development: A Practical and Conceptual Guide for Content Development Professionals.

This book is intended for the people who develop standardized tests, on the item level. There is a startling lack of training, certification or professionalized/disciplinary knowledge available for them. There are virtually no textbooks, journals, courses, degree programs or professional development programs. There is nothing that explains how they fit into the larger test development process, that gives them a conceptual view of their work or tries to treat them like intelligent and dedicated professionals who want to learn and grow.

This stands in contrast to psychometricians, who can get master’s degrees and doctorates, who have countless conferences, journals and online courses available to them.

As in so many areas that we care about, we are worried about the quality of the work, and want very much to help it to improve. We know how the sausage is made, and we want better processes for making it so that better sausage is available to everyone.

Yeah, test items are the sausage, in this metaphor. But you knew that, right?

We know that low quality items can only produce low quality tests. You simply cannot make chicken salad out of chicken scat. We care enormously about item quality because we know that standardized tests matter and are not going away.

Which (new school) begs the question: what even is item quality? Well, that’s in the book! The industry doesn’t really have a good definition of item quality — though we do. No one offers ideas about how to think about item quality or how to improve it. All of that is in the book, and we will write about it a little bit here in the weeks and months ahead.

The thing that really makes RTD different is that our lens for judging item quality is the interactions of test takers and content. It is a content-focused view. We do not think that item quality can be judged without looking at the contents of the item, and the KSAs (knowledge, skill and/or ability) from the target domain that the item targets. Statistical analyses based on data from field or operational testing can help, but that data comes far too late, far too expensively and never gives any insight as to why an item is performing well or poorly. We offer frameworks for thinking about that.

So, we are seriously working on the book. It is coming. In the weeks and months ahead, we will offer previews of the thinking behind various chapters, and next week we will even offer a preview of perhaps the most important chapter, The RTD Theory of the Item.

Object Lessons and North Paulding High School

August 18, 2020

I have spent the last week thinking about North Paulding High School. This is the Georgia high school that temporarily suspended a student for posting a photo of the crowded hallway there, in which almost no students were wearing masks. This is the high school whose leadership said that mask wearing was a personal choice and they could not effectively enforce a rule on it, despite having a basic dress code.

The first reason I keep thinking about North Paulding High School is that I have been there. I have sat in the principal’s office. When I began in policy analysis and program evaluation, this school district was one of my first clients. I grew to know the long-term superintendent well, and became a huge fan of one of the district’s principals.

Of course, that was ages ago.

So, I was quite surprised to see Paulding in the news.

Despite my old relationship with this school district, I was immediately highly critical of their decisions, their excuses and their actions. I am on the “school dress codes are usually sexist and hold girls responsible for boys’ immaturity” bandwagon, and my issues with attempts to control teenagers — as opposed to influence or teach them — go back much further. Of course a school can enforce a mask mandate! Not a lot of sympathy from me, right?

Except…I thought that North Paulding was getting a raw deal out of the coverage.

Sure, they deserved all the criticism they got. No doubt of that! But they didn’t deserve all of the criticism. I did not think that this was the only high school in the country doing this. I did not think that it was the only high school in Georgia doing this. Heck, I did not even think that it was the only high school in Paulding County doing this.

Were there others trying to suspend students for bringing light to a situation? Well, I have seen that before. We have all seen school leaders claim that criticism from students — including embarrassing valid criticism — is “disruptive” and therefore may be barred, under the law. We have all seen basic disrespect for student rights. We have seen students try to stand up for their First Amendment rights and get intimidated by the powerful into stepping back.

And I do not believe for a second that every other school in Georgia both has a mask mandate and is enforcing it. I do not believe that every other school in Georgia has figured out how high school students can safely pass from one class to another, through the day.

And yet…I believe in object lessons. This might not have been the best one, but there it was.

Sometimes, we are faced with something that did not work out. A minor snafu. A larger fuckup. A huge clusterfuck. Something. Regardless of what it is, I believe in learning from it. Really learning from it.

Those failures, regardless of their scale, are authentic. They occurred in real contexts, in which real people were doing what they actually do. If the screw up(s) happened, they could well happen again. Whatever led to the screw up(s)…well, if it is not addressed, why wouldn’t they happen again?

I do not agree with anything that happened at North Paulding High School. And I do not think that coverage was fair. But I think we can all learn from it. This can be an object lesson. We can all think about how and why it happened, but not as some hypothetical example written by some trainer or consultant. No, it really happened. We know what happened. It can be an object lesson.

Your school district might not get exactly to that point, but I’ll bet that your local school district has some of the same issues that led to North Paulding’s embarrassment. I am sure that mine does. My own organization can suffer from some of those issues, too.

We can look at this one example and reflect on what it might tell us about our own weaknesses and vulnerabilities. It can help us to think about where we might make similar mistakes.

So long as we are willing to learn. So long as we are always looking to learn. The object lessons are available all around us.

Understanding the COVID Challenge for School and District Leaders

August 11, 2020

There has never been a time in my professional career — or in my life — when I have less wanted to be a school or district administrator.

You see, there is no good answer. There is no good policy. There are no good choices.

Though I have friends and colleagues and clients who work leading schools and school districts, I know that I can barely imagine how hard this time is for them, as they try to figure out what to do in their 2020-2021 school year.

Despite what so many preach and yell about, schools and school districts are very light on administrators and support staff. Relative to other industries, the work force is very concentrated in the line workers, the ones who do the core work of the organizations. That is, school districts are full of teachers and teacher aides. There are some school nurses and some counselors. Those are the people who work with the students. But supervisors and support staff? Other kinds of expertise? No, schools and districts are very light, in that regard.

Think about how many teachers an average school principal is responsible for supervising and evaluating. Even if you split that up across the principal and the handful of assistant principals, the number of direct reports for administrators in schools is exponentially larger than what we see in virtually any other business.

I have preached for 25 years that if your team does not have slack capacity in it on a regular basis, then your team will not have the capacity it needs when a crisis hits. You can be efficient along the way and then suffer when the crisis hits, or you can be resilient when the crisis hits. You can be more efficient across time, without the major setback of the crisis, if you are willing to give up some of that presumed efficiency along the way.

But we do not run our schools that way. Our schools are crowded, our teachers are under-supervised and under-supported and our school leaders are barely supervised or supported, at all. There are well over 12,000 public school districts in this country and fewer than 1000 have more than 10,000 students. To get a sense of that scale, that’s a district with more than two high schools. All the others are tiny, and are led by tiny district offices.

Where are these district offices going to find the capacity to reinvent schooling over a summer? They can’t.

But the larger districts really can’t either, because there are no good options.

Distance learning is inferior to in-person learning, even simply judging on the traditional content in the explicit curriculum. While there are claims about efficiencies in online learning, no one really claims that it is better for individual students and there are no credible studies that show that it is.
Distance learning widens inequality gaps. Everything that has contributed to those gaps over time is made worse in the COVID era.
School facilities — the buildings themselves — have suffered deferred maintenance for decades. HVAC systems are old and creaky. Windows may or may not open. They are not suited for a pandemic respiratory virus.
Schools are generally crowded. Not all are that over-crowded, but they are not sparsely filled. We simply do not have the space to spread students out. When there is declining enrollment, we shut down buildings or shift things around, so that we don’t have to fix up the ones we have.
State departments of education have always been short-staffed and under-resourced. They do not have any expertise in the areas that school and district leaders need help with.
One of our national parties has been against the US Department of Education ever since President Carter raised it to be a cabinet level post. Their presidential candidates and nominees have run on reducing or even eliminating USDOE. Other federal departments get to focus on emergency preparedness and disaster planning, but not USDOE. It lacks the resources to do that kind of work, too.
And there is no additional funding. States and municipalities are short on tax revenue and education is the largest line in the budget. Money will be cut, not added. The federal government is doing nothing to support schools and districts in the COVID crisis, and we all know which party is keeping that from happening.

The thing is, other parts of our country cannot really come up with good solutions, either.

We know too little about the virus. Science takes time. Knowledge evolves and grows as we get more opportunities to learn. We just don’t know.
Too many people want easy and simple answers, binaries that make decision-making easier. Children can’t get coronavirus, they want to think. Or, if they do, they cannot transmit it. They don’t get sick. But all of those are untrue.
We still lack the testing capacity in this country promised to us many months ago. We simply lack information about the state of the pandemic today, and individuals cannot get timely test results, even in the limited testing we do do.
A sizable number of people refuse to take the most basic precautions to prevent spread of this disease. Among those who do wear masks, a ridiculous number lower their masks when they talk, which is the worst time to lower them. People wear masks without covering their noses.
Almost no one is acknowledging the long recovery period for many who get sick and even fewer acknowledge that there are long term health consequences, even perhaps cognitive effects.

If you had to make decisions for your organization in this context, what would you do? Could you come up with a plan for your people and customers that was safe?

But schools and districts have it worse.

Small children squirm and move around. How can you keep them apart?
Teenagers can know the right thing to do, and yet still have trouble actually doing the right thing. That’s just where their cognitive development is. That’s just how their brains work, at their age.
Teachers almost always love their work and love their students, but they also love their own families and need to worry about their own health.
There has never been a ready supply of capable teachers to replace or augment the ones we have. We cannot find average teachers to replace the ones who are too vulnerable to work in schools and we cannot find appropriate extra staff to allow classes to be significantly smaller.
Teachers already work longer hours than most people understand, and asking them to significantly increase their workload by teaching both in-person and online is simply not possible. There are not enough hours in a day.
Teaching online is a different skill than teaching in person.

There simply is no good answer. There’s no mediocre answer. Every school and district leader is trying to choose from among a bunch of really bad answers. They are thinking about children getting sick, their schools being vectors for spreading a pandemic through their communities and about their staff members dying.

We venerate fire departments and police departments who do not work nearly as hard as teachers do because when the shit gets real, they put their lives on the line for the rest of us. Cops love to talk about how much potential danger they face every day on the job. Well, we are asking teachers to face far more danger than cops ever do, and to do so every day.

In 2019, fewer than 100 officers were shot and killed in the line of duty, and we venerate them all for it. Teachers constitute approximately 1% of our nation’s population — and that does not count all the other school personnel. 1% of the deaths we have had so far in this COVID era is well over 1,000 people in just six months. Fewer than 50 cops were shot and killed in the line of duty in 2019, and we will see more than 200,000 Americans die this year in this COVID crisis.

We are asking school leaders and district leaders to come up with answers that will meet the needs of children and communities while bearing a moral and emotional burden to keep those children safe and to do right by their own people, the teachers and other personnel who work for them.

There are no good answers. There are no mediocre answers. There are only horrible answers.