If there is one issue that I think most needs to be understood about conducting — or consuming — research, it is the representativeness of your data. The 2020 presidential election illustrates this in a number of ways.
First, the pre-election polls were off. This was also true in 2016, when the polls were off by about 3 points. It appears that this year's polls were off by a bit more than that, but not by a huge amount. However, this error was not evenly distributed across the country. For example, the polls in Minnesota were not off by very much, but the polls in Wisconsin were off by quite a bit. (Note: I am not citing specific numbers because the election results have yet to be certified in any state.)
The major explanations going around for this concern the composition of the sample. Some people cite differences between the sample and the overall population in educational attainment, while others point to a trickier construct, social trust. This latter group argues that weighting samples by education simply does not do enough to correct for the underrepresentation of low-trust individuals in the sample. That is, educational attainment, they suggest, is a poor proxy for social trust.
Regardless of the details, everyone agrees that the group of poll respondents was different from the voting public along a number of dimensions. And while high-quality pollsters try to correct for this, they didn't capture something this time.
It is easier to correct for educational attainment because it is easy to ask about. Educational attainment is a very easily observed variable. Social trust, however, is much harder to see, harder to ask about, and therefore harder to pin down. It is an unobserved variable. We can correct or control for differences in observed variables, but not unobserved variables.
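To make that distinction concrete, here is a minimal sketch of how weighting on an observed variable like education works. All of the shares and support numbers below are invented for illustration; they are not real polling figures.

```python
# Minimal sketch of weighting on an observed variable (education).
# All shares and support numbers are invented for illustration.

# Suppose census-style data says the electorate is 60% non-college / 40% college,
# but the poll sample came back 45% non-college / 55% college.
population_share = {"non_college": 0.60, "college": 0.40}
sample_share = {"non_college": 0.45, "college": 0.55}

# Post-stratification weight: up-weight the underrepresented group,
# down-weight the overrepresented one.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical candidate support observed within each group of the sample.
support = {"non_college": 0.42, "college": 0.55}

unweighted = sum(sample_share[g] * support[g] for g in support)
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)

print(f"unweighted estimate: {unweighted:.3f}")  # pulled toward the college group
print(f"weighted estimate:   {weighted:.3f}")    # reflects the population mix
```

The crucial limitation is that the weights can only rebalance groups the pollster can observe. If low-trust voters are missing within both education groups, no education weight will bring them back.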
One reason why large sample sizes are useful is that we hope that with a large enough group, the unobserved stuff kinda cancels out. However, in this case, that low-trust group might simply be far less likely to respond to pollsters. This undermines their sampling strategy, both in data collection and in weighting the subgroups. Obviously, they need to do a better job of turning the unobserved variable into an observed variable. Somehow.
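A quick simulation sketch shows why a bigger sample does not fix this. The trust shares, support levels, and response rates below are all assumptions made up for illustration; the point is only that the gap between the estimate and the truth does not shrink as the number of calls grows.

```python
# Sketch: differential nonresponse does not cancel out as the sample grows.
# Trust shares, support levels, and response rates are all invented.
import random

random.seed(0)

LOW_TRUST_SHARE = 0.30                   # assumed share of low-trust voters
SUPPORT = {"low": 0.35, "high": 0.52}    # assumed candidate support by trust group
RESPONSE = {"low": 0.02, "high": 0.06}   # low-trust voters answer pollsters less often

true_support = (LOW_TRUST_SHARE * SUPPORT["low"]
                + (1 - LOW_TRUST_SHARE) * SUPPORT["high"])

for n_calls in (1_000, 10_000, 100_000):
    responses = []
    for _ in range(n_calls):
        group = "low" if random.random() < LOW_TRUST_SHARE else "high"
        if random.random() < RESPONSE[group]:       # most calls go unanswered
            responses.append(random.random() < SUPPORT[group])
    estimate = sum(responses) / len(responses)
    print(f"calls={n_calls:>7,}  respondents={len(responses):>6,}  "
          f"estimate={estimate:.3f}  truth={true_support:.3f}")
```

The estimate stays a few points away from the truth no matter how many calls are made, because the respondents are systematically unlike the population.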
The lesson: Unobserved variables can entirely undermine the generalizability of your results when your sample is not representative of the relevant population.
Second, it appears that the samples in the pre-election polls were different than they had been in the past. Pollsters reported greater response rates than in recent years, and the composition of those respondents might have been quite different. For example, if white-collar and office workers were newly able to work from home, and thereby had more time and/or freedom to answer calls and take part in polls, that might favor one party over another. If less-educated service workers were not impacted in the same way by COVID and the various efforts to reduce its spread, old sampling methodologies could yield significantly different samples than in the past — even the recent past.
I have not seen any mention or discussion of pollsters observing this and treating it as a problem to be solved, as opposed to an opportunity to be taken advantage of. Yes, it is great when the response rate goes up, but it might not be all good.
The lesson: Applying old methodologies and assumptions in new contexts is always tricky. Always be diligent and careful about seemingly positive changes, as they may carry with them less obvious, harmful changes.
Third, early returns were rather misleading in many states. This was because the voters who were counted earlier were different — as a group — from those counted later. In some states, early and mail-in votes were counted earlier, and in other states those were counted later. Everyone expected that those early and mail-in voters were more Democratic than the traditional in-person voters, though no one could be certain how different.
In this case, there were two different populations, and no good way to anticipate their differences. Successful efforts by both major parties to increase turnout made comparing the groups even harder, as no one could predict how successful each of those efforts might be.
Only fools (e.g., me, my wife, my best friend) were paying any attention to the vote counts released early in the evening, on election day. Heck, it was still pretty foolish to pay attention later in the evening. We already knew how most states would turn out, and these returns simply could not predict the final result in the close states. Whichever group was counted first simply could not be generalized to the second group. At the very least, we all had to wait for a sufficient portion of each group to be counted to predict anything.
The lesson: We cannot expect to be able to generalize results from one population to some other population that is already known to be different in significant and relevant ways. This is especially true when the extent of those differences is not known.
Fourth, the final results in many states were predictable long before the actual counts got there. For example, experts knew that Michigan, Pennsylvania, and Georgia would end up going for Joe Biden, even when Donald Trump was far ahead in the count.
How was this possible?
History provided clear insight into the proportion of votes that come from some areas of those states, relative to others.
Counties and states reported estimates of how many votes they had outstanding.
History provided a baseline for expectations about the split of the vote in different areas.
Early counting of the in-person votes and the early/mail-in votes provided further information that helped refine those historically-based expectations.
Thus, it did not require magic to predict the future. Instead, one could simply use some basic algebra to predict the count (of what had already happened). Similarly, in short order it was not that difficult to see that counts in other states — though close — would not lead to a different result.
Now, none of this was clear early in the evening. But as more results — even partial ones — came in from across each state and from every group of voters, the eventual results were easy to see. Those with access to the most detailed historical and emerging data could see the trends and where they would end up.
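Here is a minimal sketch of that "basic algebra": combine the votes already counted in each area with an estimate of the outstanding ballots and an expected split for those ballots. Every county and every number below is hypothetical, invented purely to show the arithmetic.

```python
# Sketch of the "basic algebra": project a final statewide margin from partial
# county-level returns, estimates of outstanding ballots, and an expected split
# for those outstanding ballots. Every county and number here is hypothetical.

counties = [
    # (counted_dem, counted_rep, est_outstanding, expected_dem_share_of_outstanding)
    (400_000, 250_000, 300_000, 0.78),  # large urban county, mostly mail ballots left
    (150_000, 300_000,  40_000, 0.40),  # Republican-leaning suburban county
    ( 50_000, 120_000,  10_000, 0.30),  # rural county, nearly finished counting
]

counted_margin = sum(d - r for d, r, _, _ in counties)

projected_dem = sum(d + out * share for d, _, out, share in counties)
projected_rep = sum(r + out * (1 - share) for _, r, out, share in counties)

print(f"margin in counted votes: {counted_margin:+,}")
print(f"projected final margin:  {projected_dem - projected_rep:+,.0f}")
```

In this made-up example, the candidate trailing in the counted votes is projected to win once the outstanding ballots from the heavily Democratic county come in — the same pattern that played out in several states.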
The lesson: Given sufficient data, diligence and patience, one can extrapolate from sampled data to the larger population, so long as the samples are well matched to the larger populations.
None of this is new, but the gaps between predictions, partial results, and full results of the 2020 presidential election have received an enormous amount of discussion and attention. This can be a good opportunity to think more carefully about how representativeness — and efforts to achieve it — play out in practice.