This article was originally published to JamesRon.org.
As the US presidential elections wind their way towards a painful and tortuous conclusion, I’ve been thinking a lot about the difference between reporting on a poll or survey that seeks to offer a snapshot in time, as opposed to using polls, and past history, to predict what the results might be in a few days, weeks, or months.
My colleagues and I have done lots of surveys over the last eight years in Colombia, India, Mexico, Morocco, Nigeria, and the USA. In all this work, we offered estimates of how many people in a given population engaged in this or that behavior, supported this or that policy, or thought positively or negatively about a particular organization or institution. There is always a confidence interval around every one of those estimates, and its width is driven in large part by the number of responses. This allows you to answer questions like, “what percentage of the population living in the Moroccan cities of Casablanca or Rabat supported this or that political party in May 2015?” Or, “what is the association between attending mosque and believing that women’s rights are human rights?”
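To make that dependence on sample size concrete, here is a minimal sketch of the standard normal-approximation confidence interval for a survey proportion. The numbers are purely illustrative (a hypothetical 52% support figure from 1,000 respondents), not from any of the surveys described here.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% normal-approximation confidence interval for a sample proportion.

    p_hat: observed proportion in the sample
    n: number of respondents
    z: critical value (1.96 for a 95% interval)
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks as n grows
    return (p_hat - z * se, p_hat + z * se)

# Hypothetical example: 52% support among 1,000 respondents
low, high = proportion_ci(0.52, 1000)
print(f"Estimate: 52%, 95% CI: {low:.1%} to {high:.1%}")
```

Quadrupling the sample size halves the margin of error, which is why larger surveys give tighter snapshots of the same moment in time.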
If I wanted to ask how many people were likely to vote for a specific Moroccan political party in the future, however, I’d have to do much more than extrapolate from the previous work I’d done with my colleagues. I’d want more polls over time, so that I could track trends, and I’d want a way of predicting who would be likely to actually participate in the election, assuming that participation was voluntary.
In the US, pollsters did not do as good a job as we had hoped for the 2020 presidential elections, leading at least one New York Times op-ed writer and survey expert to ask, “Can We Finally Agree to Ignore Election Forecasts?”
The biggest problem this time around, as in 2016, is that pollsters under-estimated the pro-Trump vote in a number of key states. They did so by missing the roughly 3-4% of Trump supporters who refused, for one reason or another, to answer pollsters; by under-estimating the percentage of Hispanic voters who supported Trump in Florida and Texas, among other states; and by under-estimating the number of Trump supporters who would actually vote. US pollsters, in other words, don’t have as good a grasp on the political views and behavior of non-college educated whites and Hispanics as they would like.
Political scientists will come up with more and better explanations in the coming months as they and their graduate students sit down with the 2020 polling data and analyze it carefully, state-by-state and demographic-by-demographic. They will also want to re-examine the 2018 and 2016 electoral data, and re-interpret the lessons learned then in light of the 2020 results. Then, pollsters will incorporate those lessons learned into their weighting algorithms and turnout predictions, hoping to do better in 2024.
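The weighting step mentioned above can be sketched very simply. In its most basic form (post-stratification), each respondent in an over-represented group is down-weighted and each respondent in an under-represented group is up-weighted, so the sample mix matches the known population mix. The cell names and all the shares below are hypothetical, chosen only to show the mechanics:

```python
# Hypothetical post-stratification by education. All numbers are illustrative,
# not real polling data.
population_share = {"college": 0.35, "non_college": 0.65}  # known from census
sample_share = {"college": 0.55, "non_college": 0.45}      # who answered the poll

# Weight for each cell: population share divided by sample share.
# Non-college respondents get weights > 1 because they are under-sampled.
weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}

# Hypothetical candidate support within each cell:
support = {"college": 0.40, "non_college": 0.55}

unweighted = sum(sample_share[c] * support[c] for c in support)
weighted = sum(sample_share[c] * weights[c] * support[c] for c in support)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

The catch, and the reason 2016 and 2020 went wrong in similar ways, is that weighting can only correct for *who* answers, not for systematic differences in opinion between the non-college respondents who do answer and the many who refuse.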
One problem that cannot be fixed, however, is that pollsters, survey experts, and anyone else in the business of estimating the views of the non-college educated Americans who support Trump are, by definition, college educated. After all, it’s very hard to be a survey expert, pollster, or commentator without at least some higher education. The non-membership of pollsters in the very demographic group they are trying so hard to understand is thus hard-wired into the entire polling/survey enterprise. This problem, moreover, can never, ever go away. By definition. It’s as if all the pollsters in the US were white and monolingual, and yet hoped to really understand the views of people of color, or people who speak only Spanish. It’s really, really hard. Even the best and most sensitive ethnographers, such as Arlie Hochschild, author of “Strangers in Their Own Land,” will always view non-college educated people as a foreign tribe.