Opinion polls: Mathematics and reality

Reproduced from the IMPA Science & Mathematics blog in O Globo, coordinated by Claudio Landim.

Roberto Imbuzeiro Oliveira is a researcher at IMPA.

The year 1936 was a presidential election year in the United States. On one side, Democrat Franklin Delano Roosevelt was seeking re-election to continue the fight against the Great Depression. On the other, Republican Alfred Landon attacked the president's New Deal policies.

Two attempts to predict the outcome of that election went down in history. One was by the magazine "Literary Digest," which polled more than two million of its readers. Even so, its prediction was badly wrong: it forecast a victory for Landon.

The other attempt was by a smaller company. It interviewed only a few thousand people, yet correctly concluded that FDR would win by a landslide. The company, which later became known as Gallup, was the first to demonstrate the value of scientific methodology in opinion polling.

Today, each election season brings its flood of polls. Following their numbers has become part of our lives. At the same time, like everything involving politics, polls generate distrust. Are the numbers meant to inform or manipulate? In the United States, the recent election of Donald Trump, which was considered improbable, gave the impression that there might be something wrong or even dishonest in the field.

My goal here is not to investigate polling firms; I leave that to the detectives on duty. Instead, I will first explain the mathematics involved in polling. Second, we will see that even the best polls can be wrong, for a number of reasons. Finally, we will discuss what may have gone wrong in the polls before Trump's election. We will see that there are indeed errors to point out, but also much to admire.

First, some good news. Behind the polls lies mathematics as solid and consistent as the Pythagorean Theorem. Curiously, this mathematics tells us completely certain things about probabilities.

To explain this better, let's start with the basics. As you know, opinion polls never interview all Brazilian voters. What they do is select a few individuals – a sample – and ask only them who they intend to vote for.

How can a sample accurately represent the whole? The key idea is that if we choose respondents at random, without favoring or excluding anyone, we will most likely get a fairly accurate picture of the electorate. For example, imagine that 50% of voters intend to cast a blank ballot. The electorate is very large, with millions of people. Yet if we interview only 1,000 people at random, our sample will most likely show between 47% and 53% of blank votes. I can even put a number on it: the chance of seeing a figure outside this range is only about 5%, or 1 in 20.
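To see where the 47%-to-53% range comes from, the short Python sketch below computes the probability exactly. It assumes that each of the 1,000 respondents is an independent random draw from the electorate, so the number of blank votes in the sample follows a binomial distribution; with millions of voters this is a very good approximation to picking people without repetition. The function names are illustrative, not taken from any polling software.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def prob_count_in_range(n: int, p: float, k_min: int, k_max: int) -> float:
    """Chance that between k_min and k_max of the n respondents (inclusive) give the answer."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, k_max + 1))


# Setting from the text: 50% of voters vote blank, 1,000 people interviewed at random.
n, p = 1000, 0.5
inside = prob_count_in_range(n, p, 470, 530)  # 470..530 blank votes = 47%..53%
print(f"P(sample share between 47% and 53%) = {inside:.3f}")
print(f"P(sample share outside that range)  = {1 - inside:.3f}")
```

The probability of landing inside the range comes out at about 95%, which is where the "1 in 20" figure comes from. Note that the size of the electorate never enters the calculation: under this model only the sample size and the true share matter.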

You might wonder where these numbers come from. This is where probability, the area of mathematics that studies chance, comes in. With it, we can quantify very precisely the likely error of an election poll. More specifically: if the statistician wants to give a result that is correct with 95% probability, he uses probability to calculate the corresponding margin of error.
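As an illustration, the margin of error can be sketched with the standard textbook formula for a proportion, z · sqrt(p(1 - p)/n), where z is about 1.96 for 95% confidence. This is the usual normal approximation under simple random sampling; it is an assumption on our part, and real polling firms may apply further corrections (for stratification or weighting, for instance). Setting p = 0.5 gives the widest, most cautious margin.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a sample proportion.

    Assumes a simple random sample of size n; p = 0.5 gives the largest margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 when confidence = 0.95
    return z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000, 2000, 4000):
    print(f"n = {n:>4}: margin of error = ±{100 * margin_of_error(n):.1f} points")
```

For 1,000 interviews this gives about ±3.1 points, consistent with the 47%-to-53% range of the blank-vote example. Because the margin shrinks with the square root of the sample size, quadrupling the number of interviews only halves it.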

There are many variations on the process of selecting random samples. Sometimes it is important to stratify the population, that is, to divide it into groups according to criteria such as income, education, and gender. This makes it possible to understand the opinions of different groups and (perhaps) reduce the poll's overall error. In any case, the sample within each group is always chosen at random. The central idea is almost poetic: to know the opinion of the people, don't talk to carefully chosen people (your friends, for example). Just go out there, approach people at random, and let anyone talk.
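A proportional stratified sample can be sketched in a few lines: split the electorate into groups, give each group a share of the interviews equal to its share of the population, and then draw a simple random sample inside each group. The electorate, the education categories, and their 60/30/10 proportions below are made up for illustration; real polls rely on census data and usually more strata.

```python
import random
from collections import Counter


def stratified_sample(population, stratum_of, total_size, rng):
    """Proportional stratified sample: group the population, then draw at random inside each group."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Each group gets interviews in proportion to its size (rounded to a whole number).
        k = round(total_size * len(members) / len(population))
        # Within the group, every member has the same chance of being chosen.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical electorate of 100,000 people with made-up education proportions (60/30/10).
population = (
    [{"id": i, "education": "primary"} for i in range(0, 60_000)]
    + [{"id": i, "education": "secondary"} for i in range(60_000, 90_000)]
    + [{"id": i, "education": "higher"} for i in range(90_000, 100_000)]
)
rng = random.Random(2016)
sample = stratified_sample(population, lambda person: person["education"], 1_000, rng)
print(Counter(person["education"] for person in sample))
```

Here the groups divide evenly, so the sample has exactly 600, 300, and 100 people from each group. With less convenient proportions, rounding can make the total differ slightly from the requested size. Either way, chance alone decides who is interviewed inside each group.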

Statistics compiles, classifies, and interprets data. Probability studies chance as a mathematical object. Together, they form a formidable pair. However, like any couple, they sometimes have difficult conversations.

No matter how hard Statistics tries, Probability cannot guarantee certainty in the results. In fact, every poll has some chance of being off by more than its "margin of error". For example, above we said that the result will fall within the margin with 95% probability. Therefore, the chance of an error larger than the margin is 5%, or 1 in 20. This means that, on average, one in every twenty polls will miss by more than its margin of error! This is natural, inevitable, and predicted with all the rigor of Mathematics.
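The "one in twenty" can also be checked by brute force. This sketch simulates 2,000 independent polls, assuming the blank-vote setting (true share 50%, 1,000 respondents each drawn independently at random) and the 95% margin from the usual normal approximation. It models only random sampling error: every chosen voter answers, and answers truthfully. The function name and the seeds are arbitrary choices for illustration.

```python
import math
import random


def simulate_polls(true_share: float, n: int, num_polls: int, seed: int = 0) -> list[float]:
    """Run num_polls simulated polls of n random voters each; return each poll's sample share."""
    rng = random.Random(seed)
    shares = []
    for _ in range(num_polls):
        # Each respondent independently gives the answer with probability true_share.
        hits = sum(1 for _ in range(n) if rng.random() < true_share)
        shares.append(hits / n)
    return shares


true_share, n = 0.5, 1000
# 95% margin of error; for simplicity it is computed from the true share, not the estimate.
margin = 1.96 * math.sqrt(true_share * (1 - true_share) / n)
shares = simulate_polls(true_share, n, num_polls=2000, seed=1936)
misses = sum(1 for s in shares if abs(s - true_share) > margin)
print(f"{misses} of {len(shares)} polls ({misses / len(shares):.1%}) missed by more than the margin")
```

Roughly 5% of the simulated polls miss by more than the margin, even though every one of them was conducted "perfectly"; the exact fraction depends on the random seed.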