
On UOL, the probability in election polls

Roberto Imbuzeiro is a tenured researcher at IMPA in the field of probability

Reproduction from UOL

Reporting by: Stefhanie Piovezan

In the US, in 2016, The New York Times conducted an experiment: it sent questionnaires containing responses collected from likely Florida voters to five renowned election researchers for analysis.

And the result was revealing. Each person arrived at a different conclusion, despite being presented with the same 867 questionnaires.

The reason for this difference is not, as many suggest in every election, both in Brazil and abroad, that some institute intentionally manipulates the result. This variation stems from the difference in methodology and criteria of each institute.


Two good researchers, examining the same data, can find very different results. But more than that: it is unlikely that the researchers will analyze the same data – each has different criteria for selecting respondents.

"One institute's sample might consist of 10 young women and 10 adult men, while another institute's sample might have 10 adult women and 10 young men. The composition is the same: 10 women and 10 men in the sex category and 10 young people and 10 adults in the age category. But the samples are completely different. They could, therefore, produce opposite results," Carlos Alberto de Bragança Pereira, professor at the Institute of Mathematics and Statistics of USP, told UOL.
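The quota example in the quote can be sketched in a few lines of Python (the group names and counts are the invented ones from the quote, nothing more):

```python
# Two hypothetical samples with identical quota marginals
# (10 women, 10 men; 10 young, 10 adult) but different joint composition.
from collections import Counter

# Sample A: 10 young women + 10 adult men
sample_a = [("woman", "young")] * 10 + [("man", "adult")] * 10
# Sample B: 10 adult women + 10 young men
sample_b = [("woman", "adult")] * 10 + [("man", "young")] * 10

def marginals(sample):
    # Count each category separately, the way quotas are checked.
    sexes = Counter(sex for sex, _ in sample)
    ages = Counter(age for _, age in sample)
    return sexes, ages

# Both samples satisfy exactly the same quotas...
assert marginals(sample_a) == marginals(sample_b)
# ...but the combinations of characteristics are completely different.
assert Counter(sample_a) != Counter(sample_b)
```

Quotas constrain only the separate totals per category; they say nothing about how the categories combine, which is the point of the example.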

This shows that evaluating electoral responses is much more complex than many people think: it's not enough to compare how many answered A with how many answered B and see who is preferred. The calculation, in fact, depends on who was interviewed (the sample) and what was done with that result.

Therefore, there is subjectivity within election polls.

"The method doesn't cease to be scientific just because it involves subjectivities. These need to be well established and explicit for the decision-maker," says Helio Migon, professor emeritus at the Institute of Mathematics of UFRJ (Federal University of Rio de Janeiro).

Defining the sample

Imagine you ask 10 people in your neighborhood which candidate they intend to vote for. The result will most likely not represent the Brazilian reality. This is because there are several factors to take into account. For example:

Did you interview more men than women?
Are there more men than women in Brazil?
Does being a man or a woman matter when deciding how to vote? How much?
Did you interview people from other cities? From other states?
What is the weight of your city in the Brazilian vote count?

This brings us to a challenge before even conducting the research: how to make the sample representative? How many people to interview? Which ones?

"Each institute would like to select people completely at random within each stratum of the electorate. For example, among all Brazilian men with a high school education, I would like to choose some completely at random," explains Roberto Imbuzeiro Oliveira, a researcher at IMPA (Institute of Pure and Applied Mathematics). These samples in which participants are selected completely at random, by lottery, are called probabilistic samples.
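A minimal sketch of such a stratified random draw, assuming a hypothetical complete list of voters in each stratum (a strong assumption; the strata names and sizes are invented):

```python
# Sketch of probabilistic (stratified random) sampling.
import random

random.seed(0)

# Hypothetical sampling frame: every voter listed by stratum. In reality
# no such complete list exists -- this is purely illustrative.
frame = {
    "men, high school": [f"voter_{i}" for i in range(1000)],
    "women, high school": [f"voter_{i}" for i in range(1000, 2100)],
}

def stratified_sample(frame, n_per_stratum):
    # Within each stratum, draw completely at random, by lottery.
    return {stratum: random.sample(people, n_per_stratum)
            for stratum, people in frame.items()}

sample = stratified_sample(frame, 50)
```

With a real frame, every voter in a stratum would have the same chance of being drawn, which is what makes the sample probabilistic.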

The problem is that it would be necessary to speak with every person selected – if the researcher skips some of them, or fails to reach some of them, the sample is already distorted in some way. And there is no list of all the people who belong to such a group, nor any way to guarantee that it will be possible to contact everyone.

Working with probabilistic sampling is therefore more difficult, expensive, and time-consuming. What do research institutes do? In a large city, for example, they randomly select public locations and send interviewers to speak with people passing by. "If the sample is not very representative – for example, if significantly more men than women appear – the institute can correct this effect by assigning different weights to the respondents," he states.
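The weighting correction can be illustrated with invented numbers: if women are underrepresented in the sample relative to the population, each woman's response counts for a bit more and each man's for a bit less.

```python
# Sketch of weighting an unrepresentative sample (all numbers invented).
# Population: 52% women, 48% men. The street sample came back 40/60.
population_share = {"women": 0.52, "men": 0.48}
sample_counts = {"women": 40, "men": 60}        # 100 respondents in total
candidate_a_votes = {"women": 24, "men": 24}    # respondents backing candidate A

n = sum(sample_counts.values())
# Weight per group = population share / sample share.
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

# Unweighted estimate just counts heads: 48 out of 100.
unweighted = sum(candidate_a_votes.values()) / n

# Weighted estimate re-balances each group by its weight.
weighted = sum(candidate_a_votes[g] * weights[g] for g in sample_counts) / n
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
# → unweighted: 48.0%, weighted: 50.4%
```

Rebalancing moves the estimate because the underweighted group (women, who favor candidate A in these made-up numbers) gets restored to its population share.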

Thus, there is a mixed methodology: quotas based on the Census to define the groups that should be interviewed, which statistics calls non-probabilistic selection, in addition to a probabilistic selection of factors such as the cities where the interviews take place. "That's where the difference lies: the electorate universe is the same, but each institute approaches this universe differently and collects data in its own way."

Academic impasse

Mixed methodologies generate debate among statisticians. Some experts believe that a completely probabilistic sample would be necessary, with random selections to define cities, census tracts, and voters, while others advocate a mix between pre-defined and probabilistic selections.

Helio Migon also mentions models with defined interview locations.

"I prefer a sampling design that chooses interview locations based on past election results and selects voters, whether through quotas or not, but makes inferences based on a model that reflects these choices. The idea is to select those locations that are most similar, in terms of election results, to those obtained in a past election. They are 'miniatures' of the past, and it is assumed that these are stable over time," he argues.

Interpreting the result

Once the results are collected (that is, the answers to the question: who are you going to vote for?), they need to be interpreted – which, as the experiment conducted by The New York Times showed, leads to different results depending on the criteria.

What variables should be taken into account? What types of characteristics should the researcher consider? Race, sex, and age are very standard. But what about region, party affiliation, or education level?

In the US, there is also a split between institutes that use the Census and those that use the voter registration list when deciding which population the sample should represent.

In short: each institute chooses different characteristics to balance.

Reliability

The choices in defining the sample are directly related to the margin of error and reliability. "The number of voters to be interviewed on the street or at home, that is, the sample size, takes into account the stipulated sampling error, which is the famous margin of error, the reliability we desire, and the population of voters," explains Migon.

Thus, when an institute reports that a survey has a 95% confidence level, for example, this means that, out of every 100 samples drawn in the same way, in about 95 the true proportion of voting intentions for a given candidate will fall within the margin of error, while in the other five it may fall outside it.

"The margin of error is a way to quantify: with this sample size, I can still be wrong, but I guarantee that the chance of me being wrong by more than the margin is less than 5%," says Oliveira.
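For a proportion, the usual margin of error at 95% confidence is z·√(p(1−p)/n) with z ≈ 1.96. A quick sketch with illustrative numbers (not any institute's actual parameters):

```python
# Margin of error for a sample proportion at 95% confidence (z ≈ 1.96).
import math

def margin_of_error(p, n, z=1.96):
    # Half-width of the confidence interval for proportion p with n respondents.
    return z * math.sqrt(p * (1 - p) / n)

# Worst case is p = 0.5: roughly 2,000 interviews give about a 2-point margin.
print(f"{margin_of_error(0.5, 2000):.3f}")  # → 0.022
```

The formula also explains the choice of sample size: halving the margin of error requires roughly quadrupling the number of interviews.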

According to him and Migon, there is subjectivity in the methodology used in the research, but this does not invalidate it. "The sampling design involves a fixed reliability factor, which is subjective. Why 90% and not 95% reliability? It's a choice," argues the professor from UFRJ.

"The methodology is not perfect and can introduce bias, that is, it can favor certain groups to the detriment of others. Bias here is a technical term that does not indicate malice or ill intent; it is simply impossible to have the ideal way of choosing. In any case, each institute tries to get as close as possible to that situation in which people would be chosen completely at random and with equal chances," says Oliveira.

*This report also uses information from the article "In the US, 4 firms studied the same responses and arrived at different predictions for the elections" (http://noticias.uol.com.br/internacional/ultimas-noticias/the-new-yorktimes/2018/10/05/margem-de-erro-nao-demostra-o-real-potencial-de-equivoco-daspesquisas-veja-experimento.htm), originally published by The New York Times.
