## Outlier polls

June 25th, 2012 at 11:00 am by David Farrar

Every so often you may see a poll which is commonly regarded as an outlier, i.e. well outside where other polls are. A reader has asked me to comment on what pollsters do when they get one.

A recent Bloomberg poll in the US had Obama 13% ahead of Romney. The average of the public polls has the gap at around 1%. Mark Blumenthal writes:

The most likely possibility is that this poll simply represents a statistical outlier. Yes, with a 3 percent margin of error, its Obama advantage of 53 to 40 percent is significantly different than the low single-digit lead suggested by the polling averages. However, that margin of error assumes a 95 percent level of confidence, which in simpler language means that one poll estimate in 20 will fall outside the margin of error by chance alone.

This is worth remembering. One in 20 polls will fall outside the quoted margin of error. One of the challenges to being a pollster is what to do if you think one of your polls may be an outlier.
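As a rough illustration of the "one in 20" point, here is a minimal Python sketch of where a quoted margin of error comes from. It assumes a simple random sample and the usual normal approximation; the sample size of 1,000 is a hypothetical figure for illustration, not Bloomberg's actual sample.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p from a simple random sample of n.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the worst case,
    which is what pollsters normally quote.
    """
    return z * math.sqrt(p * (1 - p) / n)

def prob_outside(z):
    """Two-sided chance that a normal estimate lands more than z standard
    deviations from the true value, by chance alone."""
    return math.erfc(z / math.sqrt(2))

# Hypothetical sample of 1,000 respondents (an assumption, not from the poll).
print(round(margin_of_error(1000) * 100, 1))  # margin of error in points
print(round(prob_outside(1.96), 3))           # chance of falling outside it
```

With these assumptions the margin comes out at about 3 points, and the chance of landing outside it is about 0.05, which is the one poll in 20.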

Bloomberg have released a statement on their poll, covering what they did and didn’t do. This is a good idea, and they look at theories on why they got the result they did, such as:

Maybe you had a higher-educated respondent pool and they tend to like Obama.

Maybe we did. Of all the theories, this one holds some water. The 2008 exit poll shows 24 percent with a high school education or less, compared to our 20 percent among likely voters; the 2008 electorate had 31 percent with some college, and we had 23 percent. In 2008, 28 percent of voters had a college degree and 17 percent more had some postgraduate education; we had 34 percent with a college degree and 21 percent with some postgraduate exposure. However, in our poll, every education subgroup votes for Obama over Romney.

We played around with the data to test whether our findings would have changed had our education distribution looked more like the 2008 exit poll. By “played around,” I mean we created a 52-cell weight variable accounting for age, race, and education. The presidential contest becomes a 10-point race: 51 percent for Obama and 41 percent for Romney. It is still a double-digit lead for Obama and would likely have created as much stir as our 13-point lead.
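To give a feel for what re-weighting means, here is a minimal Python sketch using only the education shares quoted above. It is not Bloomberg's method: they used a 52-cell weight across age, race and education, while this sketch weights on education alone. The group names and function names are my own labels.

```python
# Education shares (percent) as quoted in the Bloomberg statement.
exit_poll_2008 = {"hs_or_less": 24, "some_college": 31,
                  "college_degree": 28, "postgrad": 17}
bloomberg_sample = {"hs_or_less": 20, "some_college": 23,
                    "college_degree": 34, "postgrad": 21}

def poststrat_weights(target, sample):
    """Weight per group = target share / sample share.

    Shares are normalised first, since the quoted sample figures
    do not sum to exactly 100.
    """
    t_total = sum(target.values())
    s_total = sum(sample.values())
    return {g: (target[g] / t_total) / (sample[g] / s_total) for g in target}

def weighted_share(support, sample, weights):
    """Overall support for a candidate after weighting.

    `support` maps each group to the share of that group backing the
    candidate. The statement does not publish these, so the caller
    has to supply them.
    """
    s_total = sum(sample.values())
    num = sum(support[g] * (sample[g] / s_total) * weights[g] for g in sample)
    den = sum((sample[g] / s_total) * weights[g] for g in sample)
    return num / den

weights = poststrat_weights(exit_poll_2008, bloomberg_sample)
for group, w in weights.items():
    print(group, round(w, 2))
```

Groups that were under-sampled relative to 2008 (high school or less, some college) get a weight above 1, and the over-sampled graduate groups get a weight below 1. Because every education subgroup favoured Obama, shifting the weights moves the lead but cannot reverse it.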

In the end they conclude:

In the end. We will soon know whether this poll is, in fact, an outlier. Potentially, this poll caught the electorate when the wind was at Barack Obama’s back for a brief moment in time.

If a pollster gets a response which is “way out there”, there are a number of things you can do.

- Check the raw data. Audit a higher proportion of calls than normal.
- Check the numbers add up. Sometimes human error can sneak in.
- Look at your sample – is any group seriously under or over represented? Maybe all your Auckland responses came from Parnell?
- Play with your weightings. As Bloomberg did, see if different weightings will materially impact it.
- Check your questions. Was the script different to last time?
- Check what was happening when you were in the field. If an All Black game was on one night, then you polled no rugby fans, and that could make a difference. Was there a final of a popular TV show?
- Look at your cross-tabs. If one indicator moves (say party vote) but not another (say right vs wrong direction) then that may imply an unusual sample (the one in 20).
- In extreme cases, you can always redo the poll. But it is a dangerous thing to throw a poll away just because you assume it is an outlier. In fact, you may be first to pick up a change or trend.

Ultimately, as Bloomberg says, it will become apparent in time if the poll was an outlier or not.

June 25th, 2012 at 11:31 am

And one in how many are that far outside the quoted margin of error?

(although I guess a one-in-500 poll has to happen sooner or later)

June 25th, 2012 at 11:51 am

A thirteen point lead to Obama is getting very close to fiction. And putting it down to selecting educated people is even more “out there”.

June 25th, 2012 at 12:31 pm

The latest polls listed at RealClearPolitics are here:

http://www.realclearpolitics.com/epolls/2012/president/us/general_election_romney_vs_obama-1171.html

The latest Rasmussen Poll has Romney by 5% whilst Gallup has it more or less a tie with Obama ahead by 1%.

At the end of the day it’ll all come down to what happens with the US economy and the 3 debates may play a role if one candidate has a clear win either by a brilliant performance or by the other guy dropping a clanger.

DPF: Do you know how accurate Gallup have historically been in presidential elections? They poll a large number of people…..

[DPF: Gallup are pretty good. Rasmussen not so much]

June 25th, 2012 at 2:55 pm

Graeme

The 95% confidence interval is for a poll result within three standard deviations (reported as being +/- 3%)

For four standard deviations (ie +/- 4%), the confidence interval is 99.25% (or 1 in 400 poll results).

A poll result of 7 standard deviations (ie what Wikipedia bothers to calculate) is nearly 1 in 400 billion.

If Obama is actually ahead by 1%, then a lead of 12% represents 11 standard deviations, which is getting to be in the territory of absolute bullshit (Horizon Polls, I’m looking at you) or perhaps the somebody somewhere fucked up zone. It’s the polling equivalent of winning the lottery.

June 25th, 2012 at 5:29 pm

@metcalph: 95% is +/- 1.96 standard deviations (assumption of normal approximation for the binomial, assuming np and n(1-p) are suitably large (e.g. bigger than 5 so that the kurtosis is close enough to zero)). All it means is that 95% of similar (representative) samples will produce an interval that surrounds the true proportion. Thus, you have a 95% chance that the current CI contains the true proportion.

You need to be especially careful when you’re assessing the proportion that vote for A vs B when it’s essentially a binary choice: Everyone that switches from A to B causes a swing of 2, thus the two proportions are not independent. Your analysis of 12% being 11 standard deviations away from 1% is incorrect. To do it properly you need to know how the lead between Obama and Romney is distributed on the polls first. If it’s a purely binary choice, then basically the standard deviation doubles (distribution of p – (1-p) is dist of 2p-1). Thus a difference of 11% on a poll with a 3% margin of error where the true difference is 1% is just less than 4 standard deviations, not quite as outrageous as first thought. The effect drops off a bit if there’s more than a binary choice, but is there nonetheless.

The key (as the pollsters appear to be aware of, though the interpreters aren’t) is it all hinges on the sample being representative (i.e. no non-sampling error). I think we can all agree that pretty much any poll of the electorate is not representative – that doesn’t mean that they’re useless though!
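The arithmetic behind the "just less than 4 standard deviations" figure above can be sketched in Python as follows. It takes the 3-point margin of error and the 11-point gap from this thread; the sample size of 1,067 in the multi-candidate case is my own assumption (roughly what a 3-point margin implies under simple random sampling), not a figure from the poll.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def sd_from_margin(margin_of_error, z=Z_95):
    """Standard deviation of one candidate's share implied by a quoted margin."""
    return margin_of_error / z

def lead_sd_binary(sd_share):
    """Two-way race: lead = p - (1 - p) = 2p - 1, so the SD doubles."""
    return 2 * sd_share

def lead_sd_multinomial(p_a, p_b, n):
    """SD of (p_a - p_b) when some respondents pick neither candidate.

    Uses Var(p_a - p_b) = (p_a + p_b - (p_a - p_b)**2) / n.
    Shares are proportions (0.53, not 53); the result is also a proportion.
    """
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)

sd_share = sd_from_margin(3.0)       # about 1.5 points
sd_lead = lead_sd_binary(sd_share)   # about 3.1 points
z_gap = (12.0 - 1.0) / sd_lead       # gap of 11 points, in SDs of the lead
print(round(sd_lead, 2), round(z_gap, 2))

# Hypothetical n=1067, using the poll's 53 to 40 split.
print(round(lead_sd_multinomial(0.53, 0.40, 1067) * 100, 2))
```

On these numbers the 11-point gap is about 3.6 standard deviations of the lead rather than 11, and the multi-candidate version gives a slightly smaller standard deviation than the pure two-way case.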

June 26th, 2012 at 9:29 am

Or all your South Island results are from Picton 😉