How 538 got Trump wrong

May 25th, 2016 at 4:00 pm by David Farrar

Nate Silver writes on how he got Donald Trump so wrong (as almost everyone did). He notes:

But it’s not how it worked for those skeptical forecasts about Trump’s chance of becoming the Republican nominee. Despite the lack of a model, we put his chances in percentage terms on a number of occasions. In order of appearance — I may be missing a couple of instances — we put them at 2 percent (in August), 5 percent (in September), 6 percent (in November), around 7 percent (in early December), and 12 percent to 13 percent (in early January).

Silver notes five things:

  1. Our early forecasts of Trump’s nomination chances weren’t based on a statistical model, which may have been most of the problem.
  2. Trump’s nomination is just one event, and that makes it hard to judge the accuracy of a probabilistic forecast.
  3. The historical evidence clearly suggested that Trump was an underdog, but the sample size probably wasn’t large enough to assign him quite so low a probability of winning.
  4. Trump’s nomination is potentially a point in favor of “polls-only” as opposed to “fundamentals” models.
  5. There’s a danger in hindsight bias, and in overcorrecting after an unexpected event such as Trump’s nomination.

The interesting thing is that if you just looked at the polls, then you should have concluded Trump would win. He basically led in every poll for six months. But everyone found reasons to argue why the polls would change. 538, for example, places great store in endorsements. And endorsements have been a good predictor in previous elections, but as Silver notes, the sample size of previous elections is not great.

So one lesson from this is not to ignore the polls. They’re not always right, but when polls and assumptions conflict, the polls tend to win out.

Another source of info I look to is the prediction markets. At the moment they have Clinton at 66% likely to win and Trump at 32%.
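Prediction-market prices can be read roughly as implied probabilities, but the quoted prices need not sum to exactly 100% (here 66 + 32 = 98). A common way to read them is to normalise, as in this small sketch (the prices are the ones quoted above; the two-candidate framing is a simplification):

```python
# Market prices as quoted: roughly implied win probabilities
prices = {"Clinton": 0.66, "Trump": 0.32}

# Normalise so the implied probabilities sum to 1
total = sum(prices.values())
implied = {name: p / total for name, p in prices.items()}

print(implied)  # Clinton ≈ 0.673, Trump ≈ 0.327
```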

Nate Silver

November 15th, 2012 at 7:00 am by David Farrar

The Sydney Morning Herald reports:

The political emperors have no clothes, stripped bare by a big-data wizard named Nate Silver who showed dispassionate maths was more reliable than pundit intuition and cherry-picked polls.

Silver, 34, a statistician who previously predicted the career trajectories of baseball players, accurately tipped 49 out of 50 US states (with the 50th, Florida, highly likely to be accurate as well as Obama is ahead with 97 per cent of the votes counted) and most Senate contests.

As right-wing pundits attacked him and his “voodoo statistics” for failing to see that the election was on a knife edge – and in the case of some conservative wingnuts, for being openly gay and “effeminate” – Silver held his nerve and for the entire election cycle maintained that the data always pointed to an easy Obama victory. …

Even after Obama’s dismal first debate performance, Silver’s probability of Obama winning never dipped below 61.1 per cent, rising to more than 90 per cent on election day.

I am a big fan of both Silver’s analytic skills, and his demeanour while under fire. He deserves a lot of credit.

It is worth pointing out though that all the major polling aggregation sites did very well as reported by Cnet:

But Silver wasn’t the only one to do exceptionally well in the prediction department. In fact, each of the five aggregators that CNET surveyed yesterday — FiveThirtyEight, TPM PollTracker, HuffPost Pollster, the RealClearPolitics Average, and the Princeton Election Consortium — successfully called the election for Obama, and save for TPM PollTracker and RealClearPolitics handing Florida to Romney, the aggregators were spot on across the board when it came to picking swing state victors.

So if you listened to the polls rather than the pundits, you were likely to be correct. Why then is Silver the new political celebrity rather than, say, Mark Blumenthal, who does HuffPost Pollster?

I think it is partly because Silver was attacked by several prominent pundits before the election. Those attacks backfired by giving him not just accuracy but vindication.

The other reason is that Silver does a bit more than just aggregate and weight the polls. His extra tweaks may not make a huge difference, but they are seen as useful by many.

In addition to picking the winner in all 50 states — besting his 49 out of 50 slate in 2008 — Silver was also the closest among the aggregators to picking the two candidates’ popular vote percentages. All told, he missed Obama’s total of 50.8 percent by just four-tenths of a percentage point (50.4) and Romney’s 48 percent by just three-tenths of a point (48.3) for an average miss of just 0.35 percentage points. HuffPo Pollster and RealClearPolitics tied for second with an average miss of 0.85 points.
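The “average miss” figure quoted above is just the mean absolute error over the two candidates, which is easy to check (using the percentages from the excerpt):

```python
# CNET's figures: actual vs predicted popular-vote shares for FiveThirtyEight
actual    = {"Obama": 50.8, "Romney": 48.0}
predicted = {"Obama": 50.4, "Romney": 48.3}

# Mean absolute error across the two candidates
misses = [abs(actual[c] - predicted[c]) for c in actual]
avg_miss = sum(misses) / len(misses)

print(round(avg_miss, 2))  # 0.35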

This may change a bit as the final votes come in. It is worth noting also that Silver didn’t have a 100% accuracy rate with calling Senate races. Again, this takes nothing away from his highly deserved reputation – just that even his model is not infallible. The strength of his model, as I see it, is that it learns from the past.

So what does Silver do to predict who wins? His exact methodology is secret (he has said he may reveal more over time), but he has detailed what he does for Senate races. My summary of it is:

  1. Average the polls for that state
  2. Give more recent polls a higher weight using an exponential decay formula
  3. Weight by sample size so larger sample polls have more weight
  4. Assign an accuracy rating to each pollster and weight the historically more accurate pollsters higher. Exclude polls from very dodgy pollsters, and polls released by parties. (Many other polling aggregators also do steps 1 to 4; what is unique to Silver tends to be the later steps.)
  5. Adjust the result based on the national trend, so if nationwide one party has dropped, say, 5% in one week, assume it applies to that state also.
  6. Adjust the result based on observed “house effects” for pollsters. So if one pollster consistently has Democrats 2% higher than they get, then take 2% off their poll.
  7. Adjust polls of registered voters as if they were of likely voters, based on the normal difference between such polls (Republicans do better with likely voters).
  8. Do a regression analysis of the state based on its partisan voting index, its party identification, donations to candidates, incumbency status, approval ratings for incumbents, and previous offices a candidate has been elected to.
  9. Add the results of the regression analysis to the weighted average of polls, as if it is a poll.
  10. Do an error calculation
  11. Simulate the election and report how often one candidate beats the other over multiple simulations
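The averaging and simulation steps above can be sketched in Python. Everything here is illustrative: the poll numbers, house effects, half-life, and error size are assumptions for the sketch, not Silver’s actual (unpublished) values, and the regression and likely-voter steps are omitted.

```python
import math
import random

random.seed(0)  # reproducible sketch

# Hypothetical polls for one state: (days_old, sample_size, dem_share, rep_share, pollster)
polls = [
    (2, 800, 0.48, 0.46, "A"),
    (5, 1200, 0.50, 0.45, "B"),
    (10, 600, 0.47, 0.47, "A"),
]

# Illustrative "house effects": pollster A assumed to overstate Democrats by 2 points
house_effects = {"A": 0.02, "B": 0.00}

HALF_LIFE_DAYS = 14.0  # assumed recency decay rate

def poll_weight(days_old, sample_size):
    """Steps 2-3: more recent and larger polls get more weight."""
    recency = 0.5 ** (days_old / HALF_LIFE_DAYS)  # exponential decay by age
    size = math.sqrt(sample_size)                 # error shrinks with sqrt(n)
    return recency * size

def weighted_margin(polls):
    """Steps 1, 4 and 6: house-adjusted, weighted average of Dem-minus-Rep margins."""
    num = den = 0.0
    for days_old, n, dem, rep, pollster in polls:
        margin = (dem - house_effects[pollster]) - rep  # step 6 adjustment
        w = poll_weight(days_old, n)
        num += w * margin
        den += w
    return num / den

def win_probability(margin, error_sd=0.04, runs=10_000):
    """Steps 10-11: treat forecast error as Gaussian and simulate many elections."""
    wins = sum(1 for _ in range(runs) if random.gauss(margin, error_sd) > 0)
    return wins / runs

m = weighted_margin(polls)
print(f"weighted margin: {m:+.3f}")
print(f"Dem win probability: {win_probability(m):.0%}")
```

Weighting by the square root of the sample size reflects that a poll’s standard error scales as 1/√n, so a poll four times larger is only about twice as informative, and the simulation step is what turns a point estimate into the percentage chances Silver publishes.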

So Silver has a very sophisticated model. I think for presidential elections he also uses economic data such as GDP growth and unemployment rates. Over time as more and more data is gathered, his model should remain accurate or become even more accurate.

There will be times when it will be wrong, just as the polls sometimes get it wrong. No model can compensate if the election is very volatile and large numbers of voters change their mind or are undecided in the final few days. Events will always matter.