Tuesday, September 27, 2016

"You come at the king, you best not miss."

So I was recently accused by someone (whom I hold in the highest esteem) of being “jealous of Nate Silver”.  While this is undoubtedly true, I want to really dig into the roots of my jealousy.  Trauma buried this deep needs to be excavated and examined before one can really begin to heal.

You see, I’ve been very disturbed by Silver’s methodology this year, the way it gyrates almost in sync with the polls, when what it is SUPPOSED TO DO is aggregate the information in the polls and give us a stabler and truer picture of where the race is (and maybe where it’s headed).

My suspicion has been that all Nate has really constructed is a glorified poll average, with all his other bells and whistles providing marginal added value (if any at all).

So with that in mind, I converted two graphics, one from the Huffington Post pollster site (formerly pollster.com) and one from Silver’s site, and looked at the data from the last 117 days or so.  From HuffPost, I rasterized the default smoothed national poll averages for HRC and DJT.  From 538, I took the polls-only win probabilities.

I then constructed two non-linear models from the HuffPost data.  Model 1 performs a multiple regression of Trump’s 538 win probability (you only need his, because if you know his, you know hers, assuming no third parties can win) against the Trump national poll average from HuffPost and the square of that poll average.  Model 2 adds the Clinton poll average and the square of her poll average to the mix.  The resulting plots can be seen below (note that the graph is inverted because of the direction convention on JPEG indices.  If it bothers you, think of it as HRC's win prob).
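For anyone who wants to try this at home, here is a minimal sketch of how the two regressions could be set up with NumPy.  To be clear, this is not my actual script: the function names (`design_matrix`, `fit_and_predict`) are made up for illustration, and the series below are randomly generated placeholders standing in for the digitized HuffPost and 538 curves, so the numbers it produces mean nothing.

```python
import numpy as np

def design_matrix(*series):
    """Intercept column, then x and x**2 for each poll-average series."""
    arrays = [np.asarray(s, dtype=float) for s in series]
    cols = [np.ones(len(arrays[0]))]
    for x in arrays:
        cols.extend([x, x ** 2])
    return np.column_stack(cols)

def fit_and_predict(X, y):
    """Ordinary least squares via lstsq; returns coefficients and fitted values."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Placeholder data only (ASSUMPTION: not the digitized HuffPost/538 series).
rng = np.random.default_rng(0)
n_days = 117
trump_avg = 40 + np.cumsum(rng.normal(0, 0.15, n_days))
clinton_avg = 44 + np.cumsum(rng.normal(0, 0.15, n_days))
margin = trump_avg - clinton_avg
trump_win_prob = 100 / (1 + np.exp(-0.5 * margin)) + rng.normal(0, 2, n_days)

# Model 1: Trump average and its square.
X1 = design_matrix(trump_avg)
coef1, fitted1 = fit_and_predict(X1, trump_win_prob)

# Model 2: adds the Clinton average and its square.
X2 = design_matrix(trump_avg, clinton_avg)
coef2, fitted2 = fit_and_predict(X2, trump_win_prob)
```

Because Model 1's columns are a subset of Model 2's, the second fit can never have a larger squared error than the first on the same data.  The interesting question is how much it improves.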


As you can see, Model 1 (Polynomial Model Trump Poll Average in the legend) approximates the 538 curve reasonably well.  Really very well.  It captures 71% of the variability in the 538 Trump win probability.  Model 2 (Polynomial Model Both Poll Average in the legend) is even better.  It captures 87% of the variability in the 538 Trump win probability.  The lower graph shows just how well this model tracks the 538 probabilities.
Plotting the output of the model against the 538 probabilities, you can see a strongly linear relationship.
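When I say a model "captures" some percentage of the variability, I mean the usual R-squared: one minus the ratio of leftover squared error to the total squared variation around the mean.  A minimal sketch of that calculation, assuming NumPy, with a hypothetical helper name (`r_squared`) and toy numbers rather than the real series:

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # leftover (residual) variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Toy check: a perfect fit explains everything, a flat line at the mean explains nothing.
perfect = r_squared([1, 2, 3], [1, 2, 3])
flat = r_squared([1, 2, 3], [2, 2, 2])
```

For a least-squares fit with an intercept, this number equals the squared correlation between the model output and the target, which is why a tight, straight scatter of one against the other goes hand in hand with a high value.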



There is a little structure to the noise, but it’s nothing to write home about, and I could probably reduce it further by adding a few more terms to the model.  But capturing almost 90% of the variability, well, you are probably going to be right in ~90% of elections.  Unless a razor-thin, year-2000-style scenario comes up, you should be just fine looking at the national poll averages to see who will win.

So…why am I jealous of Nate Silver?  Because he’s got one hell of a scam going on right now.

I coulda been a contender.