Bill has been running a series of polls pitting various presidential candidates against each other, in an attempt to build a ranking system like this current one at the time of writing. There’s no question that the polls can be used to create rankings, but for the rankings to be meaningful, voter preferences need to have certain properties, and they don’t appear to.
Starting with Bill’s college football example: if a team is expected to be remotely competitive against Clemson (the national champion), they’re also expected to crush UTEP (horrific), and if a team is expected to be remotely competitive against UTEP, they’re expected to get obliterated by Clemson. There’s no concept of a team that’s 30% against both, or one that’s 60-40 vs UTEP and 40-60 vs Clemson (while Clemson is 99%+ against UTEP). The football season works as a reasonable approximation because each team can be given a rating on a single axis, and every pairwise matchup is expected to play out “close” to the rating difference.
If teams with the pathological properties described in the previous paragraph existed, it would be *impossible* to give them any ratings that didn’t produce a bunch of wildly inaccurate predictions. At that point, it’s not clear what the numbers would represent (since they can’t be used for pairwise or group predictions without a bunch of grievous errors), or what the point of the exercise would be. Any time the pairwise matchups depend on multiple axes (something beyond the rating assigned by Bill’s system), it’s possible for the exercise to go completely haywire.
In college football, the secondary axes (the matchup-specific details) aren’t nonexistent, but they’re much smaller than the primary axis, overall quality difference, so the rating method basically works. Suppose a major secondary axis were added: every team is randomly assigned one of rock, paper, or scissors at the beginning of the season, and the “winner” of each matchup starts ahead 14-0. The overall ratings wouldn’t be particularly different (treating the 14 points as legitimately scored for the rating calculations), but the pairwise rating-based game predictions would go utterly haywire, because the ratings have no idea when one of the teams is going to start with two free touchdowns. *Every* possible set of ratings is going to go totally bonkers with game predictions under that setup. That example is totally contrived, of course, but the general point is that it’s *impossible* to represent a system with multiple significant axes with a single rating number and have it reliably mean anything prediction-wise.
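A minimal sketch of the rock-paper-scissors thought experiment, under made-up assumptions: three hypothetical teams whose true quality differs by 10 points each, a 14-point head start for the rock-paper-scissors winner, and a full round robin. Ratings are fit by ordinary least squares on point margins, which is a stand-in for whatever rating method is actually used. For a round robin where each pair meets once (with ratings centered on zero), the least-squares rating is just a team’s total margin divided by the number of teams.

```python
from itertools import combinations

# Hypothetical teams: (true quality in points, rock-paper-scissors assignment).
TEAMS = {"A": (10.0, "rock"), "B": (0.0, "scissors"), "C": (-10.0, "paper")}
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def margin(a, b, rps_bonus):
    """Expected point margin for team a over team b."""
    qa, hand_a = TEAMS[a]
    qb, hand_b = TEAMS[b]
    m = qa - qb
    if BEATS[hand_a] == hand_b:
        m += rps_bonus  # a starts with the free touchdowns
    elif BEATS[hand_b] == hand_a:
        m -= rps_bonus  # b starts with the free touchdowns
    return m


def fit_ratings(games):
    """Least-squares ratings for a round robin where each pair meets once.

    Minimizing sum((r_a - r_b - margin)^2) with ratings centered on zero
    gives r_i = (i's total point margin) / (number of teams).
    """
    totals = {t: 0.0 for t in TEAMS}
    for a, b, m in games:
        totals[a] += m
        totals[b] -= m
    return {t: total / len(TEAMS) for t, total in totals.items()}


def run(rps_bonus):
    """Play the round robin, fit ratings, and report the worst prediction miss."""
    games = [(a, b, margin(a, b, rps_bonus)) for a, b in combinations(TEAMS, 2)]
    ratings = fit_ratings(games)
    worst = max(abs((ratings[a] - ratings[b]) - m) for a, b, m in games)
    return ratings, worst


for bonus in (0.0, 14.0):
    ratings, worst = run(bonus)
    print(f"bonus={bonus:4.1f}  ratings={ratings}  worst miss={worst:.1f}")
```

With no bonus, the fitted ratings (10, 0, -10) reproduce every margin exactly. With the 14-point bonus, the fitted ratings come out identical, but every game is mispredicted by 14 points. No other ratings can do better: rating differences sum to zero around the cycle A→B→C→A, while the actual margins around that cycle (24, 24, and -6) sum to 42, so at least one game has to miss by 14 or more.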
Politics has two obvious axes: party affiliation and candidate preference within the party. Party affiliation is not absolute, but a significant number of people are going to order their preferences as either (almost any D > almost any R) or (almost any R > almost any D), which means you run into the “team that’s 30% against both UTEP and Clemson” problem. There were six polls with Trump against three Democrats, and he polled 26-29% against fields headlined by everybody from Warren or Biden down to Booker or Abrams. My guess is that he wouldn’t go far above 29%, if at all, even in a field headlined by Inslee. Warren and Biden smash Booker and Abrams head to head, but Trump polls the same against all of them (the only variability comes when a second R is included).
There’s no single rating to give Trump that doesn’t go completely bonkers with his predictions against most of the range of D candidates. The system just doesn’t work at all with Trump. Trump is basically his own axis, even stronger than R/D alone, because he’s so polarizing: 26-29% of respondents rank him above all Ds, and 70%+ rank him near last place. The system looks reasonable for Ds relative to other Ds because none of the leaders are super-polarizing relative to the others right now, but that doesn’t have to be true, or stay true.
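Here’s a rough sketch of why no single number works for Trump, assuming the ratings behave like a proportional-share model, where a candidate’s share of a field is their strength divided by the field’s total strength (an assumption about how the ratings are used, not a confirmed formula). The opponent strengths are invented for illustration: a hypothetical “strong” field totaling 60 and a “weak” field totaling 20, with Trump held to a flat 27% against both.

```python
# Hypothetical opponent strengths (illustrative only, not from any poll).
STRONG_FIELD = [30.0, 20.0, 10.0]  # e.g. a field headlined by a front-runner
WEAK_FIELD = [8.0, 6.0, 6.0]       # e.g. a field of second-tier candidates
OBSERVED_SHARE = 0.27              # assumed flat across fields, per the polls above


def share(strength, opponents):
    """Proportional-share model: your strength over the field's total strength."""
    return strength / (strength + sum(opponents))


def implied_strength(observed, opponents):
    """Invert share(): the strength needed to get `observed` against `opponents`."""
    return observed / (1.0 - observed) * sum(opponents)


s_strong = implied_strength(OBSERVED_SHARE, STRONG_FIELD)
s_weak = implied_strength(OBSERVED_SHARE, WEAK_FIELD)

# Fit Trump's strength to one field and use it to predict the other.
pred_weak = share(s_strong, WEAK_FIELD)
pred_strong = share(s_weak, STRONG_FIELD)

print(f"implied strength: {s_strong:.1f} vs {s_weak:.1f}")
print(f"fit to strong field, predicted vs weak field: {pred_weak:.0%}")
print(f"fit to weak field, predicted vs strong field: {pred_strong:.0%}")
```

The implied strengths differ by a factor of three. Fitting to the strong field predicts about 53% against the weak one, and fitting to the weak field predicts about 11% against the strong one, while the assumed observed number is 27% in both. Any strength in between just splits the error.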
Because Trump can’t be rated properly by this method, and rating other Rs alongside Ds is also going to be super sketchy for similar reasons, two alternatives both seem like improvements: always including Trump and ignoring his number when ranking the Ds against each other, or only including Ds to begin with. The latter, though, means guaranteed-R voters are voting on Ds, which isn’t necessarily ideal.
One thought on “Bill James and the Trump polarization problem”
Including Trump on every ballot is preferable, I think, because it keeps most Republican voters from mucking up the Democratic rankings. If the main value of the exercise is (as Bill says) to identify which second-tier Democrats have the most potential to emerge as major candidates, then you want to know how much appeal they have to Democratic voters, and only Democratic voters. Letting Trump voters play a role in these ratings will destroy their analytic value, even assuming these voters answer the polls in good faith (picking the Democrats they find least objectionable), because they will have no say in the Democratic primary process. Plus, the risk that Trump voters will use their votes to elevate Democrats whom they perceive as weak opponents for Trump is real. (In fact, I suspect this kind of tactical voting explains some of Warren’s extraordinary strength in Bill’s polls, given the fairly libertarian leanings of his followers.)
Unfortunately, the multiple-axis problem you identify also applies within the two parties. This isn’t readily apparent at the moment, but it will become clearer as the race develops. Biden, Harris and Booker will compete for Black votes, while other candidates will get virtually no Black support. Bernie and Warren compete for a bloc of ideologically left voters. Some candidates will appeal almost exclusively to younger voters. All of which means that these candidates’ support in a given poll depends not just on the generic “strength” of their three opponents, but on which opponents they happen to face.
A candidate’s strength rating, Bill says, measures their share of dedicated supporters (who prefer them over every other contender). The assumption is that their support on each ballot will be proportional to that strength measure (given the strength of the opponents). But in politics this is obviously false. A polarizing candidate may have a significant number of dedicated followers (based on race, region, ideology, whatever) but a very low ceiling of support, or the reverse. Bill’s model works in college basketball because what matters is the skill of the teams, and everything else is secondary. But in politics, it’s not just the candidates’ skills that determine outcomes: there are VOTING BLOCS which act independently, and their allegiance depends on the options offered.
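To make the bloc point concrete, here’s a toy sketch with made-up numbers: three hypothetical blocs (40%, 35%, and 25% of voters), each voting as a unit for its highest-ranked candidate on the ballot, and four generic candidates X, Y, Z, W. Under a proportional-strength model (strengths here are also hypothetical), the ratio of X’s share to Y’s share is fixed by their ratings no matter who else is on the ballot. With blocs, swapping out the third candidate can flip it.

```python
# Hypothetical electorate: (share of voters, bloc's preference order).
BLOCS = [
    (0.40, ["X", "Y", "Z", "W"]),
    (0.35, ["Z", "Y", "X", "W"]),
    (0.25, ["Y", "X", "W", "Z"]),
]

# Hypothetical strength ratings for the proportional-share comparison.
STRENGTH = {"X": 4.0, "Y": 3.0, "Z": 2.0, "W": 1.0}


def bloc_shares(ballot):
    """Each bloc votes as a unit for its highest-ranked candidate on the ballot."""
    shares = {c: 0.0 for c in ballot}
    for weight, order in BLOCS:
        pick = next(c for c in order if c in ballot)
        shares[pick] += weight
    return shares


def proportional_shares(ballot):
    """Proportional-share model: each candidate gets strength / total strength."""
    total = sum(STRENGTH[c] for c in ballot)
    return {c: STRENGTH[c] / total for c in ballot}


for ballot in (["X", "Y", "Z"], ["X", "Y", "W"]):
    blocs = bloc_shares(ballot)
    prop = proportional_shares(ballot)
    print(
        ballot,
        f"X/Y with blocs: {blocs['X'] / blocs['Y']:.2f}",
        f"X/Y proportional: {prop['X'] / prop['Y']:.2f}",
    )
```

With Z on the ballot, the 35% bloc goes to Z and X outpolls Y by a ratio of 1.60. Replace Z with W and that bloc’s votes land on Y instead, so the ratio drops to 0.67. The proportional model says 1.33 both times; a single strength number per candidate can’t express that kind of dependence on the field.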