Bill has been running a series of polls with various candidates for president matched against each other, in an attempt to create a ranking system like this current one at time of writing. There’s no question that the polls can be used to create the rankings, but for the rankings to be meaningful, preferences need to have certain properties, and they don’t appear to.
Starting with Bill’s college football example, if a team is expected to be remotely competitive against Clemson (national champion), they’re also expected to crush UTEP (horrific), and if a team is expected to be only remotely competitive against UTEP, they’re expected to get obliterated by Clemson. There’s no concept of a team that’s 30% against both, or 60-40 vs UTEP and 40-60 vs Clemson (while Clemson is 99%+ against UTEP). The football season works as a reasonable approximation because the teams can be given a rating, on one axis, and every pairwise comparison of teams is expected to play out “close” to the rating difference.
If teams with the pathological properties described in the previous paragraph existed, it would be *impossible* to give them any ratings that wouldn’t produce a bunch of wildly inaccurate predictions. At that point, it’s not clear what the numbers would represent (since they can’t be used for pairwise or group predictions without a bunch of grievous errors), or what the point of the exercise would be. Any time the pairwise matchups depend on multiple axes (something beyond the assigned rating from Bill’s system), it’s possible for the exercise to go completely haywire.
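To put rough numbers on that, here is a minimal sketch. It assumes an Elo-style logistic rating curve, which is my stand-in and not necessarily what Bill's system uses; the 400-point scale, the anchor rating of 1000 for UTEP, and the function names are all made up for illustration. The only inputs taken from the example above are "Clemson is 99% against UTEP" and "a team that's 30% against both."

```python
import math

def win_prob(r_a, r_b, scale=400.0):
    """Elo-style logistic: probability that the team rated r_a beats the team rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def gap_for_prob(p, scale=400.0):
    """Rating gap (team minus opponent) that makes the team win with probability p."""
    return scale * math.log10(p / (1.0 - p))

# Hypothetical anchor: UTEP sits at 1000, and Clemson is 99% to beat them.
utep = 1000.0
clemson = utep + gap_for_prob(0.99)

# The pathological team X is supposed to be 30% against BOTH teams.
# Option 1: pick X's rating so the Clemson game comes out right.
x_fit_to_clemson = clemson + gap_for_prob(0.30)
# Option 2: pick X's rating so the UTEP game comes out right.
x_fit_to_utep = utep + gap_for_prob(0.30)

# Whichever rating we pick, the other prediction is wildly off from 30%.
pred_vs_utep = win_prob(x_fit_to_clemson, utep)
pred_vs_clemson = win_prob(x_fit_to_utep, clemson)

print(f"Fit to Clemson game -> predicted vs UTEP:    {pred_vs_utep:.3f} (should be 0.300)")
print(f"Fit to UTEP game    -> predicted vs Clemson: {pred_vs_clemson:.3f} (should be 0.300)")
```

Under these assumptions, fitting X to the Clemson game predicts roughly 98% against UTEP, and fitting X to the UTEP game predicts well under 1% against Clemson. Any rating in between just splits the error across both games.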
In college football, the secondary axes (the matchup-specific details) aren’t nonexistent, but they’re much smaller than the primary axis of overall quality difference, so the rating method basically works. Suppose a major secondary axis were added: every team is randomly assigned one of rock-paper-scissors at the beginning of the season, and the “winner” of each matchup starts ahead 14-0. The overall ratings wouldn’t be particularly different (treating the 14 points as legitimately scored for the rating calculations), but the pairwise rating-based game predictions would be utterly haywire, because the ratings would have no idea when one of the teams was going to start with two free touchdowns. *Every* possible set of ratings is going to go totally bonkers with game predictions under that setup. That example is totally contrived, of course, but the general point is that it’s *impossible* to represent a system with multiple significant axes with one rating number and have it reliably mean anything prediction-wise.
Politics has the obvious multiple axes of party affiliation and candidate preference within the party. While party affiliation is not absolute, a significant number of people are going to order their preferences as either (almost any D > almost any R) or (almost any R > almost any D), which means that you run into the “team that’s 30% against both UTEP and Clemson” problem. There were 6 polls with Trump against 3 Democrats, and he polled 26%-29% against fields headlined by everybody from Warren or Biden down to Booker or Abrams. My guess is he wouldn’t go far above 29%, if at all, even in a field headlined by Inslee. Warren and Biden smash Booker and Abrams head to head, but Trump polls the same against all of them (the only variability is when a second R is included).
There’s no single rating to give Trump that doesn’t go completely bonkers with his predictions against most of the range of D candidates. The system just doesn’t work at all with Trump. Trump is basically his own axis, even stronger than R/D alone, because he’s so polarizing. 26-29% rank him above all Ds, and 70%+ of the poll respondents rank him near last place. The system looks reasonable for Ds relative to other Ds because none of the leaders are super-polarizing relative to the others right now, but that’s not a thing that ever has to be true or stay true.
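The same failure can be sketched for multi-candidate polls. This assumes a simple strength-proportional model (each candidate's share is their strength divided by the total strength in the field), which may not be how Bill's system actually turns polls into ratings. The D strength values below are invented purely for illustration; only the 28% target is taken from the 26%-29% range above.

```python
def share(strength, rivals):
    """Vote share under a strength-proportional model: s / (s + sum of rival strengths)."""
    return strength / (strength + sum(rivals))

def strength_for_share(target, rivals):
    """Strength a candidate would need to poll exactly `target` against `rivals`."""
    return target / (1.0 - target) * sum(rivals)

# Hypothetical D strengths (made up): a strong field and a weak field.
strong_field = [5.0, 3.0, 2.0]
weak_field = [1.0, 0.6, 0.4]

target = 0.28  # inside the 26%-29% range Trump polled against every field

# The single rating Trump "needs" is different for each field.
s_from_strong = strength_for_share(target, strong_field)
s_from_weak = strength_for_share(target, weak_field)

# Reusing either rating against the other field misses 28% badly.
pred_vs_weak = share(s_from_strong, weak_field)
pred_vs_strong = share(s_from_weak, strong_field)

print(f"Rating fit to strong field: {s_from_strong:.2f} -> predicted vs weak field:   {pred_vs_weak:.2f}")
print(f"Rating fit to weak field:   {s_from_weak:.2f} -> predicted vs strong field: {pred_vs_strong:.2f}")
```

With these made-up strengths, the rating that gets 28% right against the strong field predicts well over half the vote against the weak one, and the rating fit to the weak field predicts single digits against the strong one. A candidate whose share doesn't move with the quality of the opposition has no consistent strength in this kind of model.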
Because Trump can’t be rated properly by this method, and rating other Rs alongside Ds is also going to be super sketchy for similar reasons, two alternatives both seem like improvements: either always include Trump and ignore his number when ranking the Ds against each other, or only include Ds to begin with. The latter comes with guaranteed-R voters voting on Ds, though, which isn’t necessarily ideal.