The Five MTG: Arena Rankings

This ties up a few loose ends with the Mythic ranking system and explains how the other behind the scenes MMRs work.  As it turns out, there are five completely distinct rankings in Arena.  Mythic Constructed, Mythic Limited, Serious Constructed, Serious Limited, and Play (these are my names for them).

Non-Mythic

All games played in ranked (including Mythic) affect the Serious ratings, *as do the corresponding competitive events- Traditional Draft, Historic Constructed Event, etc*.  Serious ratings are conserved from month to month.   Play rating includes the Play and Brawl queues, as well as Jump In (despite Jump In being a limited event), and is also conserved month to month.  Nonsense events (e.g. the Momir and Artisan MWMs) don’t appear to be rated in any way.

The Serious and Play ratings appear to be intended to be a standard implementation of the Glicko-2 rating system except with ratings updated after every game.  I say intended because they have a gigantic bug that screws everything up, but we’ll talk about that later.  New accounts start with a rating of 1500, RD of 350, and volatility of 0.06.  These update after every game and there doesn’t seem to be any kind of decay or RD increase over time (an account that hadn’t even logged in since before rotation still had a low RD).  Bo1 and Bo3 match wins are rated the same for these.

Mythic

Only games played while in Mythic count towards the Mythic rankings, and the gist of the system is exactly as laid out in One last rating update although I have a slight formula update.  These rankings appear to come into existence when Mythic is reached each month and disappear again at the end of the month (season).

I got the bright idea that they may be using the same code to calculate Mythic changes, and this appears to be true (I say appears because I have irreconcilable differences in the 5th decimal place for all of their Glicko-2 computations that I can’t resolve with any (tau, Glicko scale constant) pair.  There’s either a small bug or some kind of rounding issue on one of our ends, but it’s really tiny regardless).  The differences are that Mythic uses a fixed RD of 60 and a fixed volatility of 0.06 (or extremely extremely close) that doesn’t change after matches and that the rating changes are multiplied by 2 for Bo3 matches.  Glicko with a fixed RD is very similar to Elo with a fixed K on matchups within 200 points of each other.

In addition, the initial Mythic rating is seeded *as a function of the Serious rating*.  [Edit 8/1/2022: The formula is: if Serious Rating >=3000, Mythic Rating = 1650.  Otherwise Mythic Rating = 1650 – ((3000 – Serious Rating) / 10).  Using Serious Rating from before you win your play-in game to Mythic, not after you win it, because reasons] That’s the one piece I never exactly had a handle on.  I knew that tanking my rating gave me easy pairings and a trash initial Mythic rating the next month, but I didn’t know how that happened.  The existence of a separate conserved Serious rating explains it all.  [Edit 8/1/22: rest of paragraph deleted since it’s no longer relevant with the formula given]

The Mythic system also had two fixed constants that previously appeared to be randomly arbitrary- the minimum win of 5.007 points and the +7.4 win/-13.02 loss when playing a non-Mythic.  Using the Glicko formula with both players having RD 60 and Volatility 0.06, the first number appears when the rating difference is restricted to a maximum of exactly 200 points. Even if you’re 400 points above your opponent, the match is rated as though you’re only 200 points higher.  The second number appears when you treat the non-Mythic player as rated exactly 100 points lower than you are (regardless of what your rating actually is) with the same RD=60.  This conclusively settles the debate as to whether or not that system/those numbers were empirically derived or rectally derived.

The Huge Bug

As I reported on Twitter last month the Serious and Play ratings have a problem (but not Mythic.. at least not this problem).  If you lose, you’re rated as though you lost to your opponent.  If you win, you’re rated as though you beat a copy of yourself (rating and RD).  And, of course, even though correcting the formula/function call is absolutely trivial, it still hasn’t been fixed after weeks.  This bug dates to at least January, almost certainly to the back-end update last year, and quite possibly back into Beta.

Glicko-2 isn’t zero-sum by design (if two players with the same rating play, the one with higher RD will gain/lose more points), but it doesn’t rapidly go batshit.  With this bug, games are now positive-sum in expectation.  When the higher-rated player wins, points are created (the higher-rated player wins more points than they should).  When the lower-rated player wins, points are destroyed (the lower-rated player wins fewer points than they should). Handwaving away some distributional things that the data show don’t matter, since the higher-rated player wins more often, points are created more than destroyed, and the entire system inflates over time.

My play rating is 5032 (and I’m not spending my life tryharding the play queue), and the median rating is supposed to be around 1500.  In other words, if the system were functioning correctly, my rating would mean that I’d be expected to only lose to the median player 1 time per tens of millions of games.  I’m going to go out on a limb and say that even if I were playing against total newbies with precons, I wouldn’t make it anywhere close to 10 million games before winding up on the business end of a Gigantosaurus.  And I encountered a player in a constructed event who had a Serious Constructed rating over 9000, so I’m nowhere near the extreme here.  It appears from several reports that the cap is exactly 10000.

In addition to inflating everybody who plays, it also lets players go infinite (up to 10000) on rating.  Because beating a copy of yourself is worth ~half as many points as a loss to a player you’re expected to beat ~100% of the time, anybody who can win more than 2/3 of their matches will increase their rating without bound.  Clearly some players (like the 9000 I played) have been doing this for awhile.

If the system were functioning properly, most people would be in a fairly narrow range.  In Mythic constructed, something like 99% of players are between 1200-2100 (this is a guess, but I’m confident those endpoints aren’t too narrow by much, if at all), and that’s with a system that artificially spreads people out by letting high-rated players win too many points and low-rated players lose too many points.  Serious Constructed would be spread out a bit more because it includes all the non-Mythics as well, but it’s not going to be that huge a gap down to the people who can at least make it out of Silver.  And while the Play queue has much wider deck-strength, the deck-strength matching, while very far from perfect, should at least make the MMR difference more about play skill than deck strength, so it also wouldn’t spread out too far.

Instead, because of the rampant inflation, the center of the distribution is at like 4000 MMR instead of ~1500.  Strong players are going infinite on the high side, and because new accounts still start at 1500, and there’s no way to make it to 4000 without winning lots of games (especially if you screwed around with precons or something for a few games at some point), there’s a constant trickle of relatively new (and some truly atrocious) accounts on the left tail spanning the thousands-of-points gap between new players and established players, and that gap only exists because of the rating bug.  It looks something like this.

The three important features are that the bulk of the distribution is off by thousands of points from where it should be, noobs start thousands of points below the bulk, and players are spread over an absurdly wide range.  The curves are obviously hand-drawn in Paint, so don’t read anything beyond that into the precise shapes.

This is why new players often make it to Mythic one time easily with functional-but-quite-underpowered decks- they make it before their Serious Constructed rating has crossed the giant gap from 1500 to the cluster where the established players reside.  Then once they’ve crossed the gap, they mostly get destroyed.  This is also why tanking the rating is so effective.  It’s possible to tank thousands of points and get all the way back below the noob entry point.  I’ve said repeatedly that my matchups after doing that were almost entirely obvious noobs and horrific decks, and now it’s clear why.

It shouldn’t be possible (without spending a metric shitton of time conceding matches, since there should be rapidly diminishing returns once you separate from the bulk) to tank far enough to do an entire Mythic climb with a high winrate without at least getting the rating back to the point where you’re bumping into the halfway competent players/decks on a regular basis, but because of the giant gap, it is.  The Serious Constructed rating system wouldn’t be entirely fixed without the bug- MMR-based pairing still means a good player is going to face tougher matchups than a bad player on the way to Mythic, and tanking would still result in some very easy matches at the start- but those effects are greatly magnified because of the artificial gap that the bug has created.

What I still don’t know

I don’t know whether or not Serious or Mythic is used for pairing Mythic.  I don’t have the data to know if Serious is used to pair ranked drafts or any other events (Traditional Draft, Standard Constructed Event, etc).  It would take a 17lands-data-dump amount of match data to conclusively show that one way or another.  AFAIK, WotC has said that it isn’t, but they’ve made incorrect statements about ratings and pairings before.  And I certainly don’t know, in so, so many different respects, how everything made it to this point.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: