Update to the MtG: Arena Mythic Ranking System

I’d noticed things weren’t right in November 2023, and according to other people, things had appeared different for a couple of months before that (I had played very little ranked). After investigating further, it appears that the core system is largely the same while the actual calculation module is now just three bugs in a trench coat.

What doesn’t seem to have changed:

Each month you rank in at a cap of 1650 MMR. To rank in well below that, you have to either be very new to ranked or expend effort, and the rank-in value is (almost certainly) still based on your Serious Rating.
Games against other Mythics are capped at a 200 MMR difference for rating calculations regardless of how wide the difference is.
Games against non-Mythics are rated as though you’re 100 points above the non-Mythic, regardless of what your MMR is.
MMR doesn’t drift over time.
Bo3 matches are worth twice as many points as Bo1.
Mythic is run at a fixed Rating Deviation. Even though the system is likely still trying to be Glicko-2, Mythic RD doesn’t change with played matches.
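The two gap rules in the list above can be written down as a tiny sketch. The function name and signature are made up for illustration, and I’m assuming the 200-point cap applies symmetrically in both directions; none of this is known to be how the actual code is structured.

```python
def effective_gap(my_mmr: float, opp_mmr: float, opp_is_mythic: bool) -> float:
    """Rating gap (mine minus opponent's) as the calculation seems to see it.

    Hypothetical helper: the rules come from observed results, not from code.
    """
    if not opp_is_mythic:
        # Non-Mythic opponents are rated as if they sit 100 points below you.
        return 100.0
    # Mythic opponents: the real gap, clamped to +/-200 (symmetry assumed).
    return max(-200.0, min(200.0, my_mmr - opp_mmr))


print(effective_gap(1650, 1200, True))   # clamped gap against a much lower Mythic
print(effective_gap(1500, 1450, True))   # small gaps pass through unchanged
print(effective_gap(1500, 0, False))     # non-Mythic: always treated as +100
```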

What has definitely changed:

Bo1 matches against somebody 200+ points lower used to be +5 points for a win and -15.5 points for a loss for a breakeven winrate of 75.6%. Now this is +3.8 points for a win and -5.9 points for a loss for a breakeven winrate of 61.1% and many fewer points at stake.
Bo1 matches against a non-mythic used to be -13 points for a loss and +7.4 points for a win for a breakeven winrate of 63.7%. Now this is -5.4 points for a loss and +4.3 points for a win for a breakeven winrate of 55.6% and many fewer points at stake.
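The breakeven winrates quoted above follow directly from the win/loss point values: you break even when winrate * win points equals (1 - winrate) * loss points. A minimal sketch (the function name is mine), using the rounded point values from the list:

```python
def breakeven_winrate(win_points: float, loss_points: float) -> float:
    """Winrate at which expected gains and losses cancel out."""
    return loss_points / (win_points + loss_points)


matchups = {
    "old, vs. 200+ lower Mythic": (5.0, 15.5),
    "new, vs. 200+ lower Mythic": (3.8, 5.9),
    "old, vs. non-Mythic": (7.4, 13.0),
    "new, vs. non-Mythic": (4.3, 5.4),
}

for label, (win, loss) in matchups.items():
    print(f"{label}: {breakeven_winrate(win, loss):.1%}")
```

Because the inputs here are rounded to one decimal place, the new-system results can land a few tenths of a percent away from the 61.1% and 55.6% figures.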

Even though I consider it highly likely that this output is just the result of silly bugs, this system does fix the Mythic limited rating problem, albeit in a quite dumb way. Reducing the number of points at stake in each match is also probably a good thing. On the flip side, only needing a 55.6% breakeven rate against non-Mythics in constructed is even more advantageous than before, and having a 61.1% maximum breakeven against all Mythics in constructed is downright atrocious. Approximately, to compete for #1/very high mythic in constructed before, you needed to maximize #gamesplayed * (winrate – 75.6%), and having a sustained winrate much over 75.6% in high-ish Mythic is fairly difficult. Now the formula is approximately #gamesplayed * (winrate – 61.1%), which lets mediocre-winrate players reach very high mythic just by playing a lot.
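To make the grinding point concrete, here is a rough sketch of expected net points over a block of games. It assumes every game is Bo1 against an opponent 200+ points lower and that winrate is constant, which is obviously a simplification; the function name is made up. Net points work out to #gamesplayed * (win points + loss points) * (winrate - breakeven), which is where the approximate formula above comes from.

```python
def net_points(games: int, winrate: float, win_points: float, loss_points: float) -> float:
    """Expected rating change over `games` games at a fixed winrate."""
    return games * (winrate * win_points - (1 - winrate) * loss_points)


# A 65% winrate player grinding 1000 Bo1 games against much lower Mythics.
old_system = net_points(1000, 0.65, 5.0, 15.5)
new_system = net_points(1000, 0.65, 3.8, 5.9)

print(f"old: {old_system:+.0f} points")  # negative: below the old breakeven
print(f"new: {new_system:+.0f} points")  # positive: above the new breakeven
```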

The other interesting question- to me, at least- is where the hell any of these numbers come from. The system definitely isn’t Elo, there’s no simple way I see to bastardize Glicko-1 to give these outputs, and the 100/200 point things still seem untouched, so I figured the simplest possibility was some kind of Glicko-2 modification. In any Elo-like system, a matchup between players with particular ratings can be characterized in two ways: the obvious one is the points gained or lost on a win/loss, and the other is the ratio between the loss and win amounts, which determines the breakeven winning percentage. -15.5/+5 and -31/+10 have half/double the points at stake, but the same breakeven percentage.

In Glicko-2, the knob to adjust the breakeven percentage for a fixed rating difference is the rating deviation, which is a measure of how uncertain the ratings are. Under the old Mythic system with a RD of 60, 200 points was a 75.6% breakeven. Lowering that to 61.1% would require increasing the rating deviation to about 741.5, and given that the value for a completely unknown player is initialized at 350 (remember that number), it seemed almost impossible that 741.5 could have been coded in. And that would have given 77.2 times as many points out (454.8 vs. 5.9), which is another nonsense number. So that seemed unlikely. Getting the points given out to decrease to anything like the new low numbers required massively reducing the RD, but that only made the breakeven winrate higher than 75.6%, so that wasn’t any good either.
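For reference, this is the standard Glicko-2 expected-score formula, which is also the breakeven winrate for a given rating gap. It’s a minimal sketch assuming both players sit at the same RD; the function and variable names are mine.

```python
import math

GLICKO2_SCALE = 400 / math.log(10)  # about 173.7178, the usual Glicko-2 constant


def breakeven(rating_gap: float, rd: float, scale: float = GLICKO2_SCALE) -> float:
    """Expected score of the higher-rated player, i.e. the breakeven winrate."""
    phi = rd / scale                                     # RD on the internal scale
    g = 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)   # RD dampening factor
    return 1 / (1 + math.exp(-g * rating_gap / scale))


print(f"RD 60:    {breakeven(200, 60):.1%}")     # the old Mythic setup
print(f"RD 741.5: {breakeven(200, 741.5):.1%}")  # what correct math would need
```

With a 200-point gap this gives about 75.6% at RD 60 and about 61.1% at RD 741.5.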

Basically, these numbers aren’t even close to normal Glicko-2 numbers- reducing the points at stake would require increasing the already-far-too-high breakeven percentage, and reducing the breakeven percentage would require increasing the already-far-too-high points at stake. I didn’t have any great ideas- I’d noticed that a 200-point difference under the new system was close to an 80-point difference under the old system, as far as breakeven goes, which was about a factor of 2.5. I was looking at the Glicko-2 algorithm to see if there was anything they could have plausibly screwed up to account for that, and.. Glicko-2 has a scale constant of ln(10)/400 that ratings (and RDs) are multiplied by (equivalently, they’re divided by 400/ln(10), about 173.72). If they’d miscoded that as the base-10 logarithm instead of ln(10), that’s a factor of about 2.3. So I coded that mistake into my Glicko-2 algorithm and checked what rating deviation I would need the players to be to come out with the 61.1% breakeven.. and the answer came back.. you guessed it.. 350! (350.36). That’s an almost impossible coincidence- that the RD to get the observed breakeven % after using the incorrect logarithm would come out at almost exactly the new-player initialization level (which they actually do/did use to initialize the play and serious ratings for actual new players).
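Here’s a sketch of that check. The only change from textbook Glicko-2 is swapping ln(10) for log10(10) = 1 in the scale constant, which is my guess at the bug and not anything confirmed; the names are made up, and both players are assumed to share the same RD.

```python
import math


def breakeven(rating_gap: float, rd: float, scale: float) -> float:
    """Glicko-2 expected score for the higher-rated player at a given scale."""
    phi = rd / scale
    g = 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)
    return 1 / (1 + math.exp(-g * rating_gap / scale))


correct_scale = 400 / math.log(10)     # divide ratings by about 173.72
miscoded_scale = 400 / math.log10(10)  # hypothesized bug: divide by 400 instead

print(f"scale ratio: {miscoded_scale / correct_scale:.2f}")             # about 2.3
print(f"miscoded, RD 350: {breakeven(200, 350, miscoded_scale):.1%}")
print(f"correct, RD 60, 80-point gap: {breakeven(80, 60, correct_scale):.1%}")
```

Both of the last two lines come out at roughly 61.1%, matching the observed breakeven and the 200-looks-like-80 observation.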

So if we assume that’s what the algorithm is doing- using the wrong logarithm and the wrong RD initialization value (Mythics used to initialize at RD 60, which is somewhat reasonable, not 350)- how do we get from there to the actual point values? That would give out far too many points- 147.6 vs 5.9- but in this case, the multiplier is almost exactly a nice round number, 25 (25.02).
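A rough sketch of the point calculation under those assumptions: one standard Glicko-2 rating step for a single game, both players at the same RD, volatility held at 0.06 (an assumed value, not something I know they use), and no iterative volatility update. The divide-by-25 is applied afterwards by hand. Names are hypothetical.

```python
import math


def single_game_points(gap, rd, scale, won, volatility=0.06):
    """Rating change, in displayed points, from one game in a Glicko-2 step."""
    phi = rd / scale
    g = 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)
    expected = 1 / (1 + math.exp(-g * gap / scale))
    variance = 1 / (g ** 2 * expected * (1 - expected))
    phi_star_sq = phi ** 2 + volatility ** 2           # pre-game RD inflation
    phi_new_sq = 1 / (1 / phi_star_sq + 1 / variance)  # post-game RD squared
    score = 1.0 if won else 0.0
    return scale * phi_new_sq * g * (score - expected)


miscoded_scale = 400.0  # log10(10)/400 instead of ln(10)/400
raw_loss = -single_game_points(200, 350, miscoded_scale, won=False)
raw_win = single_game_points(200, 350, miscoded_scale, won=True)

print(f"raw:  +{raw_win:.1f} / -{raw_loss:.1f}")
print(f"/25:  +{raw_win / 25:.2f} / -{raw_loss / 25:.2f}")
```

The raw loss lands within a point or so of the 147.6 above, and dividing by 25 gives values close to the observed +3.8/-5.9, though not an exact match.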

I don’t know if there’s a third huge bug that mimics a divide-by-25 in rating changes, whether somebody put in a divide-by-25 to get reasonably-sized numbers in lieu of actual debugging, or if this is all part of some new system and the log(10)/350 thing is just a total coincidence, but it’s hard to imagine somebody went to the trouble of completely redesigning the Mythic rating system and ended up with.. this.

*I can’t reproduce the point values exactly using any values for scale constant, initial RD, and multiplier, but I couldn’t exactly match the values before this change either. It seems fairly likely that there’s another smaller bug somewhere in their algorithm, and/or a large bug that’s mimicking divide-by-25 somehow.
