The paper “A Random Forest approach to identify metrics that best predict match outcome and player ranking in the esport Rocket League” got published yesterday (9/29/2021), and for a Cliff’s Notes version, it did two things: 1) looked at 1-game statistics to predict that game’s winner and/or goal differential, and 2) looked at 1-game statistics across several rank (MMR/Elo) stratifications to attempt to classify players into the correct rank based on those stats. The overarching theme of the paper was to identify specific areas that players could focus their training on to improve results.
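To make the shape of that first analysis concrete, here’s a minimal sketch of “per-game stats in, game winner out” with a random forest. The stat names and data below are invented for illustration; this is not the paper’s actual feature set or pipeline.

```python
# Sketch of the "predict the winner from 1-game stats" setup.
# Stat names and data are made up; the paper's real features and data differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_games = 1_000
stats = ["shots_diff", "saves_diff", "boost_collected_diff",
         "avg_speed_diff", "time_supersonic_diff"]

# Per-game stat differentials (team A minus team B), plus a winner label that,
# in this toy setup, depends mostly on the first two stats.
X = rng.normal(size=(n_games, len(stats)))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_games) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, importance in sorted(zip(stats, model.feature_importances_),
                               key=lambda t: -t[1]):
    print(f"{name}: {importance:.3f}")
```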
For part 1, that largely involves finding “winner things” and “loser things” and the implicit assumption that choosing to do more winner things and fewer loser things will increase performance. That runs into the giant “correlation isn’t causation” issue. While the specific Rocket League details aren’t important, this kind of analysis will identify second-half QB kneeldowns as a huge winner move and having an empty net with a minute left in an NHL game as a huge loser move. Treating these as strategic directives- having your QB kneel more or refusing to pull your goalie ever- would be actively terrible and harm your chances of winning.
Those examples are so obviously ridiculous that nobody would ever take them seriously, but when the metrics don’t capture losing endgames as precisely, they can be even *more* dangerous, telling a story that’s incorrect for the same fundamental reason, but one that’s plausible enough to be believed. A common example is the correlation in the NFL between outrushing your opponent and winning. We’ve seen Derrick Henry or Marshawn Lynch completely dump truck opposing defenses, and when somebody talks about outrushing leading to wins, it’s easy to think of instances like that and agree. In reality, leading teams run more and trailing teams run less, and the “signal” comes much, much more from capturing leading/trailing behavior than from Marshawn going full beast mode sometimes.
If you don’t apply subject-matter knowledge to your data exploration, you’ll effectively ask bad questions that get answered by “what a losing game looks like” and not “what (actionable) choices led to losing”. That’s all well-known, if worth restating occasionally.
The more interesting part begins with the second objective. While the particular skills don’t matter, trust me that the difference in car control between top players and Diamond-ranked players is like the difference between watching Simone Biles do a floor routine and watching me trip over my cat. Both involve tumbling, and that’s about where the similarity ends.
The paper measures various mechanics and identifies rank pretty well based on them. What’s interesting is that while they can use those mechanics to tell a Diamond from a Bronze, when they tried to use those mechanics to predict the outcome of a game, they all graded out as basically worthless. While some may have suffered from adverse selection (something you do less when you’re winning), they had a pretty good selection of mechanics and they ALL sucked at predicting the winner. And yet, beyond absolutely any doubt, the higher-rank stratifications are much better at them than the lower-rank ones. WTF? How can that be?
The answer is in a sample constructed in a particularly pathological way, and it’s one that will be common among esports data sets for the foreseeable future. All of the matches are contested between players of approximately equal overall skill. The sample contains no games of Diamonds stomping Bronzes or getting crushed by Grand Champs.
The players in each match have different abilities at each of the mechanics, but the overall package always grades out similarly, because their MMRs have to be close enough for them to get paired up in the first place. So if Player A is significantly stronger than Player B at one particular mechanic, to the point you’d expect it to show up, ceteris paribus, as a large winrate effect, then A almost tautologically has to be worse at the other aspects; otherwise A would be significantly higher-rated than B and the pairing algorithm would have excluded that match from the sample. So the analysis comes to the conclusion that being better at that mechanic doesn’t predict winning a game. If the sample contained comparable numbers of cross-rank matches, all of the important mechanics would obviously be huge predictors of game winner/loser.
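A toy simulation shows the mechanism. Everything below is invented for illustration (two generic skills, a logistic win model, an arbitrary matchmaking cutoff); it isn’t the paper’s data or model, it just demonstrates how conditioning on near-equal overall skill erases the apparent value of any single skill.

```python
# Toy simulation: narrow-band matchmaking hides the value of individual skills.
# All numbers and the win model are invented; nothing here comes from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # candidate 1v1 matchups

# Each player has two independent skills; "overall" skill is just their sum.
skill_a = rng.normal(size=(n, 2))   # column 0 = player 1, column 1 = player 2
skill_b = rng.normal(size=(n, 2))
overall_gap = (skill_a[:, 0] + skill_b[:, 0]) - (skill_a[:, 1] + skill_b[:, 1])

# Player 1's win probability depends on the overall skill gap.
p1_wins = rng.random(n) < 1 / (1 + np.exp(-overall_gap))

# Narrow-band matchmaking: keep only matchups with nearly equal overall skill.
rank_matched = np.abs(overall_gap) < 0.25

def corr_with_winning(mask):
    skill_a_gap = skill_a[mask, 0] - skill_a[mask, 1]
    return np.corrcoef(skill_a_gap, p1_wins[mask].astype(float))[0, 1]

print(f"corr(skill-A gap, winning), all matchups:  {corr_with_winning(np.ones(n, dtype=bool)):.3f}")
print(f"corr(skill-A gap, winning), rank-matched:  {corr_with_winning(rank_matched):.3f}")
# Expect a clearly positive correlation in the full sample and roughly zero
# once we condition on near-equal overall skill.
```

In the rank-matched subset, a big edge in skill A almost forces a deficit in skill B, which is exactly the near-tautology described above.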
This pathologically constructed sample led to the profoundly incorrect conclusion
“Taken together, higher rank players show better control over the movement of their car and are able to play a greater proportion of their matches at high speed. However, within rank-matched matches, this does not predict match outcome. Therefore, our findings suggest that while focussing on game speed and car movement may not provide immediate benefit to the outcome within matches, these PIs are important to develop as they may facilitate one’s improvement in overall expertise over time.”
even though adding or subtracting a particular ability from a player would matter *immediately*. The idea that you can work on mechanics to improve overall expertise (AKA achieving a significantly higher MMR) WITHOUT IT MANIFESTING IN MATCH RESULTS, WHICH IS WHERE MMR COMES FROM, is… interesting. It’s trying to take two obviously true statements (higher-ranked players play faster and with more control- quantified in the paper; playing faster and with more control makes you better- self-evident to anybody who knows RL at all) and shoehorn a finding between them that obviously doesn’t comport.
This kind of mistake will occur over and over and over when data sets built from narrow-band matchmaking are analyzed that way.
(It’s basically the same mistake as thinking that velocity doesn’t matter for mediocre MLB pitchers- it doesn’t correlate with a lower ERA within that group, but any individual pitcher who gains velocity will, on average, improve his ERA.)