The DRC+ team-switcher claim is utter statistical malpractice

Required knowledge: MUST HAVE READ/SKIMMED DRC+ really isn’t any good at predicting next year’s wOBA for team switchers and a non-technical knowledge of what a correlation coefficient means wouldn’t hurt.

In doing the research for the other post, it was baffling to me what BP could have been doing to come up with the claim that DRC+ was a revolutionary advance for team-switchers.  It became completely obvious that there was nothing particularly meaningful there with respect to switchers and that it would take a totally absurd way of looking at the data to come to a different conclusion.  With that in mind, I clicked some buttons and stumbled into figuring out what they had to be doing wrong.  One would assume that any sophisticated practitioner doing a correlation where some season pairs had 600+ PA each and other season pairs had 5 PA each would weight them differently… and one would be wrong.

I decided to check 4 simple ways of weighting the correlation- unweighted, by year T PA, by year T+1 PA, and by the harmonic mean of year T PA and year T+1 PA.

Table 1.  Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 400 PAs year T.

400+ PA Harmonic Year T PA Year T+1 PA unweighted N
switch wOBA 0.34 0.35 0.34 0.34 473
switch DRC+ 0.35 0.35 0.34 0.35 473
same wOBA 0.55 0.53 0.55 0.51 1124
same DRC+ 0.57 0.55 0.57 0.54 1124

The way to read this chart is to compare the wOBA and DRC+ correlations for each group of hitters- switch to switch (lines 1 and 2) and same to same (lines 3 and 4).  It’s obvious that wOBA should correlate much better for same than switch because it contains the entire park effect which is maintained in “same” and lost in “switch”, but DRC+ behaves the same way because DRC+ also contains a lot of park factor even though it shouldn’t

In the 400+ year T PA group, the choice of weighting method is almost completely irrelevant. DRC+ correlates marginally better across the board and it has nothing to do with switch or stay.  Let’s add group 2 to the mix and see what we get.

Table 2.  Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 100 PAs year T.

100+ PA Harmonic Year T PA Year T+1 PA unweighted N
switch wOBA 0.31 0.29 0.29 0.26 1100
switch DRC+ 0.33 0.31 0.32 0.29 1100
same wOBA 0.51 0.47 0.50 0.44 2071
same DRC+ 0.54 0.51 0.53 0.47 2071

The values change, but DRC+’s slight correlation lead doesn’t, and again, nothing is special about switchers except that they’re overall less reliable. Some of the gaps widen by a point or two, but there’s no real sign of the impending disaster when the low-PA stuff that favors DRC+ comes in.  But what a disaster there is….

Table 3.  Correlation coefficients to year T+1 wOBA% by different weighting methods, all season pairs.

1+ PA Harmonic Year T PA Year T+1 PA unweighted N
switch wOBA 0.45 0.41 0.38 0.37 1941
switch DRC+ 0.54 0.47 0.58 0.57 1941
same wOBA 0.62 0.58 0.53 0.52 3639
same DRC+ 0.67 0.62 0.66 0.66 3639

The two weightings (Harmonic and Year T) that minimize the weight of low-data garbage projections stay saner, and the two methods that don’t (year T+1 and unweighted) go bonkers and diverge by around what BP reports, If I had to guess, I have more pitchers in my sample for a slightly bigger effect and regressed DRC+% correlates a bit better.  And to repeat yet again, the effect has nothing to do with stay/switch.  It’s entirely a mirage based on flooding the sample with bunches of low-data garbage projections based on handfuls of PAs and weighting them equally to pairs of qualified seasons.

You might be thinking that that sounds crazy and wondering why I’m confident that’s what really happened.  Well, as it turns out- and I didn’t realize this until after the analysis- they actually freaking told us that’s what they did.  The caption for the chart is “Table 3: Reliability of Team-Switchers, Year 1 to Year 2 wOBA (2010-2018); Normal Pearson Correlations”.  Normal Pearson correlations are unweighted. Mystery confirmed solved.

DRC+ really isn’t any good at predicting next year’s wOBA for team switchers

UPDATE:

The decent performance I get for DRC+ projecting low-PA players is from assigning pitchers terrible DRC+s no matter how well they hit.  The rest of the post is fine, but this is all even more bullshit than I realized at the time of writing.

/UPDATE

 

Required knowledge: wOBA and DRC+

Part 2: The DRC+ team-switcher claim is utter statistical malpractice

TL;DR: raw DRC+ is a little better overall than projecting everybody to be league average, but actually worse than that for team-switchers. Best-regressed DRC+ has about a 2.5-point MAE improvement over “everybody hits league average” for switchers and a 1 point MAE improvement over best-regressed wOBA. Regressed DRC+ has a huge advantage projecting very-low-PA seasons and starts losing to regressed wOBA around the 300 PA mark.

Having already demonstrated and explained why DRC+ is structurally unfit for use in WAR/BWARP, the purpose of this next experiment was to test the claims here and here that DRC+ was something special when it came to projecting next year’s wOBA for team-switchers. It fails that test convincingly, but a little regression work gives a decent projection for players that *don’t* switch.  Unfortunately for DRC+, that prediction is only marginally better overall than the same methodology using wOBA, and only has a real advantage at the low-PA end.  Regressed wOBA rules the high-PA end.

Methodology Overview

To test their claim, and to account for leaguewide wOBA changing every year, I normalized every batter-season’s wOBA onto a 100 scale by taking (batter wOBA)/(league average wOBA for that season) * 100.  I’ll call that wOBA% from now on.  Normalization to wOBA% makes sense because many of the factors that influence leaguewide wOBA changes in the upcoming year, from the changing strike zone to the changing baseball itself are not something DRC+ tries to predict- or ever should try to predict. Using wOBA% instead of raw wOBA removes a good deal of nonsense noise at no cost.

Team switchers were not explicitly defined, but since the test sample must be composed of players who had PAs in consecutive seasons, I’m defining a team switcher as anybody who didn’t appear entirely for the same team in both years (e.g. half a season for team A followed by 1.5 seasons for team B is a team switcher).

I also normalized DRC+ to DRC+% similarly since it was coming out a bit under 100, but since DRC+ is on the run scale, I used 100*Sqrt(Max(DRC+,0)/100) to put it on the same scale as wOBA.  Seasons from 2010-2018 were used, although 2010 was only used to project 2011.  Every pair of consecutive seasons where a player had at least 1 PA was eligible.

MAEs were calculated weighted by the harmonic mean of the PAs in year T and year T+1.  Best regressions were determined the same way (using all pairs of seasons, not just switchers).  Since MAEs are calculated in wOBA%, I multiplied by 3.20 (i.e. a wOBA of .320) to put them back on the wOBA-points scale to report.

Tests and Results

The first thing I tried was simply using year T DRC+% as the projection for year T+1 wOBA% and benchmarking that against an “everybody 100 wOBA%” (LgAvg) projection and restricting it to pairs of qualified seasons (500+ PA in each).  DRC+ had an MAE of 25.1 points of wOBA against LgAvg’s 34.3 overall, but 26.5 vs 26.1 on team switchers.

In the attempt to see if the signal could be improved, I regressed year T DRC+% with league average (100 wOBA%) PAs and the minimum weighted MAE came with adding 89 average PAs.  Doing the same optimization with year T wOBA% came up with adding 332 average PAs.  For reporting purposes, I broke the players up into 3 groups

  1. 0-99 PAs in year T, which is just enough to capture all pitchers (2016 Bumgarner, 97 PA) as well as a bunch of callups and fill-ins who aren’t really MLB quality.
  2. 400+ PAs in year T, which is all full-time players and primary sides of platoons, etc.  That number is kind of arbitrary, but it’s a little over 50% of the average PA per position, assuming some PHing, and moving it around 25 PAs isn’t really going to affect the big picture analysis anyway
  3. 100-399 PAs in year T to cover everybody else.

 

This is a sample report.

Table 1.  wOBA MAEs, 400+ PA in both seasons.  LgAvg=100

Min 400 PA both seasons raw DRC+% LgAvg regd DRC+% regd wOBA% year T+1 wOBA% year T wOBA% N
all 26.0 32.3 25.1 24.5 106.6 107.5 1152
switch 28.4 26.7 26.9 25.8 103.6 105.3 303
same 25.2 34.3 24.4 24.1 107.7 108.2 849

Any PA cutoff biases the sample, but using a PA cutoff in both seasons is especially bad form because it excludes players who would have reached the cutoff in year T+1 if they hadn’t been benched for sucking.  Even with artificially tight performance constraints, the regressions are virtually useless for team switchers- only 1 point of MAE improvement for wOBA and nothing for DRC+. To avoid the extra bias problem, future results will include all (1+ PA) year T+1 seasons.

Table 2. wOBA MAEs, 400+ PA in year T, any PA year T+1.  LgAvg=100

Min 400 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 27.6 32.6 26.6 26.3 105.0 106.6 1597
switch 30.3 28.8 29.0 28.2 101.0 103.9 473
same 26.4 34.2 25.6 25.5 106.6 107.7 1124

The bias in Table 1 is apparent now.  This population is simply worse to begin with in year T (marginal hitters were more likely to suck and get benched in year T+1 and not show up in Table 1) and dropped off more to year T+1.  Back to the post topic, neither DRC+ nor wOBA are any good for switchers, wOBA is a bit ahead of DRC+, and the projection for same-team players is a clear improvement on LgAvg.

Table 3.  100-399+ PA in year T, any PA year T+1.  LgAvg=100

100-399 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 36.8 36.2 34.7 34.5 98.5 97.8 1574
switch 38.5 38.3 36.3 36.5 94.3 95.2 627
same 35.7 34.9 33.6 33.2 101.4 99.5 947

The league average errors are a good bit worse and now DRC+ and wOBA are pretty useless for everything, offering at best a 2-point improvement over LgAvg.  Also, the quality of the players here is clearly worse because….. better players get more PAs and make it into Table 2 instead.

Table 4. wOBA MAEs, 1-99+ PA in year T, any PA year T+1.  LgAvg=100

1-99 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 110.2 102.2 62.9 92.5 84.3 69.8 2409
switch 91.4 96.7 64.3 87.7 73.3 69.8 841
same 120.3 105.2 62.2 95.1 90.2 69.7 1568

And this is interesting. Garbage hitters, giant MAEs, and regressed DRC+ winning by a mile for a change.  The other interesting thing here is that the players teams keep improve *a ton* and the ones they let go keep being godawful at the plate.  Somebody should look into that in more detail.

Seeing that the 100-399 PA group at least resembled MLB-quality hitters, albeit not the good ones, and that the 1-99 PA group was an abomination at the plate (it did include all the pitchers), I wondered what would happen if I cheated a little and tried to optimize on the 100+ PA group instead of everybody.  That group looks like

Table 5.  wOBA MAEs, 100+ PA in year T, any PA year T+1.  LgAvg=100

Min 100 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 30.4 33.7 29.1 28.8 102.7 103.9 3171
switch 33.3 32.2 31.6 31.2 98.6 100.8 1100
same 28.9 34.5 27.7 27.5 104.9 105.6 2071

Again, useless for switchers, solid improvement for the ones who stayed.  Based on this, I decided to reoptimize based on a LgAvg of 103 using only players with 100+ PA year T just to see what would happen.

Trying a different league average

This is starting to look down the rabbit hole of regressing to talent (more PA is a proxy for more talent as we’ve seen) instead of to pure league average, but let’s see what happens.  The regression amounts came out to 243 added PA for DRC+ and 410 added PA for wOBA.  Doing that came up with

Table 6.  wOBA MAEs, 100+ PA in year T, any PA year T+1.  LgAvg=103

Min 100 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 30.4 33.2 28.7 29.1 102.7 103.9 3171
switch 33.3 33.7 31.3 31.9 98.6 100.8 1100
same 28.9 32.9 27.2 27.6 104.9 105.6 2071

Well, that’s not an auspicious start, marginally helping the stayers (the switchers were closer to 100, so regressing towards 103 isn’t any help).  Let’s see if there’s any benefit in either group individually.

Table 7.  wOBA MAEs, 100-399 PA in year T, any PA year T+1.  LgAvg=103

100-399 PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 36.8 38.7 34.5 35.6 98.5 97.8 1574
switch 38.5 41.9 36.7 38.0 94.3 95.2 627
same 35.7 36.6 33.0 33.9 101.4 99.5 947

Well, that was a disaster for these guys. The top line using 100 LgAvg was 36.8 / 36.2 / 34.7 / 34.5 before, and regressing them further from their talent shockingly didn’t do any favors.  Was this made up for by the full-time players?

Table 8.  wOBA MAEs, 400+ PA in year T, any PA year T+1.  LgAvg=103

400+ PA year T raw DRC+% LgAvg regd DRC+% regd wOBA% year t+1 wOBA% year T wOBA% N
all 27.6 30.8 26.1 26.3 105.0 106.6 1597
switch 30.3 29.1 28.3 28.5 101.0 103.9 473
same 26.4 31.5 25.2 25.4 106.6 107.7 1124

Not really.  This is a super marginal improvement over the previous top line of 27.6 / 32.6 / 26.6 / 26.3.  Only the LgAvg projection really benefits at all, and that’s not what we’re interested in.  Changing league average a little and optimizing over only MLB-quality hitters doesn’t seem to really accomplish anything great for the DRC+ or wOBA regressions.

Conclusion

There’s little-to-nothing to BP’s claim that DRC+ is something special for team switchers.  Raw DRC+% is worse than league average, and the MAE for best-regressed DRC+ is about 2 points better than league average overall, and that entire benefit is from the low-PA end.  It projects team-switching full-time players worse than assuming they’re league average.  However, for the really low-PA players, it is more accurate raw and much more accurate regressed than league average.

Likewise there’s also absolutely nothing to the claim that DRC+ is a significant improvement over wOBA for predicting year T+1 wOBA for switchers- the gap is actually *smaller* for switchers.  1.1 points of MAE for switchers and 1.5 points for stayers.  The best regression of DRC+ absolutely does shine in the very-low-PA group, but it’s also not good in the full-time player category.  Regressed DRC+ and regressed wOBA actually do make fairly decent, much better than league average predictions for full-time players who *stay*, for whatever an untested model fit to in-sample data is worth.

C’mon Man- Baseball Prospectus DRC+ Edition

Required knowledge: A couple of “advanced” baseball stats.  If you know BABIP, wRC+, and WAR, you shouldn’t have any trouble here.  If you know box score stats, you should be able to get the gist.

Baseball Prospectus recently introduced its Deserved Runs Created offensive metric that purports to isolate player contribution to PA outcomes instead of just tallying up the PA outcomes, and they’re using that number as an offensive input into their version of WAR.  On top of that, they’re pushing out articles trying to retcon the 2012 Trout vs. Cabrera “debate” in favor of Cabrera and trying to give Graig Nettles 15 more wins out of thin air. They appear to be quite serious and all-in on this concept as a more accurate measure of value.  It’s not.

The exact workings of the model are opaque, but there’s enough description of the basic concept and the gigantic biases are so obvious that I feel comfortable describing it in broad strokes. Instead of measuring actual PA outcomes (like OPS/wOBA/wRC+/etc) or being a competitive forecasting system (Steamer/ZIPS/PECOTA), it’s effectively just a shitty forecast based on one hitter-season of data at a time****.

It weights the more reliable (K/BB/HR) components more and the less reliable (BABIP) components less like projections do, but because it’s wearing blinders and can’t see more than one season at a time, it NEVER FUCKING LEARNS**** that some players really do have outlier BABIP skill and keeps over-regressing them year after year.  This is methodologically fatal.  It’s impossible to salvage a one-year-of-stats-regressed framework.  It might work as a career thing, but then year X WAR would change based on year X+1 performance.

Addendum for clarity: If DRC+ regresses each season as though that’s all the information it knows, then adds those regressed seasons up to determine career value, that is *NOT* the same as correctly regressing the total career.  If, for example, BABIP skill got regressed 50% each year, then DRC+ would effectively regress the final career value 50% as well (as the result of adding up 50%-regressed seasons), even though the proper regression after 8000 PAs is much, much less.  This is why the entire DRC+ concept and the other similarly constructed regressed-season BP metrics are broken beyond all repair.  /addendum

 

****The description is vague enough that it might actually use multiple years and slowly learn over a player’s career, but it definitely doesn’t understand that a career of outlier skill means that the outlier skill (likely) existed the whole time it was presenting, so the general problem of over-regressing year after year would still apply, just more to the earlier years. Trout has 7 full years and he’s still being underrated by 18, 18, and 11 points the last 3 years compared to wRC+ and 17 points over his whole career.

DRC+ loves good hitters with terrible BABIPs and particularly ones with bad BABIPs and lots of HRs.  Graig Nettles and his career .245 +/- .005 BABIP / 390 HRs looks great to DRC+ (120 vs 111 wRC+, +14.7 wins at the plate), as do Mark McGwire (164 vs 157, +8.5 wins), Harmon Killebrew (150 vs 142, +16.2 wins), Ernie Banks (129 vs 118, +20.8 wins), etc.  Guys who beat the hell out of the ball and run average-ish BABIPs are rated similarly to wRC+, Barry Bonds (175 vs 173), Hank Aaron (150 vs 153), Willie Mays (150 vs 154), Albert Pujols (147 vs 146), etc.

The flip side of that is that DRC+ really, really hates low-ISO/high BABIP quality hitters.  It underrates Tony Gwynn (119 vs 132, -12.9 wins) because it can’t figure out that the 8-time batting champ can hit. In addition, it hates Roberto Alomar (110 vs 118, -10.4 wins) Derek Jeter (105 vs 119, -17.9 wins), Rod Carew (112 v 132, -18.7 wins), etc.  This is simply absurd.

C’mon man.

Are Rocket League viewership numbers fraudulently high?

Maybe.  Quite possibly.  Something strange is going on.

For the TL;DR crowd

  1. Season 6 EU playoff viewership (Sunday October 14, 2018) was abominably low, off by 10,000-20,000 viewers or up to a third of the regular audience, far worse than any other broadcast this season.
  2. Season 6 NA regional playoffs, held the day before (Saturday, October 13, 2018) were exactly in line with seasonal numbers.
  3. Regional playoff viewership was always fine-to-good for the other 7 Season 3-Season 6 regional playoff broadcasts
  4. There was an unnatural 15,000ish viewer bump late in the broadcast that can only realistically be a big host or major viewbot fraud
  5. Psyonix won’t comment at all and I can’t find other evidence of a large host
  6. Viewership of major events was trending way down coming into season 6

 

Background

The Rocket League Championship Series (RLCS) is a biannual competition featuring 8 teams from North America (NA), 8 teams from Europe (EU), and 8 teams from Oceania (OCE) each playing an online single-round-robin season over 5 weeks followed by an online regional playoff.  The top 4 teams from NA and EU and the top 2 teams from OCE then meet in person to crown the world champion for that season.  Psyonix, the maker of Rocket League, runs the NA and EU competitions and broadcasts every game on its Twitch stream.  A different entity is responsible for OCE and broadcasts those games on its own Twitch stream instead. We won’t worry about the OCE broadcasts.

 

Viewer counts increased significantly between seasons 3 and 4, slightly between 4 and 5, and were down about 20% in the current season 6, as the following graphs from esc.watch, an esports viewership tracker, show.  I have personally spot-checked esc.watch’s Rocket League charts over the past year and have always found them accurate.

 

The focus of this post will be the large decline in the season 6 EU playoff.  Note that season 3 NA has a slight decline in playoff viewership while every other playoff broadcast has been up or at least flat.  File that away for later. A couple of other easily  explainable anomalies need to be addressed first.  In season 3 (green), there was a week off between week 3 (April 2, 2017) and week 4 (April 16, 2017).  That resulted in lower viewership in week 4 in both NA and EU that rebounded to “normal” in the following 2 weeks of consecutive play.  Season 5 week 5 in EU had a large technical problem.  A power outage near the Psyonix studio cut the broadcast short after 2 games and the remaining 4 games were rescheduled for 12 PM Eastern on a Thursday in place of the regular 12 PM Eastern on Sunday.  That timeslot is obviously terrible in comparison for NA viewers and the low viewer count is quite understandable.  Season 4 week 5 in EU had a different technical problem.  The broadcast was fine, but twitch.tv did not send out notifications that the broadcast was starting and reportedly didn’t even show the Rocket League channel as being online.  Given the nature of viewership, which will be discussed next, it’s also quite understandable for that number to be in the toilet.

 

A typical RLCS broadcast accumulates most of its viewers over its first three matches like this (another reason that the EU S5 Week 5 broadcast being cut off after 2 games killed the viewership).

ctg0rf5

The viewership accumulates this way because it’s comprised of hardcore viewers, those who are ready for the start of the broadcast every week, or at least the notification for it, and those who happen upon it in progress.  People who follow the RL Twitch stream and connect to Twitch will see that it’s online, and people who play Rocket League while a broadcast is in progress are greeted with a giant flashing “ESPORTS LIVE NOW” button on the main menu that they can click to watch.

EU Regionals Anomalies

The graphs for the 12 broadcasts this season follow.  Don’t pay attention to the actual numbers yet, just the shapes.  And in particular the last graph on the bottom right, the EU playoffs.

 

 

There are two obvious things of note.  First, the glitch on the left is the very end of EU Week 5 (the bottom left image above) and not actually part of the playoff broadcast.  That was confirmed directly by esc.watch.  Second.. well…

S6euwhat

That’s a near-instantaneous (under 5 minute) jump of almost 15,000 viewers, and there’s no precedent of such a thing in any other broadcast this season.  It’s clearly not part of the natural accumulation process.

EU Regionals Low Viewership

The other interesting thing about this broadcast is that it was the least viewed broadcast of the season by a country mile.

RL S6 Viewership by match slot

Or in slightly less cluttered form, for each region, the averages of the regular season (weeks 1-5), the playoffs, and the difference between the playoff viewership and the regular season viewership.

RL S6 averages

Yikes? To summarize so far, we have a mystery with the following facts

  1. Season 6 EU playoff viewership (Sunday October 14, 2018) was abominably low, off by 10-20 thousand viewers or up to a third of the regular audience, far worse than any other broadcast this season.
  2. Season 6 NA regional playoffs, held the day before (Saturday, October 13, 2018) were exactly in line with seasonal numbers.
  3. Regional playoff viewership was always fine-to-good for the other 7 Season 3-Season 6 regional playoff broadcasts
  4. There was an unnatural 15,000ish viewer bump late in the broadcast.

Potential Causes for Low Viewership

In investigating the low viewership, I was unable to uncover any evidence of technical difficulties.  I was watching the broadcast and using Twitch and both worked fine.  I found no reports of issues, Twitch-wise or internet-wise, in the relevant reddit thread while gripes about the Twitch issues in season 4 were plentiful.  The graph of viewer count was accurate.  I personally observed the extremely low viewer counts throughout as well as the high viewer count at the end.  Unfortunately I wasn’t paying attention to the count as the 15,000 viewers showed up late.  Technical issues don’t seem to be the explanation here.

The second possibility is that something external was drawing would-be viewers away.  There wasn’t any news in the world that day that would have diverted a large number of viewers.  Call of Duty: Black Ops 4 was released worldwide on October 12, but if that were the cause, it should have sunk the NA playoff viewership on October 13 as well.  Similarly, Rocket League itself was offering a double-XP weekend, giving out more in-game rewards for playing, but that was also active on Saturday during the NA playoffs.  There were other non-Rocket League esports events, as there are many weekends, but there didn’t appear to be anything that could uniquely sink the entire Sunday broadcast or the first 90% of it.  This doesn’t appear to be it either.

The third possibility is that the broadcast itself was thought to be uniquely unappealing to tune into, presumably because Dignitas, the two-time defending world championship roster that just went 7-0 in round robin play, was thought to be a lock to win.  This explanation falls short for several reasons.  First, there’s no evidence that Dignitas is bad for viewership at all.  None of their matches have seen strange declines and the last match of the regular season was an almost meaningless Dignitas match and it had the highest viewership of the week. The idea that people would watch to see if Dignitas could go 7-0, then disappear in droves as Dig tried to become the first NA/EU team to go 7-0 and win the regional playoffs… well, that’s just strange.

The format of the regional championship also undercuts that explanation.  It’s a 6-team single elimination where the top 4 qualify for the world championship.  The top 2 seeds get first-round byes and automatically qualify which means the first two matches- #3 vs #6 and #4 vs #5- are critical.  The winner goes to the world championships and the loser is done for the season.  Even if people didn’t want to watch the rest of the playoffs, those matchups should have been compelling.  The second match of the day involved the most anticipated rookie in RLCS history by far, ScrubKilla, and his wildly inconsistent Renault Vitality team playing for a LAN spot.  This should have been a must-watch series, and by one metric it was.  Not counting the first match of the day, which always accumulates lots of viewers, Vitality-PSG had a higher accumulation of viewers (+17k) than any other match had all season, almost 50% clear of second place.  People talked about ScrubKilla and Vitality all season.  Viewers appeared to tune in in numbers specifically to not miss this match.  The final number shouldn’t have been hot garbage, and yet it still was.  I’m at a complete loss for a legitimate explanation for the overall low viewership.

Potential Causes of the 15,000 Viewer Bump

Switching focus to the ~15,000 viewer bump, there’s one legitimate explanation- a large stream from another game/scene hosted them..  That would bump the viewer count up quickly, but I could find no evidence of that actually happening.  I couldn’t find a mention on twitter. I watched the relevant portion of the broadcast replay, focusing on the Twitch chat (bless my soul), and there weren’t any mentions of a host, nor any newbie-like comments, which seems a bit unlikely.  In addition, the attrition rate would have had to have been extremely low because viewership also went up a tad after the spike.  Rocket League being the autoplay stream on the Twitch homepage was suggested as a possibility, but after observing those streams for a few days, none I saw were accumulating viewers at even a meaningful fraction of the necessary rate.

If the 15,000 viewer bump isn’t natural viewer accumulation, and isn’t a legitimate Twitch host, that leaves fraud.  Psyonix randomly gives away in-game items to players who watch the stream and have their Twitch accounts linked.  This is absolutely a boon to viewership, and can be misleading in a way because players can and do load the stream, mute it, and pay no attention to it just for the chance of getting a drop. In this way, the percentage of the viewer count who are watching the stream is lower than it otherwise would be, but this is all out in the open, and it does also further a legitimate purpose of trying to entice players to become esports viewers.  Where there are giveaways, there are people trying to exploit giveaways, but the likelihood of large-scale fraud here seems very small to me.  The expected value in terms of reselling items of having a linked account watch a stream for 6 hours is about 25-50 cents (USD), and obtaining that value across thousands of accounts involves selling numerous small-ticket (.50-$3) items that nobody wants multiples of.  While I have no doubt that some people sign up a few accounts and have Twitch open in multiple browsers, any person or group capable of controlling enough computers/IPs to generate 15,000 concurrent fraudulent Twitch views should be able to make a hell of a lot more money doing almost anything else with them…

Such as selling their services to a company/streamer that wants to inflate its viewer count to bring in more advertising/sponsorship revenue or to just appear to be more popular than it actually is.  This is called viewbotting, and it’s not a rare occurrence.  If Psyonix had an arrangement for around 15,000 viewbots and forgot to turn them on before the last match, or never turned them on and got a 15,000 viewer host near the end, this would explain everything.  Here’s the graph from earlier with 15,000 viewers added through the whole broadcast instead of the very end.

RLS6 +15k

That would be much more consistent with the regular season.  I obviously have no affirmative proof that Psyonix is committing viewbot fraud, but if they were committing viewbot fraud, this is *exactly* what it would look like if they screwed up for a week, and I’m at a complete loss for legitimate explanations for the low viewership that wouldn’t also have sunk the NA playoffs.

In summary, the evidence here is consistent with several main hypotheses of varying plausibility.

  1. There is a legitimate, as yet undiscovered reason for the overall low viewership and they got a big host of around 15,000 near the end.
  2. Psyonix is viewbotting around 15,000 fakes and forgot to turn them on until the last match of the day.
  3. Psyonix is viewbotting around 15,000 fakes, forgot to turn them on at all that day, and coincidentally got a big host of around 15,000 near the end.

Psyonix’s Refusal to Comment

I contacted Psyonix a fourth time with the article up to this point and another request for comment.  That’s private messages to Murty Shah and Cory Lanier, the two main public faces of their esports program, an email to Psyonix’s general PR contact address, and an email to Psyonix’s esports contact address, and I’ve received no replies.  In addition, they didn’t comment in reddit threads where the low viewership and the big bump were discussed.  The lack of response has to be considered deliberate at this point.  Let’s think about what that means.

Hypothetically, if Psyonix knows the viewers are from a host, would they be willing to tell the world about it?  I would think so. I’m not a social media guru, but if somebody from a different game added 15k viewers to my 48k viewer broadcast, I would want to say thank you, and I’d want to say it publicly to attempt to engage their followers a little more.  They shouldn’t do that for small hosts because that would bring a barrage of useless attention-seekers, and I’m not sure exactly where the acknowledgment line should be drawn, but a 15,000 viewer host seems safely big enough at this point.  Furthermore, there’s no obvious benefit to attempting to conceal that a host happened.  It would already be known to the people watching the other stream, it’s obvious to everybody who looked at the viewer graph that something happened, and it’s inconceivable to me that a channel from a different scene wanting to expose its viewers to Rocket League could be construed negatively.  Now watch it turn out that Psyonix is slowrolling and waiting for the article to go live to say who the host was, but I have to work with what I have so far. If they do decide to announce who the host was and it’s legit, then we’re just back to the mystery of the low viewership in general.

Hypothetically, if Psyonix knows the viewers are fake, would they be willing to deny that they were hosted or willing to lie about who hosted them?  The latter would be a huge mistake because it would be discovered in no time flat, and the former is straight-up admitting that the numbers for that broadcast are bogus, and by inference, that the numbers for every other broadcast during the season are almost certainly bogus. They have to stay silent in this case and hope it blows over.  Twitch itself should have an active interest in investigating this broadcast if there wasn’t a big host.

Reasons for Viewbotting

The cleanest explanation for not commenting is fraud, which invites the question of whether or not viewbotting would make any sense.  As a moral matter, I have absolutely no idea if the relevant people working there are the type who could do such a thing.  As a business matter, it’s plausible.  Before this season, viewership trends started out ok and then took a straight-line path to dumpster fire.  All tournament pairs below are roughly equivalent across seasons.  Broadcasts were about the same length and Fan Rewards were active for all referenced broadcasts.  This is what Psyonix was looking at for average viewers as the numbers rolled in this year

  1. April 22.  RLCS Season 5: 68.2k, +15% over season 4
  2. May 6.  Promotion/Relegation Tournament.  44.5k, +9% over season 4
  3. June 10.  World Championships.  101.8k, -5% from season 4 and -14% from season 3
  4. August 11.  UORL 2 final qualifiers. 15.1k.  esc.watch doesn’t have comps for last year, and timeslots are wonky, but this is not a good number at all for top players playing with rewards active.  The peak viewership was only 29.5k, presumably on one of the four good weekend timeslots.
  5. August 26.  Universal Open 2.  29.1k, -35% from last year
  6. September 2.  Season 6 play-ins 42.0k -46% from last season.

If I were in charge of the esport or somebody whose job/livelihood depended on the continued success of the esport, I would have been very nervous going into season 6. It’s not hard to see the appeal of buying some “insurance viewers” in that climate. As it was, season 6 viewership was officially down 20% across the board, and the real number is much worse if this season was being viewbotted.  To repeat one last time, I have no definitive proof that viewbotting took place.  It is only a hypothesis that explains viewer stats and Psyonix’s behavior.

Conclusion

I hope, and all fans of Rocket League probably join me in hoping that there was a legitimate reason for the low viewership and that Psyonix is simply being obnoxious refusing to acknowledge a large host.  Anything else would be a huge story and a crippling blow to the future of the esport.  If Twitch or Psyonix comment, the article will be updated to reflect that. Many thanks to esc.watch for providing the charts for this article.