Not even going to bother with a Betteridge’s Law headline here. On top of what is, at this point, a presumptively bad-faith discussion of their own stats (rank-order correlations on team-switchers only? Really?), BP claimed that the Stuff+ metric has a team-switching problem and spent like 15 paragraphs discussing it. I’m only going to spend two paragraphs, because it just doesn’t.
Edit 5/5/2023: I went ahead and put the same set of players together with the exact same weighting for everything, so that DRA- is compared to Stuff+ and the other stats completely fairly, and replaced the section with one composite chart.
Using data from Fangraphs and BP, I took each player-season from 2020-2022 with at least 100 pitches thrown (this got rid of position players, etc.) and pulled DRA-, Pitches, Stuff+, FIP, xFIP-, SIERA, and ERA. Because each season’s league ERA was quite different, I converted ERA/SIERA/FIP to Stat/MLB_Average_ERA for that season and multiplied by 100 to make a (non-park-adjusted) “Stat-”. DRA- and xFIP- are already on that scale. I then did an IP-weighted fit of same-season Stuff+ to “ERA-” and got predicted same-season “ERA-” = 98.93 – 1.15 * (Stuff+ – 100). I then took paired consecutive player-seasons and compared weighted RMSEs of year T’s stats as predictors of year T+1 “ERA-”, broken down by team-switching status (No = both seasons for the same team, Yes = played for more than one team).
| RMSE T+1 “ERA-” | Non-switch | Switch | All |
Literally no problems here. Stuff+ does fine with team-switchers, does better than park-unadjusted “FIP-” across the board, and does much better on team-switchers than park-unadjusted “FIP-”. That’s as expected, since park-unadjusted FIP should be the metric taking a measurable accuracy hit from a park change. And yet somehow BP is reporting the complete opposite conclusions: 1) that Stuff+ is fine for non-switchers but becomes near-useless for team-switchers, and 2) that its performance degrades significantly compared to park-unadjusted FIP for team-switchers. Common sense and the data clearly say otherwise. DRA- grades out roughly between SIERA and xFIP- for non-switchers predicting next season’s ERA, on par with SIERA overall, and solidly behind Stuff+. (Apologies for temporarily stating it was much worse than that.)
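For anyone who wants to replicate the comparison, the procedure described above (season-normalized “Stat-”, the Stuff+ to “ERA-” mapping, and a weighted RMSE split by team-switching status) can be sketched roughly as below. This is a minimal sketch, not the exact code behind the table: the function names, the dict keys, and the choice of year-T innings as the RMSE weight are all assumptions made for illustration. Only the 98.93 and 1.15 coefficients come from the fit quoted above.

```python
from math import sqrt


def to_stat_minus(stat, league_avg_era):
    """Scale an ERA-like stat so that 100 equals that season's league-average ERA.

    No park adjustment is applied.
    """
    return 100.0 * stat / league_avg_era


def stuff_to_era_minus(stuff_plus, intercept=98.93, slope=-1.15):
    """Map Stuff+ to a predicted same-season "ERA-" using the fit quoted in the post."""
    return intercept + slope * (stuff_plus - 100.0)


def weighted_rmse(predicted, actual, weights):
    """Square root of the weighted mean of squared prediction errors."""
    total_weight = sum(weights)
    squared_error = sum(
        w * (p - a) ** 2 for p, a, w in zip(predicted, actual, weights)
    )
    return sqrt(squared_error / total_weight)


def rmse_by_switch(pairs):
    """Weighted RMSE for non-switchers, switchers, and everyone.

    Each pair is a dict with hypothetical keys:
      "pred"     - year-T stat on the "ERA-" scale
      "next"     - year-T+1 "ERA-"
      "ip"       - weight (assumed here to be year-T innings pitched)
      "switched" - True if the pitcher played for more than one team
    """
    groups = {
        "Non-switch": [p for p in pairs if not p["switched"]],
        "Switch": [p for p in pairs if p["switched"]],
        "All": list(pairs),
    }
    result = {}
    for name, rows in groups.items():
        if rows:  # skip empty groups instead of dividing by zero
            result[name] = weighted_rmse(
                [r["pred"] for r in rows],
                [r["next"] for r in rows],
                [r["ip"] for r in rows],
            )
    return result
```

Running `rmse_by_switch` once per metric (with Stuff+ first passed through `stuff_to_era_minus`, and ERA/FIP/SIERA through `to_stat_minus`) would give one table row per metric.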
Looking at it another way, creating an IP-weighted-RMSE-minimizing linear fit for each metric to predict next season’s “ERA-” (e.g. Year T+1 ERA- = 99 + 0.1 * (year T DRA- – 100)) gives the following chart:
| everybody the same | 100.55 | 0.00 | 35.65 | 0.00 |
The intercepts differ slightly, partly from noise and partly because the metrics aren’t all centered exactly identically; SIERA has the lowest average value for whatever reason. ERA predicted from Stuff+ is the clear winner again, with DRA- again between SIERA and xFIP-. Since all the metrics being fit are on the same scale (Stuff+ was transformed into ERA- as in the paragraph above), the slopes can be compared directly, and the bigger the slope, the more one point of year-T stat predicts the year T+1 ERA-. Well, almost, since the slopes to year-T ERA aren’t exactly 1, but nothing is compressed enough to change rank order (DRA- almost catches SIERA, but falls further behind Stuff+). One point of year-T Stuff+ ERA- is worth 1.00 points of year-T ERA- and 0.8 points of year-T+1 ERA-. One point of year-T DRA- is worth 1.04 points of year-T ERA- but only 0.49 points of year-T+1 ERA-. Stuff+ is much stickier. Fitting to switchers only, the Stuff+ slope is 0.66 and the DRA- slope is 0.46. Stuff+ is still much stickier. There’s just nothing here. Stuff+ doesn’t have a big team-switching problem, and points of Stuff+ ERA- are clearly worth more than points of DRA- going forward for switchers and non-switchers alike.
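The per-metric fit behind those slopes is just a weighted least-squares line of year-T+1 “ERA-” on (year-T metric – 100); minimizing the weighted squared error also minimizes the weighted RMSE. A minimal closed-form sketch follows. The function names are hypothetical, and the weights are assumed to be innings pitched, as in the IP-weighted fits described above.

```python
def weighted_linear_fit(x, y, w, center=100.0):
    """Fit y ~ intercept + slope * (x - center) by weighted least squares.

    x: year-T metric values on the "ERA-" scale
    y: year-T+1 "ERA-" values
    w: non-negative weights (assumed to be innings pitched)
    Returns (intercept, slope).
    """
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("metric has no variance; slope is undefined")
    slope = sxy / sxx
    # The fitted line passes through (x_bar, y_bar); re-express it around `center`.
    intercept = y_bar - slope * (x_bar - center)
    return intercept, slope


def predict(intercept, slope, x, center=100.0):
    """Predicted year-T+1 "ERA-" for a year-T metric value x."""
    return intercept + slope * (x - center)
```

Because every metric is put on the same “ERA-” scale before fitting, the returned slopes are directly comparable across metrics. Restricting the inputs to switchers only gives the switcher-only slopes.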