Stuff+ Doesn’t Have a Team-Switching Problem

I’m not even going to bother with a Betteridge’s Law headline here. On top of what is, at this point, presumptively bad-faith discussion of their own stats (rank-order correlations on team-switchers only? Really?), BP (Baseball Prospectus) claimed that the Stuff+ metric has a team-switching problem and spent about 15 paragraphs discussing it. I’m only going to spend two paragraphs, because it just doesn’t.

Edit 5/5/2023: I re-ran everything on the same set of players with the exact same weighting, so that DRA- is compared to Stuff+ and the other stats completely fairly, and replaced the section with one composite chart.

Using data from Fangraphs and BP, I took each player-season from 2020-2022 with at least 100 pitches thrown (this gets rid of position players, etc.) and pulled DRA-, Pitches, Stuff+, FIP, xFIP-, SIERA, and ERA. Because league-wide ERA was quite different from season to season, I converted ERA/SIERA/FIP to 100 * Stat / MLB_Average_ERA for that season to make a (non-park-adjusted) “Stat-”. DRA- and xFIP- are already on that scale. I then did an IP-weighted fit of same-season “ERA-” on Stuff+ and got predicted same-season “ERA-” = 98.93 - 1.15 * (Stuff+ - 100). Finally, I took paired consecutive player-seasons and compared the weighted RMSE of each year-T stat against year-T+1 “ERA-”, broken down by team-switching status (No = both seasons for the same team, Yes = played for more than one team).
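For anyone who wants to reproduce the scaling step, here is a minimal Python sketch of the two conversions described above. The function names and the example inputs are placeholders of mine, not values from the dataset; the only numbers taken from this post are the 98.93 intercept and the 1.15 slope.

```python
def to_minus_scale(stat, league_avg_era):
    """Put an ERA-scale stat (ERA, FIP, SIERA) on a non-park-adjusted
    "Stat-" scale, where 100 is that season's MLB average ERA."""
    return 100.0 * stat / league_avg_era


def stuff_plus_to_era_minus(stuff_plus):
    """Map Stuff+ to a same-season "ERA-" using the fit quoted in the post:
    ERA- = 98.93 - 1.15 * (Stuff+ - 100)."""
    return 98.93 - 1.15 * (stuff_plus - 100.0)


# Made-up inputs, purely to show the calls (not real player or league data).
example_era_minus = to_minus_scale(stat=3.00, league_avg_era=4.00)
example_stuff_era_minus = stuff_plus_to_era_minus(110.0)
print(example_era_minus, example_stuff_era_minus)
```

Under that fit, a 110 Stuff+ maps to an “ERA-” of roughly 87.4, so every point of Stuff+ above 100 is worth a bit more than one point of “ERA-”.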

RMSE T+1 “ERA-”	Non-switch	Switch	All
Stuff+	38.0	37.4	37.7
“SIERA-”	39.5	38.9	39.26
DRA-	40.0	38.1	39.29
xFIP-	40.6	40.0	40.4
“FIP-”	43.4	49.2	45.6
“ERA-”	50.8	60.6	54.6
N	588	409	997
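The table above boils down to a weighted RMSE computed separately for each group. A rough Python sketch of that calculation follows, assuming each paired player-season carries a single weight; the post doesn’t spell out exactly which innings total does the weighting, so treat the weight column as an assumption. The tuple layout and the toy rows are hypothetical, not the real data.

```python
from collections import defaultdict
from math import sqrt


def weighted_rmse(predicted, actual, weights):
    """Square root of the weight-averaged squared prediction error."""
    total = sum(w * (p - a) ** 2 for p, a, w in zip(predicted, actual, weights))
    return sqrt(total / sum(weights))


def rmse_by_switch_status(pairs):
    """pairs: iterable of (switched, predicted, actual, weight), where
    predicted is the year-T stat and actual is the year-T+1 "ERA-"."""
    groups = defaultdict(list)
    for switched, pred, act, w in pairs:
        groups["Switch" if switched else "Non-switch"].append((pred, act, w))
        groups["All"].append((pred, act, w))
    # zip(*rows) turns the list of rows back into three parallel columns.
    return {name: weighted_rmse(*zip(*rows)) for name, rows in groups.items()}


# Toy rows, invented for illustration only.
toy_pairs = [
    (False, 100.0, 110.0, 1.0),
    (False, 90.0, 90.0, 3.0),
    (True, 105.0, 95.0, 2.0),
]
print(rmse_by_switch_status(toy_pairs))
```

Note that this scores each year-T stat directly as the prediction, with no refitting, which is why a stat that sits on the wrong scale or center gets penalized here.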

Literally no problems here. Stuff+ does fine with team-switchers, does better than park-unadjusted “FIP-” across the board, and does much better than park-unadjusted “FIP-” on team-switchers, as expected, since park-unadjusted FIP should be the metric taking a measurable accuracy hit from a park change. And yet somehow BP is reporting the complete opposite: 1) that Stuff+ is fine for non-switchers but becomes near-useless for team-switchers, and 2) that its performance degrades significantly compared to park-unadjusted FIP for team-switchers. Common sense and the data clearly say otherwise. In predicting next season’s “ERA-”, DRA- grades out roughly between SIERA and xFIP- for non-switchers, on par with SIERA overall, and solidly behind Stuff+. (Apologies for temporarily stating it was much worse than that.)

Looking at it another way, creating an IP-weighted-RMSE-minimizing linear fit for each metric to predict next season’s “ERA-” (e.g. year T+1 ERA- = 99 + 0.1 * (year T DRA- - 100)) gives the following chart:

y=mx+b	intercept	slope	RMSE	r
Stuff+ ERA-	102.42	0.79	34.16	0.29
SIERA-	103.07	0.53	34.56	0.25
DRA-	101.42	0.49	34.62	0.24
xFIP-	101.57	0.40	34.88	0.21
“FIP-”	101.13	0.21	35.14	0.17
“ERA-”	100.87	0.11	35.40	0.12
everybody the same	100.55	0.00	35.65	0.00

The intercepts differ partly because of noise and partly because the metrics aren’t all centered identically; SIERA has the lowest average value, for whatever reason. ERA predicted from Stuff+ is the clear winner again, with DRA- again between SIERA and xFIP-. Since all the metrics being fit are on the same scale (Stuff+ was transformed into “ERA-” as described above), the slopes can be compared directly: the bigger the slope, the more one point of the year-T stat moves the predicted year-T+1 ERA-. Well, almost, since the slopes to year-T ERA- aren’t exactly 1, but nothing is compressed enough to change the rank order (DRA- almost catches SIERA, but falls further behind Stuff+).

One point of year-T Stuff+ ERA- is worth 1.00 points of year-T ERA- and 0.79 points of year-T+1 ERA-. One point of year-T DRA- is worth 1.04 points of year-T ERA- but only 0.49 points of year-T+1 ERA-. Stuff+ is much stickier. Fitting to switchers only, the Stuff+ slope is 0.66 and the DRA- slope is 0.46, so Stuff+ is still much stickier. There’s just nothing here. Stuff+ doesn’t have a big team-switching problem, and points of Stuff+ ERA- are clearly worth more than points of DRA- going forward, for switchers and non-switchers alike.
