A new look at the TTOP, plus a mystery

I had the bright idea to look at the familiarity vs. fatigue TTOP debate, which has MGL on the familiarity side and Pizza Cutter on the fatigue side, by measuring performance based on the number of pitches the batter had seen previously and the number of pitches that the pitcher had thrown to other players in between the PAs in question. After all, a fatigue effect on the TTOP shouldn’t be from “fatigue”, but “relative change in fatigue”, and that seemed like a cleaner line of inquiry than just total pitch count. Not a perfect one, but one that should pick up a signal if it’s there. Then I realized MGL had already done the first part of that experiment, which I’d somehow completely forgotten even though I’d read that article and the followup around the time they came out. Oh well. It never hurts to redo the occasional analysis to make sure conclusions still hold true.

I found a baseline 15 point PA1-PA2 increase as well as another 15 point PA2-PA3 increase. I didn’t bother looking at PA4+ because the samples were tiny and usage is clearly changing. In news that should be surprising to absolutely nobody reading this, PAs given to starters are on the decline overall and the number of PA4+ is absolutely imploding lately.

Season	Total PAs	1st TTO	2nd	3rd	4th	5th
2008	116960	42614	40249	30731	3359	7
2009	116963	42628	40186	30736	3406	7
2010	119130	42621	40457	32058	3990	4
2011	119462	42588	40458	32333	4080	3
2012	116637	42506	40336	30741	3050	4
2013	116872	42570	40422	31026	2851	3
2014	117325	42612	40618	31235	2856	4
2015	114797	42658	40245	29580	2314	0
2016	112480	42461	40128	28193	1698	0
2017	110195	42478	39912	26476	1329	0
2018	106051	42146	38797	24057	1051	0

Looking specifically at PA2 based on the number of pitches seen in PA1, and using 2008-2018 data with pitcher-batters and IBB/sac-bunt PAs removed, I found a more muted effect than MGL did. My data set consisted of (game,starter,batter,pa1,pa2,pa3) rows where the batter had to face the starter at least twice, the batter wasn’t the pitcher, and any IBB/sac-bunt PA in the first three trips disqualified the row (pitch counts do include pitches to non-qualified rows where relevant). For a first pass, that seemed less reliant on individual batter-pitcher projections than allowing each set of PAs to be biased by crap hitters sac-bunting and good hitters getting IBBd would have been.

Pitches in PA 1	wOBA in PA 2	Expected**	n
1	0.338	0.336	39832
2	0.341	0.335	69761
3	0.336	0.335	79342
4	0.334	0.335	82847
5	0.339	0.337	74786
6	0.347	0.338	51374
7+	0.349	0.337	36713

MGL found a 15 point bonus for seeing 5+ pitches the first time up (on top of the baseline 10 he found), but I only get about an 11 point bonus on 6+ pitches and 3 points of that are from increased batter/worse pitcher quality (“Expected” is just a batter/pitcher quality measure, not an actual 2nd TTO prediction). The SD of each bucket is on the order of .002, so it’s extremely likely that this effect is real, and also likely that it’s legitimately smaller than it was in MGL’s dataset, assuming I’m using a similar enough sampling/exclusion method, which I think I am. It’s not clear to me that that has to be an actual familiarity effect, because I would naively expect to see more of a monotonic increase throughout the number of pitches seen instead of the J-curve, but the buckets have just enough noise that the J-curve might simply be an artifact anyway, and short PAs are an odd animal in their own right as we’ll see later.
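As a rough sanity check on the "on the order of .002" figure, here is a minimal sketch of the standard error of each bucket's mean wOBA. The per-PA wOBA standard deviation of 0.5 is my assumption for illustration, not a number from the data set, and it treats PAs as independent.

```python
import math

# Assumption (not from the data set): per-PA wOBA standard deviation of ~0.5.
PER_PA_SD = 0.5

# Pitches seen in PA1 -> number of PA2s, copied from the table above.
BUCKET_N = {"1": 39832, "2": 69761, "3": 79342, "4": 82847,
            "5": 74786, "6": 51374, "7+": 36713}

def standard_error(n, sd=PER_PA_SD):
    """Standard error of a bucket's mean wOBA, treating PAs as independent."""
    return sd / math.sqrt(n)

for bucket, n in BUCKET_N.items():
    print(f"{bucket:>2} pitches: n={n:6d}  SE ~ {standard_error(n):.4f}")
```

Under that assumption every bucket lands somewhere around .002, so a 10ish point gap between buckets is several standard errors wide.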

Doing the new part of the analysis, looking at the wOBA difference in PA2-PA1 based on the number of intervening pitches to other batters, I wasn’t sure I was going to find much fatigue evidence early in the game, but as it turns out, the relationship is clear and huge.

intervening pitches	wOBA PA2-PA1	vs base .015 TTOP	n
<=20	-0.021	-0.036	9476
21	-0.005	-0.020	5983
22	-0.005	-0.020	8652
23	0.004	-0.011	11945
24	0.000	-0.015	15683
25	0.004	-0.011	19592
26	0.001	-0.014	23057
27	0.005	-0.010	26504
28	0.009	-0.006	29690
29	0.015	0.000	31453
30	0.021	0.006	32356
31	0.014	-0.001	32250
32	0.020	0.005	30723
33	0.018	0.003	28390
34	0.027	0.012	25745
35	0.028	0.013	22407
36	0.023	0.008	18860
37	0.030	0.015	15429
38	0.025	0.010	12420
39	0.012	-0.003	9558
40	0.045	0.030	7362
41-42	0.032	0.017	9241
43+	0.027	0.012	7879
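One way to put a single number on the trend in that table is a least-squares line through the single-pitch buckets (21 through 40), weighting each bucket by its n. This is just a sketch of that calculation; the choice to drop the pooled end buckets and to weight by n is mine.

```python
# (intervening pitches, wOBA PA2-PA1, n) for the single-pitch buckets above.
ROWS = [
    (21, -0.005, 5983), (22, -0.005, 8652), (23, 0.004, 11945),
    (24, 0.000, 15683), (25, 0.004, 19592), (26, 0.001, 23057),
    (27, 0.005, 26504), (28, 0.009, 29690), (29, 0.015, 31453),
    (30, 0.021, 32356), (31, 0.014, 32250), (32, 0.020, 30723),
    (33, 0.018, 28390), (34, 0.027, 25745), (35, 0.028, 22407),
    (36, 0.023, 18860), (37, 0.030, 15429), (38, 0.025, 12420),
    (39, 0.012, 9558), (40, 0.045, 7362),
]

def weighted_slope(rows):
    """Slope of the n-weighted least-squares line of y on x."""
    w_total = sum(n for _, _, n in rows)
    x_bar = sum(x * n for x, _, n in rows) / w_total
    y_bar = sum(y * n for _, y, n in rows) / w_total
    sxy = sum(n * (x - x_bar) * (y - y_bar) for x, y, n in rows)
    sxx = sum(n * (x - x_bar) ** 2 for x, _, n in rows)
    return sxy / sxx

slope = weighted_slope(ROWS)
print(f"{slope * 1000:.1f} points of wOBA per intervening pitch")
```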

That’s a monster effect, 2 points of TTOP wOBA per intervening pitch with an unmistakable trend. Jackpot. Hareeb’s a genius. That’s big enough that it should result in actionable game situations all the time. Let’s look at it in terms of actual 2nd time wOBAs (quality-adjusted).

intervening pitches	PA2 wOBA (adj)
<=20	0.339
21	0.346
22	0.343
23	0.344
24	0.340
25	0.341
26	0.339
27	0.339
28	0.337
29	0.340
30	0.341
31	0.338
32	0.347
33	0.336
34	0.345
35	0.344
36	0.336
37	0.340
38	0.328
39	0.335
40	0.340
41-42	0.338
43+	0.344

Wait what??!?!? Those look almost the same everywhere. If you look closely, the higher-pitch-count PA2 wOBAs even average out to be a tad (4-5 points) *lower* than the low-pitch-count ones (and the same for PA1-PA3, though that needs a closer look). If I didn’t screw anything up, that can only mean..

intervening pitches	PA1 wOBA (adj)
<=20	0.361
21	0.351
22	0.348
23	0.339
24	0.340
25	0.336
26	0.338
27	0.335
28	0.327
29	0.325
30	0.320
31	0.325
32	0.326
33	0.319
34	0.318
35	0.316
36	0.312
37	0.311
38	0.303
39	0.323
40	0.295
41-42	0.306
43+	0.318

Yup. The number of intervening pitches TO OTHER BATTERS between somebody’s first and second PA has a monster “effect on” the PA1 wOBA. I started hand-checking more rows of pitch counts and PA results, you name it. I couldn’t believe this was possibly real. I asked one of my friends to verify that for me, and he did, and I mentioned the “effect” to Tango and he also observed the same pattern. This is actually real. It also works the same way between PA2 and PA3. I couldn’t keep looking at other TTOP stuff with this staring me in the face, so the rest of this post is going down this rabbit hole showing my path to figuring out what was going on. If you want to stop here and try to work it out for yourself, or just think about it for a while before reading on, I thought it was an interesting puzzle.

It’s conventional sabermetric wisdom that the box-score-level outcome of one PA doesn’t impart giant predictive effects, but let’s make sure that still holds up.

Reached base safely in PA1	PA2 wOBA (adj)	Batter quality	Pitcher quality
Yes	0.348	0.338	0.339
No	0.336	0.334	0.336

That’s a 12 point effect, but 7 of it is immediately explained by talent differences, and given the plethora of other factors I didn’t control for, all of which will also skew hitter-friendly like the batter and pitcher quality did, there’s just nothing of any significance here. Maybe the effect is shorter-term than that?

Reached base safely in PA1	Next batter wOBA (adj)	Next batter quality	Pitcher quality
Yes	0.330	0.337	0.339
No	0.323	0.335	0.336

A 7 point effect where 5 is immediately explained by talent. Also nothing here. Maybe there’s some effect on intervening pitch count somehow?

Reached base safely in PA1	Average intervening pitches	intervening wOBA (adj)
Yes	30.58	0.3282
No	30.85	0.3276

Barely, and the intervening batters don’t even hit quite as well as expected given that we know the average pitcher is 3 points worse in the Yes group. Alrighty then. There’s a big “effect” from intervening pitch count on PA1 wOBA, but PA1 wOBA has minimal to no effect on intervening pitch count, intervening wOBA, PA2 wOBA, or the very next hitter’s wOBA. That’s… something.

In another curious note to this effect,

intervening pitches	intervening wOBA (adj)
<=20	0.381
21	0.373
22	0.363
23	0.358
24	0.351
25	0.344
26	0.343
27	0.335
28	0.333
29	0.328
30	0.324
31	0.322
32	0.319
33	0.316
34	0.316
35	0.312
36	0.310
37	0.310
38	0.307
39	0.311
40	0.308
41-42	0.309
43+	0.311

Another monster correlation, but that one has a much simpler explanation: short PAs show better results for hitters.

Pitches in PA	wOBA (adj)	n
1	0.401	133230
2	0.383	195614
3	0.317	215141
4	0.293	220169
5	0.313	198238
6	0.328	133841
7	0.347	57396
8+	0.369	37135

Throw a bunch of shorter PAs together and you get the higher aggregate wOBA seen in the table right above this one. It seems like the PA length effect has to be a key. Maybe there’s a difference in the next batter’s pitch distribution depending on PA1?

Pitches in PA	Fraction of PA after reached base	Fraction of PA after out	wOBA after reached base	wOBA after out	OBP after reached base	OBP after out
1	0.109	0.089	0.394	0.402	0.362	0.359
2	0.164	0.158	0.375	0.376	0.348	0.343
3	0.183	0.182	0.308	0.303	0.284	0.278
4	0.186	0.191	0.289	0.276	0.299	0.281
5	0.165	0.174	0.311	0.301	0.339	0.323
6	0.112	0.120	0.323	0.32	0.367	0.360
7	0.049	0.052	0.346	0.339	0.393	0.386
8+	0.032	0.034	0.356	0.36	0.401	0.405

Now we’re cooking with gas. That’s a huge likelihood ratio difference for 1-pitch PAs, and using our PA1 OBP of about .324, we’d expect to see a PA1 OBP of .370 given a 1-pitch PA followup, which is exactly what we get, and the longer PAs are more weighted to previous outs because of the odds ratio favoring outs after we get to 4 pitches.

Next PA pitches	This PA1 OBP	This PA1 wOBA
1	0.370	0.373
2	0.333	0.332
3	0.326	0.325
4	0.319	0.318
5	0.313	0.313
6	0.311	0.313
7	0.314	0.310
8	0.313	0.309
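For anybody who wants to check the .370, here is a minimal sketch of the odds-ratio update, using the "fraction of PA" columns from two tables up as the likelihoods and the .324 PA1 OBP as the prior. The function and variable names are mine. The output is the expected PA1 OBP from this one-step update, so it should land close to, not exactly on, the observed column above.

```python
# Fraction of next PAs lasting k pitches, after PA1 reached base / after an out.
FRAC_AFTER_REACH = [0.109, 0.164, 0.183, 0.186, 0.165, 0.112, 0.049, 0.032]
FRAC_AFTER_OUT = [0.089, 0.158, 0.182, 0.191, 0.174, 0.120, 0.052, 0.034]
PRIOR_OBP = 0.324  # overall PA1 OBP quoted above

def pa1_obp_given_next_length(pitches, prior=PRIOR_OBP):
    """P(PA1 reached base | next PA lasted `pitches` pitches; 8 means 8+)."""
    likelihood_ratio = FRAC_AFTER_REACH[pitches - 1] / FRAC_AFTER_OUT[pitches - 1]
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

for k in range(1, 9):
    print(k, round(pa1_obp_given_next_length(k), 3))
```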

It seems like this should be a big cause of the observed effect. I used the 2nd/6th and 3rd/7th columns from two tables up to create a process that would “play through” the next 8 PAs starting after an out or a successful PA, deciding on the number of pitches and then whether it was an out or not based on the average values. Then I calculated the expected OBP for PA1 based on the likelihood ratios of each number of total pitches to happen (the same way I got .370 from the odds ratio for a 1-pitch followup PA).
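Here is a sketch of how a "play through" process like that can be written. It is not my original code. It computes the exact distribution of total pitches over the next 8 PAs instead of simulating, it counts an 8+ pitch PA as exactly 8 pitches, and it lets each PA depend only on whether the previous one reached base. All of those are simplifying assumptions, and the names are made up for the example.

```python
# Columns from the pitch-length table above, indexed by pitches-1 (last = 8+).
FRAC = {True: [0.109, 0.164, 0.183, 0.186, 0.165, 0.112, 0.049, 0.032],
        False: [0.089, 0.158, 0.182, 0.191, 0.174, 0.120, 0.052, 0.034]}
OBP = {True: [0.362, 0.348, 0.284, 0.299, 0.339, 0.367, 0.393, 0.401],
       False: [0.359, 0.343, 0.278, 0.281, 0.323, 0.360, 0.386, 0.405]}
PRIOR_OBP = 0.324
N_PA = 8  # intervening batters between a hitter's PA1 and PA2

def total_pitch_dist(pa1_reached, n_pa=N_PA):
    """P(total intervening pitches = t) given whether PA1 reached base."""
    state = {(0, pa1_reached): 1.0}  # (pitches so far, previous PA reached?)
    for _ in range(n_pa):
        nxt = {}
        for (total, prev), p in state.items():
            norm = sum(FRAC[prev])  # guard against rounding in the table
            for i, f in enumerate(FRAC[prev]):
                p_len = p * f / norm
                for reached, p_res in ((True, OBP[prev][i]),
                                       (False, 1 - OBP[prev][i])):
                    key = (total + i + 1, reached)
                    nxt[key] = nxt.get(key, 0.0) + p_len * p_res
        state = nxt
    dist = {}
    for (total, _), p in state.items():
        dist[total] = dist.get(total, 0.0) + p
    return dist

DIST_REACH = total_pitch_dist(True)
DIST_OUT = total_pitch_dist(False)

def model_pa1_obp(total):
    """Bayes update: P(PA1 reached base | total intervening pitches)."""
    num = PRIOR_OBP * DIST_REACH.get(total, 0.0)
    den = num + (1 - PRIOR_OBP) * DIST_OUT.get(total, 0.0)
    return num / den

for t in (20, 25, 30, 35, 40):
    print(t, round(model_pa1_obp(t), 3))
```

Because the 8+ bucket is capped at 8 pitches, this sketch will not match the numbers I actually got to the third decimal, but the direction should be the same: low totals imply a higher PA1 OBP and high totals a lower one.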

As it turns out, that effect alone can reproduce the shape and a little over half the spread

intervening pitches	PA1 OBP (adj)	model PA1 OBP
<=20	0.366	0.340
21	0.351	0.336
22	0.349	0.329
23	0.339	0.338
24	0.343	0.332
25	0.336	0.328
26	0.335	0.327
27	0.335	0.328
28	0.328	0.328
29	0.325	0.326
30	0.320	0.326
31	0.324	0.323
32	0.324	0.323
33	0.318	0.321
34	0.318	0.324
35	0.317	0.323
36	0.312	0.317
37	0.313	0.318
38	0.307	0.320
39	0.320	0.310
40	0.300	0.317
41-42	0.308	0.309
43+	0.320	0.317

and that simple model is deficient at a number of things (correlations longer than 1 PA, different batters, base-out states, etc). I don’t know everything that’s causing the effect, but I have a good chunk of it, and that reverse pitch count selection bias isn’t something I’ve ever seen mentioned before. This is also a caution for anybody doing any kind of analysis involving pitch counts: be very careful to avoid walking into this effect.

2/05/19 DRC+ update- some partial fixes, some new problems

BP released an update to DRC+ yesterday purporting to fix/improve several issues that have been raised on this blog. One thing didn’t change at all though- DRC+ still isn’t a hitting metric. It still assigns pitchers artificially low values no matter how well they hit, and the areas of superior projection (where actually true) are largely driven by this. The update claimed two real areas of improvement.

Valuation

The first is in treating outlier players. As discussed in “C’mon Man- Baseball Prospectus DRC+ Edition”, by treating player seasons individually and regressing them, instead of treating careers, DRC+ will continually fail to realize that outliers are really outliers. Their fix is, roughly, to make a prior distribution based on all player performances in surrounding years, and hopefully not regress the outliers as much because it realizes something like them might actually exist. That mitigates the problem a little, sometimes, but it’s still an essentially random fix. Some cases previously mentioned look better, and others, like Don Kessinger vs. Larry Bowa, still don’t make any sense at all. They’re very similar offensive players, in the same league, overlapping in most of their careers, and yet Kessinger gets wRC-DRC bumped from 72 to 80 while Bowa only goes from 70 to 72, even though Kessinger was *more* TTO-based.

To their credit- or at least to credit their self-awareness, they seem to know that their metric is not reliable at its core for valuation. Jonathan Judge says

“As always, you should remember that, over the course of a career, a player’s raw stats—even for something like batting average—tend to be much more informative than they are for individual seasons. If a hitter consistently seems to exceed what DRC+ expects for them, at some point, you should feel free to prefer, or at least further account for, the different raw results.”

Roughly translated, “Regressed 1-year performance is a better estimation of talent than 1-year raw performance, but ignoring the rest of a player’s career and re-estimating talent 1 year at a time can cause discrepancies, and if it does, trust the career numbers more.” I have no argument with that. The question remains how BP will actually use the stat- if we get more fluff pieces on DRC+ outliers who are obviously just the kind of career discrepancies Judge and I talked about, that’s bad. If it is mainly used to de-luck balls in play for players who haven’t demonstrated that they deserve much outlier consideration, that’s basically fine and definitely not the dumbest thing I’ve seen lately.

This, on the other hand, well might be.

NAME	YEAR	PA	BB	DRC+	DRC+ SD	DRAA
Mark Melancon	2011	1	1	-3	2	-0.1
Dan Runzler	2011	1	1	-17	2	-0.1
Matt Guerrier	2011	1	1	-13	2	-0.1
Santiago Casilla	2011	1	1	-12	2	-0.1
Josh Stinson	2011	1	1	-15	2	-0.1
Jose Veras	2011	1	1	-14	2	-0.1
Javy Guerra	2011	1	1	-15	2	-0.1
Joey Gathright	2011	1	1	81	1	0

Not just the blatant cheating (Gathright is the only position player on the list), but the DRC+ SDs make no sense. Based on one identical PA, DRC+ claims that there’s a 1 in hundreds of thousands chance that Runzler is a better hitter than Melancon and also assigns negative runs to a walk because a pitcher drew it. The DRC+ SDs were pure nonsense before, but now they’re a new kind of nonsense. These players ranged from 9-31 SD in the previous iteration of DRC+, and while the low end of that was still certainly too low, SDs of 1-2 are beyond absurd, and the fact that they’re that low *only for players with almost no PAs* is a huge red flag that something inside the black box is terribly wrong. Tango recently explored the SD of wRC+/WAR and found that the SDs should be similar for most players with the same number of PA. DRC+ SDs done correctly could legitimately show up as slightly lower, because they’re the SD of a regressed stat, but that’s with an emphasis on slightly. Not SDs of 1 or 2 for anybody, and not lower SDs for pitchers and part-time players who aren’t close to a season full of PAs.

Park Adjustments

I’d observed before that DRC+ still contains a lot of park factor and they’ve taken steps to address this. They adjusted Colorado hitters more in this iteration while saying there wasn’t anything wrong with their previous park factors. I’m not sure exactly how that makes sense, unless they just weren’t correcting for park factor before, but they claim to be park-isolated now and show a regression against their park factors to prove it. Of course the key word in that claim is THEIR park factors. I reran the numbers from the linked post with the new DRC+s, and while they have made an improvement, they’re still correlated to both Fangraphs park factor and my surrounding-years park factor estimate at the r=0.17-0.18 level, with all that entails (still overrating Rockies hitters, for one, just not by as much).

DRC+ and Team Wins

A reader saw a television piece on DRC+, googled and found this site, and asked me a simple question: how does a DRC+ value correlate to a win? I answered that privately, but it occurred to me that team W-L record was a simple way to test DRC+’s claim of superior descriptiveness without having to rely on its false claim of being park-adjusted.

I used seasons from 2010-2018, with all stats below adjusted for year and league- i.e. the 2018 Braves are compared to the 2018 NL average. Calculations were done with runs/game and win% since not all seasons were 162 games.

Team metric	r^2 to team winning %
Run Differential	0.88
wRC+	0.47
Runs Scored	0.43
OBP	0.38
wOBA	0.37
OPS	0.36
DRC+	0.35

Run differential is cheating of course, since it’s the only one on the list that knows about runs allowed, but it does show that at the seasonal level, scoring runs and not allowing them is the overwhelming driver of W-L record and that properly matching RS to RA- i.e. not losing 5 1-run games and winning a 5-run game to “balance out”- is a distant second.

Good offense is based on three major things- being good, sequencing well, and playing in a friendly park. Only the first two help you to outscore your opponent who’s playing the game in the same park, and Runs Scored can’t tell the difference between a good offense and a friendly park. As it turns out, properly removing park factor noise (wRC+) is more important than capturing sequencing (Runs Scored).

Both clearly beat wOBA, as expected, because wRC+ is basically wOBA without park factor noise, and Runs Scored is basically wOBA with sequencing added. OBP beating wOBA is kind of an accident- wOBA *differential* would beat OBP *differential*- but because park factor is more prevalent in SLG than OBP, offensive wOBA is more polluted by park noise and comes out slightly worse.

And then there’s DRC+. Not only does it not know sequencing, it doesn’t even know what component events (BB, 1B, HR, etc) actually happened, and the 25% or so of park factor that it does neutralize is not enough to make up for that. It’s not a good showing for the fancy new most descriptive metric ever when it’s literally more valuable to know a team’s OBP than its DRC+ to predict its W-L record, especially when wRC+ crushes the competition at the same task.

Mashers underperform xwOBA on air balls

Using the same grouping methodology as “The Statcast GB speed adjustment seems to capture about 40% of the speed effect”, except using barrel% (barrels/batted balls), I got the following for air balls (FB, LD, Popup):

barrel group	FB BA-xBA	FB wOBA-xwOBA	n
high-barrel%	0.006	-0.005	22993
avg	0.006	0.010	22775
low-barrel%	-0.002	0.005	18422

These numbers get closer to the noise range (+/- 0.003), but mashers simultaneously OUTPERFORMING on BA while UNDERPERFORMING on wOBA while weak hitters do the opposite is a tough parlay to hit by chance alone because any positive BA event is a positive wOBA event as well. The obvious explanation to me, which Tango is going with too, is that mashers just get played deeper in the OF, and that that alignment difference is the major driver of what we’ve each measured.

The Statcast GB speed adjustment seems to capture about 40% of the speed effect

Statcast recently rolled out an adjustment to its ground ball xwOBA model to account for batter speed, and I set out to test how well that adjustment was doing. I used 2018 data for players with at least 100 batted balls (n=390). To get a proxy for sprint speed, I used the average difference between the speed-unadjusted xwOBA and the speed-adjusted xwOBA for ground balls. Billy Hamilton graded out fast. Welington Castillo didn’t. That’s good. Grouping the players into thirds by their speed-proxy, I got the following

speed	Actual GB wOBA	basic xwOBA	speed-adjusted xwOBA	Actual-basic	Actual- (speed-adjusted)	n
slow	0.215	0.226	0.215	-0.011	0.000	14642
avg	0.233	0.217	0.219	0.016	0.014	16481
fast	0.247	0.208	0.218	0.039	0.029	18930

The slower players seem to hit the ball better on the ground according to basic xwOBA, but they still have worse actual outcomes. We can see that the fast players outperform the slow ones by 50 points in unadjusted wOBA-xwOBA and only 29 points after the speed adjustment.
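The "about 40%" in the title is just the share of that fast-vs-slow gap that goes away once the speed adjustment is applied. A quick sketch of the arithmetic with the numbers from the table (variable names are mine):

```python
# (actual - basic xwOBA, actual - speed-adjusted xwOBA) from the table above.
GAPS = {"slow": (-0.011, 0.000), "fast": (0.039, 0.029)}

# Fast-minus-slow spread before and after the speed adjustment.
unadjusted_spread = GAPS["fast"][0] - GAPS["slow"][0]
adjusted_spread = GAPS["fast"][1] - GAPS["slow"][1]

# Share of the speed effect that the adjustment removes.
captured = 1 - adjusted_spread / unadjusted_spread
print(f"unadjusted {unadjusted_spread:.3f}, adjusted {adjusted_spread:.3f}, "
      f"captured {captured:.0%}")
```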

DRC+ isn’t even a hitting metric

At least not as the term is used in baseball. Hitting metrics can adjust for nothing (box score stats, AVG, OBP, etc), league and park (OPS+, wRC+, etc), or more detailed conditions (opposing pitcher and defense, umpire, color of the uniforms, proximity of Snoop Dogg, whatever). They don’t adjust for the position played. Hitting is hitting, regardless of who does it. Unless it’s not. While fooling around some more with the data for “DRC+ really isn’t any good at predicting next year’s wOBA for team switchers” and “The DRC+ team-switcher claim is utter statistical malpractice”, it looked for all the world like DRC+ had to be cheating, and it is.

To prove that, I looked at seasons with exactly 1 PA and 1 unintentional walk for the entire season, and the DRC+ for those seasons.

NAME	YEAR	TEAM	DRC+	DRC+ SD
Audry Perez	2014	Cardinals	104	20
Spencer Kieboom	2016	Nationals	96	29
John Hester	2013	Angels	93	16
Joey Gathright	2011	Red Sox	89	24
J.c. Boscan	2010	Braves	78	25
Mark Melancon	2011	Astros	15	14
George Sherrill	2010	Dodgers	4	23
Antonio Bastardo	2014	Phillies	3	22
Dan Runzler	2011	Giants	2	19
Jose Veras	2011	Pirates	1	15
Matt Reynolds	2010	Rockies	1	12
Tony Cingrani	2016	Reds	0	25
Antonio Bastardo	2017	Pirates	-1	17
Javy Guerra	2011	Dodgers	-2	31
Josh Stinson	2011	Mets	-10	11
Aaron Thompson	2011	Pirates	-12	14
Brandon League	2013	Dodgers	-13	17
J.j. Hoover	2014	Reds	-14	32
Santiago Casilla	2011	Giants	-15	12
Jason Garcia	2015	Orioles	-16	12
Chris Capuano	2016	Brewers	-17	17
Edubray Ramos	2016	Phillies	-19	15
Matt Guerrier	2011	Dodgers	-22	9
Liam Hendriks	2015	Blue Jays	-24	15
Phillippe Aumont	2015	Phillies	-28	20
Randy Choate	2015	Cardinals	-28	52
Joe Blanton	2017	Nationals	-30	12
Jacob Barnes	2017	Brewers	-31	26
Sean Burnett	2012	Nationals	-33	20
Robert Carson	2013	Mets	-43	7

That’s a pretty good spread. The top 5 are position players, the rest are pitchers. DRC+ is blatantly cheating by assigning pitchers very low DRC+ values even when their offensive performance is good and not doing the same for 1-PA position players. wOBA and wRC+ don’t do this, as evidenced by Kieboom (#5) right there with 3 pitchers with the same seasonal stat line. It’s also not using data from prior seasons because that was Kieboom’s only career PA to date, and when Livan Hernandez debuted in 1996 for one game with 1 PA and 1 single, he got a DRC+ of -14 for his efforts. It’s just cheating, period. And it doesn’t learn either. Even when Bumgarner was hitting in 2014-2017, his DRC+s were -15, 4, -17, and -19.

I also included the DRC+ SDs here just to show that they’re complete nonsense. Pitcher Mark Melancon (15 +/- 14) has one career PA. Pitcher Robert Carson (-43 +/- 7) also has one career PA. Pitcher Randy Choate (-28 +/- 52) had one PA that year and 5 a decade earlier. What in the actual fuck?

The entire DRC+ project is a complete farce at this point. The outputs are a joke.*** The SD values are nonsense (table above). The pillars it stands on are complete bullshit. It’s more descriptive of the current season than park adjusted stats because it’s not anywhere near a park-adjusted stat, even though it claims to be. It’s more predictive than park-adjusted stats for next year’s team because it’s somewhat regressed, meaning it basically can’t lose, and it’s also cheating the same way descriptiveness does by keeping a bunch of park factor. Its claimed “substantial improvement over predicting wOBA for team switchers” is statistical malpractice to begin with, and now we see that the one area where it did predict significantly better than regressed wOBA, very-low-PA players, is driven by (almost) ignoring actual results for pitchers and saying they sucked at the plate no matter how well they really hit (and treating low-PA position players with the exact same stat lines as average-ish).

***Check out DRA- land where Billy Wagner is 26 percent more valuable on a per-inning basis than Mariano Rivera and almost as valuable for his career. I love Billy Wagner, but still, come on.

RIP 12/29/2018. Comment F to pay respects.

DRC+ still contains a lot of park factor

Required knowledge: DRC+ and park factors

TL;DR read the title above, the rant 3 paragraphs down, and the very bottom

DRC+ is supposed to be a fully park-adjusted metric, but from the initial article, I couldn’t understand how that could be consistent with the reported results without either an exceptional amount of overfitting or extremely good luck. Team DRC+ was reported to be more reliable than team wRC+ at describing the SAME SEASON’s team runs/PA. Since wRC+ is based off of wOBA, team wOBA basically is team scoring offense (r=0.94), and DRC+ regresses certain components of wOBA back towards the mean quite significantly (which is why DRC+ is structurally unfit for use in WAR), it made no sense to me that a metric that took away actual hits that created actual runs from teams with good BABIPs and invented hits in place of actual outs for teams with bad BABIPs could possibly correlate better to actual runs scored than a metric that used what happened on the field. It’s not quite logically impossible for that to be true, but it’s pretty damn close.

It turns out the simple explanation for how a park-adjusted significantly regressed metric beat a park-adjusted unregressed metric is the correct one. It didn’t. DRC+ keeps in a bunch of park factor and calls itself a park-adjusted metric when it’s simply not one, and not even close to one. The park factor table near the bottom of the DRC+ article should have given anybody who knows anything about baseball serious pause, and of course it fits right in with DRC+’s “great descriptiveness”.

RANT

How in the hell does a park factor of 104 for Coors get published without explanation by any person or institution trying to be serious? The observed park factors (halved) the last few years, in reverse order: 114 (2018), 115, 116, 117, 120, 109, 123… You can’t throw out a number like Coors 104 like it’s nothing. If Jonathan Judge could actually justify it somehow- maybe last year we got a fantastic confluence of garbage pitchers and great situational hitting at Coors and the reverse on the road while still somehow only putting up a 114, where you could at least handwave an attempt at a justification, then he should have made that case when he was asked about it, but instead he gave an answer indicative of never having taken a serious look at it. Spitting out a 104 for Coors should have been like a tornado siren going off in his ear to do basic quality control checks on park effects for the entire model, but it evidently wasn’t, so here I am doing it instead.

/RANT

The basic questions are “how correlated is team DRC+ to home park factor?” and “how correlated should team DRC+ be to home park factor?”. The naive answer to the second question is “not correlated at all since it’s park adjusted, duh”, but it’s possible that the talented hitters skew towards hitters’ parks, which would cause a legitimate positive correlation, or that they skew towards pitchers’ parks, which would cause a legitimate negative correlation. As it turns out, over the 2003-2017 timeframe, hitting talent doesn’t skew at all, but that’s an assertion that has to be demonstrated instead of just assumed true, so let’s get to it.

We need a way to make (offensive talent, home park factor) team-season pairs that can measure both components separately without being causally correlated to each other. Seasonal team road wOBA is a basically unbiased way to measure offensive quality independent of home park factor because the opposing parks played in have to average out pretty similarly for every team in the same league (AL/NL)**. If we use that, then we need a way to make a park factor for those seasons that can’t include that year’s data, because everything else being equal, an increase in a team’s road wOBA would decrease its home park factor****, and we’re explicitly trying to avoid nonsense like that. Using the observed park factors from *surrounding years*, not the current year, to estimate the current year’s park factor solves that problem, assuming those estimates don’t suck.

** there’s a tiny bias from not playing road games in a stadium with your park factor, but correcting that by adding a hypothetical 5 road games at estimated home park factor doesn’t change conclusions.

**** some increase will be skill that will, on average, increase home wOBA as well and mostly cancel out, and some increase will be luck that won’t cancel out and would screw the analysis up

Methodology

I used all eligible team-batting-seasons, pitchers included, from 2003-2017. To estimate park factors, I used the surrounding 2 years (T-2, T-1, T+1, T+2) of observed park factors (for runs) if they were available, the surrounding 1 year (T-1, T+1) otherwise, and threw out the season if I didn’t have those. That means I threw out all 2018s as well as the first and last years in each park. I ignored other changes (moved fences, etc).
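A sketch of that surrounding-years rule, for concreteness. Combining the years with a plain average is an assumption made for this example, and the function name is made up. The sample values are the halved Coors factors quoted in the rant above, used here only as an illustration.

```python
def estimate_park_factor(observed, year):
    """Estimate `year`'s park factor from surrounding seasons only.

    `observed` maps season -> observed park factor for ONE park.
    Uses T-2, T-1, T+1, T+2 if all four exist, else T-1 and T+1,
    else returns None (the season gets thrown out).
    Assumption: the surrounding years are combined with a plain average.
    """
    windows = ([year - 2, year - 1, year + 1, year + 2],
               [year - 1, year + 1])
    for window in windows:
        if all(y in observed for y in window):
            return sum(observed[y] for y in window) / len(window)
    return None

# Halved observed Coors factors from the rant above (2018 back to 2012).
COORS = {2012: 123, 2013: 109, 2014: 120, 2015: 117,
         2016: 116, 2017: 115, 2018: 114}

for season in sorted(COORS):
    print(season, estimate_park_factor(COORS, season))
```

Seasons at either end of a park's run come back as None, which is the "threw out the first and last years in each park" rule falling out of the data rather than being special-cased.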

Because I have no idea what DRC+ is doing with pitcher-batters, how good its AL-NL benchmarking is, and the assumption of nearly equivalent aggregate road parks is only guaranteed to hold between same-league teams, I did the DRC+ analysis separately for AL and NL teams.

To control for changing leaguewide wOBA in the 2003-2017 time period, I used the same wOBA/LgAvGwOBA wOBA% method I used in “DRC+ really isn’t any good at predicting next year’s wOBA for team switchers” for both wOBA and DRC+, just for AL teams and NL teams separately for the reasons above. After this step, I did analyses with and without Coors because it’s an extreme outlier. We already know with near certainty that their treatment of Coors is ~~kind of questionable~~ batshit crazy and keeps way too much park effect in DRC+, so I wanted to see how they did everywhere else.

Results

The park factor estimation worked pretty well. 2 surrounding year PF correlated to the observed PF for the year in question at r=0.54 (0.65 with Coors) and the 1 surrounding year at r=0.52 (0.61 with Coors). The 5-year FanGraphs PF, WHICH USES THE YEAR IN QUESTION, only correlates at r=0.7 (0.77 with Coors) and the 1 and 2 year park factors correlate to the Fangraphs PF at 0.87 and 0.96 respectively. This is plenty to work with given the effect sizes later.

Team road wOBA% (squared or linear) correlates to the estimated home park factor at r = -0.03, literally nothing, and with the 5 extra hypothetical games as mentioned in the footnote above, r=0.02, also literally nothing. It didn’t have to be this way, but it’s convenient that it is. Just to show that road wOBA isn’t all noise, it correlates to that season’s home wOBA% at r=0.32 (0.35 with the adjustment) even though we’re dealing with half seasons and home wOBA% contains the entire park factor. Road wOBA% correlates to home wOBA%/sqrt(estimated park factor) at r=0.56 (and wOBA%/park factor at r=0.54). That’s estimated park factor from surrounding years, not using the home and road wOBA data in question.

Home wOBA% is obviously hugely correlated to estimated park factor (r=0.46 for home wOBA%^2 vs estimated PF), but park adjusting it by correlating

(home wOBA%)^2/estimated park factor TO estimated park factor

has r= -0.00017. Completely uncorrelated to estimated PF (it’s pure luck that it’s THAT low).

So we’ve established that road wOBA really does contain a lot of information on a team’s offensive talent (that’s a legitimate naive “duh”), that it’s virtually uncorrelated to true home park factor, and that park-adjusted home wOBA% (using PF estimates from other seasons only) is also uncorrelated to true home park factor. If DRC+ is a correctly park-adjusted metric that measures offensive talent, DRC+% should also have to be virtually uncorrelated to true home park factor.

And… the correlation of DRC+% to estimated park factor is r= 0.38 for AL teams, r=0.29 for NL teams excluding Colorado, r=0.31 including Colorado. Well then. That certainly explains how it can be more descriptive than an actually park-adjusted metric.

The DRC+ team-switcher claim is utter statistical malpractice

Required knowledge: MUST HAVE READ/SKIMMED “DRC+ really isn’t any good at predicting next year’s wOBA for team switchers”. A non-technical knowledge of what a correlation coefficient means wouldn’t hurt either.

In doing the research for the other post, it was baffling to me what BP could have been doing to come up with the claim that DRC+ was a revolutionary advance for team-switchers. It became completely obvious that there was nothing particularly meaningful there with respect to switchers and that it would take a totally absurd way of looking at the data to come to a different conclusion. With that in mind, I clicked some buttons and stumbled into figuring out what they had to be doing wrong. One would assume that any sophisticated practitioner doing a correlation where some season pairs had 600+ PA each and other season pairs had 5 PA each would weight them differently… and one would be wrong.

I decided to check 4 simple ways of weighting the correlation- unweighted, by year T PA, by year T+1 PA, and by the harmonic mean of year T PA and year T+1 PA.
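For anyone who wants to reproduce the idea, here is a minimal sketch of the four weightings using a plain weighted Pearson correlation. The row layout and function names are hypothetical, and this is one reasonable way to weight a correlation, not necessarily the exact computation behind the tables.

```python
import math

def harmonic_mean(a, b):
    """Harmonic mean of two PA totals; 0 if either is 0."""
    return 2.0 * a * b / (a + b) if a > 0 and b > 0 else 0.0

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def four_correlations(rows):
    """rows: hypothetical (year_T_metric, year_T1_woba_pct, pa_T, pa_T1) tuples."""
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]
    pa_t = [r[2] for r in rows]
    pa_t1 = [r[3] for r in rows]
    hm = [harmonic_mean(a, b) for a, b in zip(pa_t, pa_t1)]
    return {
        "harmonic": weighted_pearson(x, y, hm),
        "year_t_pa": weighted_pearson(x, y, pa_t),
        "year_t1_pa": weighted_pearson(x, y, pa_t1),
        "unweighted": weighted_pearson(x, y, [1.0] * len(rows)),
    }

# Made-up data: three full seasons plus one 5-PA pair with wild numbers.
demo_rows = [
    (100.0, 100.0, 600, 600),
    (110.0, 108.0, 600, 600),
    (90.0, 95.0, 600, 600),
    (0.0, 200.0, 5, 5),
]
demo = four_correlations(demo_rows)
```

With the made-up rows, the single 5-PA pair counts as much as a full season in the unweighted version and almost nothing in the harmonic version, so the two numbers come out very different.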

Table 1. Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 400 PAs year T.

400+ PA		Harmonic	Year T PA	Year T+1 PA	unweighted	N
switch	wOBA	0.34	0.35	0.34	0.34	473
switch	DRC+	0.35	0.35	0.34	0.35	473
same	wOBA	0.55	0.53	0.55	0.51	1124
same	DRC+	0.57	0.55	0.57	0.54	1124

The way to read this chart is to compare the wOBA and DRC+ correlations for each group of hitters- switch to switch (lines 1 and 2) and same to same (lines 3 and 4). It’s obvious that wOBA should correlate much better for same than switch because it contains the entire park effect, which is maintained in “same” and lost in “switch”, but DRC+ behaves the same way because DRC+ also contains a lot of park factor even though it shouldn’t.

In the 400+ year T PA group, the choice of weighting method is almost completely irrelevant. DRC+ correlates marginally better across the board and it has nothing to do with switch or stay. Let’s add group 2 to the mix and see what we get.

Table 2. Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 100 PAs year T.

100+ PA		Harmonic	Year T PA	Year T+1 PA	unweighted	N
switch	wOBA	0.31	0.29	0.29	0.26	1100
switch	DRC+	0.33	0.31	0.32	0.29	1100
same	wOBA	0.51	0.47	0.50	0.44	2071
same	DRC+	0.54	0.51	0.53	0.47	2071

The values change, but DRC+’s slight correlation lead doesn’t, and again, nothing is special about switchers except that they’re overall less reliable. Some of the gaps widen by a point or two, but there’s no real sign of the impending disaster when the low-PA stuff that favors DRC+ comes in. But what a disaster there is….

Table 3. Correlation coefficients to year T+1 wOBA% by different weighting methods, all season pairs.

1+ PA		Harmonic	Year T PA	Year T+1 PA	unweighted	N
switch	wOBA	0.45	0.41	0.38	0.37	1941
switch	DRC+	0.54	0.47	0.58	0.57	1941
same	wOBA	0.62	0.58	0.53	0.52	3639
same	DRC+	0.67	0.62	0.66	0.66	3639

The two weightings (Harmonic and Year T) that minimize the weight of low-data garbage projections stay saner, and the two methods that don’t (year T+1 and unweighted) go bonkers and diverge by around what BP reports. If I had to guess, I have more pitchers in my sample for a slightly bigger effect and regressed DRC+% correlates a bit better. And to repeat yet again, the effect has nothing to do with stay/switch. It’s entirely a mirage based on flooding the sample with bunches of low-data garbage projections based on handfuls of PAs and weighting them equally to pairs of qualified seasons.

You might be thinking that that sounds crazy and wondering why I’m confident that’s what really happened. Well, as it turns out- and I didn’t realize this until after the analysis- they actually freaking told us that’s what they did. The caption for the chart is “Table 3: Reliability of Team-Switchers, Year 1 to Year 2 wOBA (2010-2018); Normal Pearson Correlations”. Normal Pearson correlations are unweighted. Mystery confirmed solved.

DRC+ really isn’t any good at predicting next year’s wOBA for team switchers

UPDATE:

The decent performance I get for DRC+ projecting low-PA players is from assigning pitchers terrible DRC+s no matter how well they hit. The rest of the post is fine, but this is all even more bullshit than I realized at the time of writing.

/UPDATE

Required knowledge: wOBA and DRC+

Part 2: The DRC+ team-switcher claim is utter statistical malpractice

TL;DR: raw DRC+ is a little better overall than projecting everybody to be league average, but actually worse than that for team-switchers. Best-regressed DRC+ has about a 2.5-point MAE improvement over “everybody hits league average” for switchers and a 1 point MAE improvement over best-regressed wOBA. Regressed DRC+ has a huge advantage projecting very-low-PA seasons and starts losing to regressed wOBA around the 300 PA mark.

Having already demonstrated and explained why DRC+ is structurally unfit for use in WAR/BWARP, the purpose of this next experiment was to test the claims here and here that DRC+ was something special when it came to projecting next year’s wOBA for team-switchers. It fails that test convincingly, but a little regression work gives a decent projection for players that *don’t* switch. Unfortunately for DRC+, that prediction is only marginally better overall than the same methodology using wOBA, and only has a real advantage at the low-PA end. Regressed wOBA rules the high-PA end.

Methodology Overview

To test their claim, and to account for leaguewide wOBA changing every year, I normalized every batter-season’s wOBA onto a 100 scale by taking (batter wOBA)/(league average wOBA for that season) * 100. I’ll call that wOBA% from now on. Normalization to wOBA% makes sense because many of the factors that influence leaguewide wOBA changes in the upcoming year, from the changing strike zone to the changing baseball itself are not something DRC+ tries to predict- or ever should try to predict. Using wOBA% instead of raw wOBA removes a good deal of nonsense noise at no cost.

Team switchers were not explicitly defined, but since the test sample must be composed of players who had PAs in consecutive seasons, I’m defining a team switcher as anybody who didn’t appear entirely for the same team in both years (e.g. half a season for team A followed by 1.5 seasons for team B is a team switcher).
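As a small illustration of that definition, a player counts as a switcher unless every PA across both seasons came for one single team. The function name and the team codes below are hypothetical.

```python
def is_team_switcher(teams_year_t, teams_year_t1):
    """True unless the player appeared for exactly one team across both years."""
    all_teams = set(teams_year_t) | set(teams_year_t1)
    return len(all_teams) != 1

# Half a season for team A, then 1.5 seasons for team B: a switcher.
traded_midseason = is_team_switcher({"A", "B"}, {"B"})
# Same single team in both years: not a switcher.
stayed_put = is_team_switcher({"A"}, {"A"})
```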

I also normalized DRC+ to DRC+% similarly since it was coming out a bit under 100, but since DRC+ is on the run scale, I used 100*Sqrt(Max(DRC+,0)/100) to put it on the same scale as wOBA. Seasons from 2010-2018 were used, although 2010 was only used to project 2011. Every pair of consecutive seasons where a player had at least 1 PA was eligible.
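The two normalizations can be sketched as below. The wOBA% and square-root formulas are the ones given in the text; the final re-centering step is an assumption about what "normalized similarly" means (a PA-weighted mean forced to 100), and all function names are made up for illustration.

```python
import math

def woba_pct(batter_woba, league_woba):
    """wOBA on a 100 scale, where 100 is league average for that season."""
    return batter_woba / league_woba * 100.0

def drc_plus_to_woba_scale(drc_plus):
    """100*sqrt(max(DRC+, 0)/100): run scale to wOBA scale, negatives floored at 0."""
    return 100.0 * math.sqrt(max(drc_plus, 0.0) / 100.0)

def recenter_to_100(values, pa):
    """ASSUMPTION: rescale so the PA-weighted mean of the values is exactly 100."""
    mean = sum(v * w for v, w in zip(values, pa)) / sum(pa)
    return [v / mean * 100.0 for v in values]

# Hypothetical hitters: a .352 wOBA in a .320 league, and a 121 DRC+.
example_woba_pct = woba_pct(0.352, 0.320)           # about 110
example_drc_pct = drc_plus_to_woba_scale(121.0)     # about 110
floored = drc_plus_to_woba_scale(-2.0)              # negative DRC+ becomes 0
```

The max(…, 0) floor matters because DRC+ can go negative for the worst low-PA seasons and a square root of a negative number isn't usable.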

MAEs were calculated weighted by the harmonic mean of the PAs in year T and year T+1. Best regressions were determined the same way (using all pairs of seasons, not just switchers). Since MAEs are calculated in wOBA%, I multiplied by 3.20 (i.e. a wOBA of .320) to put them back on the wOBA-points scale to report.
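A minimal sketch of that error metric, with hypothetical names and made-up inputs: each season pair's absolute error in wOBA% is weighted by the harmonic mean of its two PA totals, then multiplied by 3.20 to report in points of wOBA.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two PA totals; 0 if either is 0."""
    return 2.0 * a * b / (a + b) if a > 0 and b > 0 else 0.0

def weighted_mae_points(projected_pct, actual_pct, pa_t, pa_t1, scale=3.20):
    """Harmonic-mean-weighted MAE in wOBA%, converted to points of wOBA."""
    num = 0.0
    den = 0.0
    for proj, act, a, b in zip(projected_pct, actual_pct, pa_t, pa_t1):
        w = harmonic_mean(a, b)
        num += w * abs(proj - act)
        den += w
    return num / den * scale

# Made-up pairs: a 600/600 PA pair missed by 5 wOBA%, a 100/100 pair missed by 10.
mae_points = weighted_mae_points([100.0, 110.0], [105.0, 100.0], [600, 100], [600, 100])
```

Here the weights are 600 and 100, so the weighted error is 4000/700 in wOBA% (about 5.7), or roughly 18.3 points of wOBA after scaling.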

Tests and Results

The first thing I tried was simply using year T DRC+% as the projection for year T+1 wOBA% and benchmarking that against an “everybody 100 wOBA%” (LgAvg) projection and restricting it to pairs of qualified seasons (500+ PA in each). DRC+ had an MAE of 25.1 points of wOBA against LgAvg’s 34.3 overall, but 26.5 vs 26.1 on team switchers.

In the attempt to see if the signal could be improved, I regressed year T DRC+% with league average (100 wOBA%) PAs and the minimum weighted MAE came with adding 89 average PAs. Doing the same optimization with year T wOBA% came up with adding 332 average PAs. For reporting purposes, I broke the players up into 3 groups:

1-99 PAs in year T, which is just enough to capture all pitchers (2016 Bumgarner, 97 PA) as well as a bunch of callups and fill-ins who aren’t really MLB quality.
400+ PAs in year T, which is all full-time players and primary sides of platoons, etc. That number is kind of arbitrary, but it’s a little over 50% of the average PA per position, assuming some PHing, and moving it around 25 PAs isn’t really going to affect the big picture analysis anyway.
100-399 PAs in year T to cover everybody else.
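The regression step and the search for the best number of added PAs can be sketched as follows. This assumes a simple integer grid search and a hypothetical row layout; the actual optimizer used for the 89 and 332 figures isn't specified, so treat this as one plausible way to do it.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two PA totals; 0 if either is 0."""
    return 2.0 * a * b / (a + b) if a > 0 and b > 0 else 0.0

def regress(metric_pct, pa, added_pa, league_avg=100.0):
    """Blend the observed metric with added_pa plate appearances of league average."""
    return (metric_pct * pa + league_avg * added_pa) / (pa + added_pa)

def weighted_mae(rows, added_pa, league_avg=100.0):
    """rows: hypothetical (year_T_metric_pct, pa_T, year_T1_woba_pct, pa_T1) tuples."""
    num = 0.0
    den = 0.0
    for metric_t, pa_t, woba_t1, pa_t1 in rows:
        w = harmonic_mean(pa_t, pa_t1)
        num += w * abs(regress(metric_t, pa_t, added_pa, league_avg) - woba_t1)
        den += w
    return num / den

def best_added_pa(rows, candidates=range(0, 1001), league_avg=100.0):
    """Grid search (an assumption) for the added PA count with the lowest weighted MAE."""
    return min(candidates, key=lambda k: weighted_mae(rows, k, league_avg))

def pa_group(pa_t):
    """Reporting bucket by year T PA."""
    if pa_t < 100:
        return "1-99"
    if pa_t < 400:
        return "100-399"
    return "400+"

# Made-up single pair: 120 in 100 PA, then 110 the next year, so 100 added PA fits exactly.
toy_best = best_added_pa([(120.0, 100, 110.0, 100)])
```

Passing league_avg=103 to the same functions gives the variant tried later with a different regression target.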

This is a sample report.

Table 1. wOBA MAEs, 400+ PA in both seasons. LgAvg=100

Min 400 PA both seasons	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	26.0	32.3	25.1	24.5	106.6	107.5	1152
switch	28.4	26.7	26.9	25.8	103.6	105.3	303
same	25.2	34.3	24.4	24.1	107.7	108.2	849

Any PA cutoff biases the sample, but using a PA cutoff in both seasons is especially bad form because it excludes players who would have reached the cutoff in year T+1 if they hadn’t been benched for sucking. Even with artificially tight performance constraints, the regressions are virtually useless for team switchers- only 1 point of MAE improvement for wOBA and nothing for DRC+. To avoid the extra bias problem, future results will include all (1+ PA) year T+1 seasons.

Table 2. wOBA MAEs, 400+ PA in year T, any PA year T+1. LgAvg=100

Min 400 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	27.6	32.6	26.6	26.3	105.0	106.6	1597
switch	30.3	28.8	29.0	28.2	101.0	103.9	473
same	26.4	34.2	25.6	25.5	106.6	107.7	1124

The bias in Table 1 is apparent now. This population is simply worse to begin with in year T (marginal hitters were more likely to suck and get benched in year T+1 and not show up in Table 1) and dropped off more to year T+1. Back to the post topic, neither DRC+ nor wOBA is any good for switchers, wOBA is a bit ahead of DRC+, and the projection for same-team players is a clear improvement on LgAvg.

Table 3. wOBA MAEs, 100-399 PA in year T, any PA year T+1. LgAvg=100

100-399 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	36.8	36.2	34.7	34.5	98.5	97.8	1574
switch	38.5	38.3	36.3	36.5	94.3	95.2	627
same	35.7	34.9	33.6	33.2	101.4	99.5	947

The league average errors are a good bit worse and now DRC+ and wOBA are pretty useless for everything, offering at best a 2-point improvement over LgAvg. Also, the quality of the players here is clearly worse because….. better players get more PAs and make it into Table 2 instead.

Table 4. wOBA MAEs, 1-99 PA in year T, any PA year T+1. LgAvg=100

1-99 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	110.2	102.2	62.9	92.5	84.3	69.8	2409
switch	91.4	96.7	64.3	87.7	73.3	69.8	841
same	120.3	105.2	62.2	95.1	90.2	69.7	1568

And this is interesting. Garbage hitters, giant MAEs, and regressed DRC+ winning by a mile for a change. The other interesting thing here is that the players teams keep improve *a ton* and the ones they let go keep being godawful at the plate. Somebody should look into that in more detail.

Seeing that the 100-399 PA group at least resembled MLB-quality hitters, albeit not the good ones, and that the 1-99 PA group was an abomination at the plate (it did include all the pitchers), I wondered what would happen if I cheated a little and tried to optimize on the 100+ PA group instead of everybody. That group looks like

Table 5. wOBA MAEs, 100+ PA in year T, any PA year T+1. LgAvg=100

Min 100 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	30.4	33.7	29.1	28.8	102.7	103.9	3171
switch	33.3	32.2	31.6	31.2	98.6	100.8	1100
same	28.9	34.5	27.7	27.5	104.9	105.6	2071

Again, useless for switchers, solid improvement for the ones who stayed. Based on this, I decided to reoptimize based on a LgAvg of 103 using only players with 100+ PA year T just to see what would happen.

Trying a different league average

This is starting to look down the rabbit hole of regressing to talent (more PA is a proxy for more talent as we’ve seen) instead of to pure league average, but let’s see what happens. The regression amounts came out to 243 added PA for DRC+ and 410 added PA for wOBA. Doing that came up with

Table 6. wOBA MAEs, 100+ PA in year T, any PA year T+1. LgAvg=103

Min 100 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	30.4	33.2	28.7	29.1	102.7	103.9	3171
switch	33.3	33.7	31.3	31.9	98.6	100.8	1100
same	28.9	32.9	27.2	27.6	104.9	105.6	2071

Well, that’s not an auspicious start, marginally helping the stayers (the switchers were closer to 100, so regressing towards 103 isn’t any help). Let’s see if there’s any benefit in either group individually.

Table 7. wOBA MAEs, 100-399 PA in year T, any PA year T+1. LgAvg=103

100-399 PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	36.8	38.7	34.5	35.6	98.5	97.8	1574
switch	38.5	41.9	36.7	38.0	94.3	95.2	627
same	35.7	36.6	33.0	33.9	101.4	99.5	947

Well, that was a disaster for these guys. The top line using 100 LgAvg was 36.8 / 36.2 / 34.7 / 34.5 before, and regressing them further from their talent shockingly didn’t do any favors. Was this made up for by the full-time players?

Table 8. wOBA MAEs, 400+ PA in year T, any PA year T+1. LgAvg=103

400+ PA year T	raw DRC+%	LgAvg	regd DRC+%	regd wOBA%	year T+1 wOBA%	year T wOBA%	N
all	27.6	30.8	26.1	26.3	105.0	106.6	1597
switch	30.3	29.1	28.3	28.5	101.0	103.9	473
same	26.4	31.5	25.2	25.4	106.6	107.7	1124

Not really. This is a super marginal improvement over the previous top line of 27.6 / 32.6 / 26.6 / 26.3. Only the LgAvg projection really benefits at all, and that’s not what we’re interested in. Changing league average a little and optimizing over only MLB-quality hitters doesn’t seem to really accomplish anything great for the DRC+ or wOBA regressions.

Conclusion

There’s little-to-nothing to BP’s claim that DRC+ is something special for team switchers. Raw DRC+% is worse than league average, and the MAE for best-regressed DRC+ is about 2 points better than league average overall, and that entire benefit is from the low-PA end. It projects team-switching full-time players worse than assuming they’re league average. However, for the really low-PA players, it is more accurate raw and much more accurate regressed than league average.

Likewise there’s also absolutely nothing to the claim that DRC+ is a significant improvement over wOBA for predicting year T+1 wOBA for switchers- the gap is actually *smaller* for switchers. 1.1 points of MAE for switchers and 1.5 points for stayers. The best regression of DRC+ absolutely does shine in the very-low-PA group, but it’s also not good in the full-time player category. Regressed DRC+ and regressed wOBA actually do make fairly decent, much better than league average predictions for full-time players who *stay*, for whatever an untested model fit to in-sample data is worth.

C’mon Man- Baseball Prospectus DRC+ Edition

Required knowledge: A couple of “advanced” baseball stats. If you know BABIP, wRC+, and WAR, you shouldn’t have any trouble here. If you know box score stats, you should be able to get the gist.

Baseball Prospectus recently introduced its Deserved Runs Created offensive metric that purports to isolate player contribution to PA outcomes instead of just tallying up the PA outcomes, and they’re using that number as an offensive input into their version of WAR. On top of that, they’re pushing out articles trying to retcon the 2012 Trout vs. Cabrera “debate” in favor of Cabrera and trying to give Graig Nettles 15 more wins out of thin air. They appear to be quite serious and all-in on this concept as a more accurate measure of value. It’s not.

The exact workings of the model are opaque, but there’s enough description of the basic concept and the gigantic biases are so obvious that I feel comfortable describing it in broad strokes. Instead of measuring actual PA outcomes (like OPS/wOBA/wRC+/etc) or being a competitive forecasting system (Steamer/ZiPS/PECOTA), it’s effectively just a shitty forecast based on one hitter-season of data at a time****.

It weights the more reliable (K/BB/HR) components more and the less reliable (BABIP) components less like projections do, but because it’s wearing blinders and can’t see more than one season at a time, it NEVER FUCKING LEARNS**** that some players really do have outlier BABIP skill and keeps over-regressing them year after year. This is methodologically fatal. It’s impossible to salvage a one-year-of-stats-regressed framework. It might work as a career thing, but then year X WAR would change based on year X+1 performance.

Addendum for clarity: If DRC+ regresses each season as though that’s all the information it knows, then adds those regressed seasons up to determine career value, that is *NOT* the same as correctly regressing the total career. If, for example, BABIP skill got regressed 50% each year, then DRC+ would effectively regress the final career value 50% as well (as the result of adding up 50%-regressed seasons), even though the proper regression after 8000 PAs is much, much less. This is why the entire DRC+ concept and the other similarly constructed regressed-season BP metrics are broken beyond all repair. /addendum
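To put numbers on the addendum, here is a sketch using the textbook PA/(PA+k) shrinkage form. That form and the value of k are assumptions chosen so that a single 500-PA season gets regressed 50%; nothing here is DRC+’s actual internals.

```python
def retained(pa, k):
    """Fraction of the observed deviation from average kept after regression."""
    return pa / (pa + k)

def career_retained_summing_seasons(season_pas, k):
    """PA-weighted retention when every season is regressed on its own, then summed."""
    total = sum(season_pas)
    return sum(pa * retained(pa, k) for pa in season_pas) / total

season_pas = [500] * 16   # hypothetical 8000-PA career
k = 500                   # hypothetical: makes one 500-PA season regress 50%

per_season = career_retained_summing_seasons(season_pas, k)   # 0.5
whole_career = retained(sum(season_pas), k)                   # 8000/8500, about 0.94
```

Under these assumptions, adding up 50%-regressed seasons keeps only half of the player’s observed BABIP edge, while regressing the whole 8000-PA career once would keep about 94% of it.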

****The description is vague enough that it might actually use multiple years and slowly learn over a player’s career, but it definitely doesn’t understand that a career of outlier skill means that the outlier skill (likely) existed the whole time it was presenting, so the general problem of over-regressing year after year would still apply, just more to the earlier years. Trout has 7 full years and he’s still being underrated by 18, 18, and 11 points the last 3 years compared to wRC+ and 17 points over his whole career.

DRC+ loves good hitters with terrible BABIPs and particularly ones with bad BABIPs and lots of HRs. Graig Nettles and his career .245 +/- .005 BABIP / 390 HRs looks great to DRC+ (120 vs 111 wRC+, +14.7 wins at the plate), as do Mark McGwire (164 vs 157, +8.5 wins), Harmon Killebrew (150 vs 142, +16.2 wins), Ernie Banks (129 vs 118, +20.8 wins), etc. Guys who beat the hell out of the ball and run average-ish BABIPs are rated similarly to wRC+, Barry Bonds (175 vs 173), Hank Aaron (150 vs 153), Willie Mays (150 vs 154), Albert Pujols (147 vs 146), etc.

The flip side of that is that DRC+ really, really hates low-ISO/high BABIP quality hitters. It underrates Tony Gwynn (119 vs 132, -12.9 wins) because it can’t figure out that the 8-time batting champ can hit. In addition, it hates Roberto Alomar (110 vs 118, -10.4 wins), Derek Jeter (105 vs 119, -17.9 wins), Rod Carew (112 vs 132, -18.7 wins), etc. This is simply absurd.

C’mon man.

Valuation

Park Adjustments

NAME	YEAR	TEAM	DRC+	DRC+ SD
Audry Perez	2014	Cardinals	104	20
Spencer Kieboom	2016	Nationals	96	29
John Hester	2013	Angels	93	16
Joey Gathright	2011	Red Sox	89	24
J.C. Boscan	2010	Braves	78	25
Mark Melancon	2011	Astros	15	14
George Sherrill	2010	Dodgers	4	23
Antonio Bastardo	2014	Phillies	3	22
Dan Runzler	2011	Giants	2	19
Jose Veras	2011	Pirates	1	15
Matt Reynolds	2010	Rockies	1	12
Tony Cingrani	2016	Reds	0	25
Antonio Bastardo	2017	Pirates	-1	17
Javy Guerra	2011	Dodgers	-2	31

Josh Stinson

2011

Mets