I had the bright idea to look at the familiarity vs. fatigue TTOP debate, which has MGL on the familiarity side and Pizza Cutter on the fatigue side, by measuring performance based on the number of pitches the batter had seen previously and the number of pitches that the pitcher had thrown to other players in between the PAs in question. After all, a fatigue effect on the TTOP shouldn’t be from “fatigue”, but “relative change in fatigue”, and that seemed like a cleaner line of inquiry than just total pitch count. Not a perfect one, but one that should pick up a signal if it’s there. Then I realized MGL had already done the first part of that experiment, which I’d somehow completely forgotten even though I’d read that article and the followup around the time they came out. Oh well. It never hurts to redo the occasional analysis to make sure conclusions still hold true.
I found a baseline 15 point PA1-PA2 increase as well as another 15 point PA2-PA3 increase. I didn’t bother looking at PA4+ because the samples were tiny and usage is clearly changing. In news that should be surprising to absolutely nobody reading this, PAs given to starters are on the decline overall and the number of PA4+ is absolutely imploding lately.
Season | Total PAs | 1st TTO | 2nd | 3rd | 4th | 5th |
2008 | 116960 | 42614 | 40249 | 30731 | 3359 | 7 |
2009 | 116963 | 42628 | 40186 | 30736 | 3406 | 7 |
2010 | 119130 | 42621 | 40457 | 32058 | 3990 | 4 |
2011 | 119462 | 42588 | 40458 | 32333 | 4080 | 3 |
2012 | 116637 | 42506 | 40336 | 30741 | 3050 | 4 |
2013 | 116872 | 42570 | 40422 | 31026 | 2851 | 3 |
2014 | 117325 | 42612 | 40618 | 31235 | 2856 | 4 |
2015 | 114797 | 42658 | 40245 | 29580 | 2314 | 0 |
2016 | 112480 | 42461 | 40128 | 28193 | 1698 | 0 |
2017 | 110195 | 42478 | 39912 | 26476 | 1329 | 0 |
2018 | 106051 | 42146 | 38797 | 24057 | 1051 | 0 |
Looking specifically at PA2 based on the number of pitches seen in PA1, I found a more muted effect than MGL did using 2008-2018 data with pitcher-batters and IBB/sac-bunt PAs removed. My data set consisted of (game, starter, batter, pa1, pa2, pa3) rows where the batter had to face the starter at least twice, the batter wasn’t the pitcher, and any IBB/sac-bunt PA in the first three trips disqualified the row (pitch counts do include pitches to non-qualified rows where relevant). For a first pass, that seemed less reliant on individual batter-pitcher projections than allowing each set of PAs to be biased by crap hitters sac-bunting and good hitters getting IBB’d would have been.
Pitches in PA 1 | wOBA in PA 2 | Expected** | n |
1 | 0.338 | 0.336 | 39832 |
2 | 0.341 | 0.335 | 69761 |
3 | 0.336 | 0.335 | 79342 |
4 | 0.334 | 0.335 | 82847 |
5 | 0.339 | 0.337 | 74786 |
6 | 0.347 | 0.338 | 51374 |
7+ | 0.349 | 0.337 | 36713 |
MGL found a 15 point bonus for seeing 5+ pitches the first time up (on top of the baseline 10 he found), but I only get about an 11 point bonus on 6+ pitches and 3 points of that are from increased batter/worse pitcher quality (“Expected” is just a batter/pitcher quality measure, not an actual 2nd TTO prediction). The SD of each bucket is on the order of .002, so it’s extremely likely that this effect is real, and also likely that it’s legitimately smaller than it was in MGL’s dataset, assuming I’m using a similar enough sampling/exclusion method, which I think I am. It’s not clear to me that that has to be an actual familiarity effect, because I would naively expect to see more of a monotonic increase throughout the number of pitches seen instead of the J-curve, but the buckets have just enough noise that the J-curve might simply be an artifact anyway, and short PAs are an odd animal in their own right as we’ll see later.
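As a rough check on the ".002 per bucket" figure, here is a minimal sketch of the standard error of a bucket's mean wOBA. The per-PA wOBA standard deviation of 0.5 is an assumption for illustration, not a number from the data set above; the bucket sizes are copied from the table.

```python
import math

# Assumption (not from the post): the SD of a single PA's wOBA value is ~0.5.
PER_PA_WOBA_SD = 0.5

# Bucket sizes copied from the "Pitches in PA 1" table.
bucket_n = {"1": 39832, "2": 69761, "3": 79342, "4": 82847,
            "5": 74786, "6": 51374, "7+": 36713}


def standard_error(n, sd=PER_PA_WOBA_SD):
    """Standard error of the mean of n PAs, treating them as independent."""
    return sd / math.sqrt(n)


bucket_se = {label: standard_error(n) for label, n in bucket_n.items()}

for label, se in bucket_se.items():
    print(f"{label:>2} pitches in PA1: SE ~ {se:.4f}")
```

Under that assumption every bucket lands in the neighborhood of .002, so an 8-11 point gap between the 6+ buckets and the rest is several standard errors wide.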
Doing the new part of the analysis, looking at the wOBA difference in PA2-PA1 based on the number of intervening pitches to other batters, I wasn’t sure I was going to find much fatigue evidence early in the game, but as it turns out, the relationship is clear and huge.
intervening pitches | wOBA PA2-PA1 | vs base .015 TTOP | n |
<=20 | -0.021 | -0.036 | 9476 |
21 | -0.005 | -0.020 | 5983 |
22 | -0.005 | -0.020 | 8652 |
23 | 0.004 | -0.011 | 11945 |
24 | 0.000 | -0.015 | 15683 |
25 | 0.004 | -0.011 | 19592 |
26 | 0.001 | -0.014 | 23057 |
27 | 0.005 | -0.010 | 26504 |
28 | 0.009 | -0.006 | 29690 |
29 | 0.015 | 0.000 | 31453 |
30 | 0.021 | 0.006 | 32356 |
31 | 0.014 | -0.001 | 32250 |
32 | 0.020 | 0.005 | 30723 |
33 | 0.018 | 0.003 | 28390 |
34 | 0.027 | 0.012 | 25745 |
35 | 0.028 | 0.013 | 22407 |
36 | 0.023 | 0.008 | 18860 |
37 | 0.030 | 0.015 | 15429 |
38 | 0.025 | 0.010 | 12420 |
39 | 0.012 | -0.003 | 9558 |
40 | 0.045 | 0.030 | 7362 |
41-42 | 0.032 | 0.017 | 9241 |
43+ | 0.027 | 0.012 | 7879 |
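One way to put a number on the trend in the table above is a sample-size-weighted straight-line fit through the single-pitch buckets (21 through 40). This is only a sketch: the open-ended and merged buckets (<=20, 41-42, 43+) are left out because they have no single x value, and weighting by n is an assumption about how to treat the noise.

```python
# Single-pitch buckets from the table above: intervening pitches 21..40.
pitches = list(range(21, 41))
woba_diff = [-0.005, -0.005, 0.004, 0.000, 0.004, 0.001, 0.005, 0.009,
             0.015, 0.021, 0.014, 0.020, 0.018, 0.027, 0.028, 0.023,
             0.030, 0.025, 0.012, 0.045]
n = [5983, 8652, 11945, 15683, 19592, 23057, 26504, 29690, 31453, 32356,
     32250, 30723, 28390, 25745, 22407, 18860, 15429, 12420, 9558, 7362]


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares line through (x, y) with weights w."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


slope = weighted_slope(pitches, woba_diff, n)
print(f"wOBA change per intervening pitch: {slope * 1000:.2f} points")
```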
That’s a monster effect, 2 points of TTOP wOBA per intervening pitch with an unmistakable trend. Jackpot. Hareeb’s a genius. That’s big enough that it should result in actionable game situations all the time. Let’s look at it in terms of actual 2nd time wOBAs (quality-adjusted).
intervening pitches | PA2 wOBA (adj) |
<=20 | 0.339 |
21 | 0.346 |
22 | 0.343 |
23 | 0.344 |
24 | 0.340 |
25 | 0.341 |
26 | 0.339 |
27 | 0.339 |
28 | 0.337 |
29 | 0.340 |
30 | 0.341 |
31 | 0.338 |
32 | 0.347 |
33 | 0.336 |
34 | 0.345 |
35 | 0.344 |
36 | 0.336 |
37 | 0.340 |
38 | 0.328 |
39 | 0.335 |
40 | 0.340 |
41-42 | 0.338 |
43+ | 0.344 |
Wait what??!?!? Those look almost the same everywhere. If you look closely, the higher-pitch-count PA2 wOBAs even average out to be a tad (4-5 points) *lower* than the low-pitch-count ones (and the same for PA1-PA3, though that needs a closer look). If I didn’t screw anything up, that can only mean..
intervening pitches | PA1 wOBA (adj) |
<=20 | 0.361 |
21 | 0.351 |
22 | 0.348 |
23 | 0.339 |
24 | 0.340 |
25 | 0.336 |
26 | 0.338 |
27 | 0.335 |
28 | 0.327 |
29 | 0.325 |
30 | 0.320 |
31 | 0.325 |
32 | 0.326 |
33 | 0.319 |
34 | 0.318 |
35 | 0.316 |
36 | 0.312 |
37 | 0.311 |
38 | 0.303 |
39 | 0.323 |
40 | 0.295 |
41-42 | 0.306 |
43+ | 0.318 |
Yup. The number of intervening pitches TO OTHER BATTERS between somebody’s first and second PA has a monster “effect on” the PA1 wOBA. I started hand-checking more rows of pitch counts and PA results, you name it. I couldn’t believe this was possibly real. I asked one of my friends to verify that for me, and he did, and I mentioned the “effect” to Tango and he also observed the same pattern. This is actually real. It also works the same way between PA2 and PA3. I couldn’t keep looking at other TTOP stuff with this staring me in the face, so the rest of this post is going down this rabbit hole showing my path to figuring out what was going on. If you want to stop here and try to work it out for yourself, or just think about it for a while before reading on, I thought it was an interesting puzzle.
It’s conventional sabermetric wisdom that the box-score-level outcome of one PA doesn’t impart giant predictive effects, but let’s make sure that still holds up.
Reached base safely in PA1 | PA2 wOBA (adj) | Batter quality | Pitcher quality |
Yes | 0.348 | 0.338 | 0.339 |
No | 0.336 | 0.334 | 0.336 |
That’s a 12 point effect, but 7 of it is immediately explained by talent differences, and given the plethora of other factors I didn’t control for, all of which will also skew hitter-friendly like the batter and pitcher quality did, there’s just nothing of any significance here. Maybe the effect is shorter-term than that?
Reached base safely in PA1 | Next batter wOBA (adj) | Next batter quality | Pitcher quality |
Yes | 0.330 | 0.337 | 0.339 |
No | 0.323 | 0.335 | 0.336 |
A 7 point effect where 5 is immediately explained by talent. Also nothing here. Maybe there’s some effect on intervening pitch count somehow?
Reached base safely in PA1 | Average intervening pitches | intervening wOBA (adj) |
Yes | 30.58 | 0.3282 |
No | 30.85 | 0.3276 |
Barely, and the intervening batters don’t even hit quite as well as expected given that we know the average pitcher is 3 points worse in the Yes group. Alrighty then. There’s a big “effect” from intervening pitch count on PA1 wOBA, but PA1 wOBA has minimal to no effect on intervening pitch count, intervening wOBA, PA2 wOBA, or the very next hitter’s wOBA. That’s… something.
In another curious note to this effect,
intervening pitches | intervening wOBA (adj) |
<=20 | 0.381 |
21 | 0.373 |
22 | 0.363 |
23 | 0.358 |
24 | 0.351 |
25 | 0.344 |
26 | 0.343 |
27 | 0.335 |
28 | 0.333 |
29 | 0.328 |
30 | 0.324 |
31 | 0.322 |
32 | 0.319 |
33 | 0.316 |
34 | 0.316 |
35 | 0.312 |
36 | 0.310 |
37 | 0.310 |
38 | 0.307 |
39 | 0.311 |
40 | 0.308 |
41-42 | 0.309 |
43+ | 0.311 |
Another monster correlation, but that one has a much simpler explanation: short PAs show better results for hitters.
Pitches in PA | wOBA (adj) | n |
1 | 0.401 | 133230 |
2 | 0.383 | 195614 |
3 | 0.317 | 215141 |
4 | 0.293 | 220169 |
5 | 0.313 | 198238 |
6 | 0.328 | 133841 |
7 | 0.347 | 57396 |
8+ | 0.369 | 37135 |
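To see how PA length alone can move an aggregate wOBA, here is a small sketch that averages the per-length wOBAs from the table above over two made-up runs of 8 PAs. The two sequences are hypothetical examples chosen only to total 20 and 40 pitches; they are not drawn from the data.

```python
# wOBA by PA length, copied from the table above (key 8 stands for "8+").
woba_by_length = {1: 0.401, 2: 0.383, 3: 0.317, 4: 0.293,
                  5: 0.313, 6: 0.328, 7: 0.347, 8: 0.369}


def aggregate_woba(lengths):
    """Average wOBA over a run of PAs, given each PA's pitch count."""
    return sum(woba_by_length[min(p, 8)] for p in lengths) / len(lengths)


# Hypothetical runs of 8 PAs: one short (20 pitches), one long (40 pitches).
short_run = [1, 2, 2, 3, 3, 3, 3, 3]
long_run = [4, 4, 5, 5, 5, 5, 6, 6]

short_woba = aggregate_woba(short_run)
long_woba = aggregate_woba(long_run)
print(f"{sum(short_run)} pitches: {short_woba:.3f}")
print(f"{sum(long_run)} pitches: {long_woba:.3f}")
```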
Throw a bunch of shorter PAs together and you get the higher aggregate wOBA seen in the table right above this one. It seems like the PA length effect has to be a key. Maybe there’s a difference in the next batter’s pitch distribution depending on PA1?
Pitches in PA | Fraction of PA after reached base | Fraction of PA after out | wOBA after reached base | wOBA after out | OBP after reached base | OBP after out |
1 | 0.109 | 0.089 | 0.394 | 0.402 | 0.362 | 0.359 |
2 | 0.164 | 0.158 | 0.375 | 0.376 | 0.348 | 0.343 |
3 | 0.183 | 0.182 | 0.308 | 0.303 | 0.284 | 0.278 |
4 | 0.186 | 0.191 | 0.289 | 0.276 | 0.299 | 0.281 |
5 | 0.165 | 0.174 | 0.311 | 0.301 | 0.339 | 0.323 |
6 | 0.112 | 0.120 | 0.323 | 0.320 | 0.367 | 0.360 |
7 | 0.049 | 0.052 | 0.346 | 0.339 | 0.393 | 0.386 |
8+ | 0.032 | 0.034 | 0.356 | 0.360 | 0.401 | 0.405 |
Now we’re cooking with gas. That’s a huge likelihood ratio for 1-pitch PAs (0.109/0.089, about 1.22). Applying that ratio as an odds update to our PA1 OBP of about .324, we’d expect to see a PA1 OBP of .370 given a 1-pitch followup PA, which is exactly what we get. The longer PAs are weighted more toward previous outs because the odds ratio favors outs once we get to 4 pitches.
Next PA pitches | This PA1 OBP | This PA1 wOBA |
1 | 0.370 | 0.373 |
2 | 0.333 | 0.332 |
3 | 0.326 | 0.325 |
4 | 0.319 | 0.318 |
5 | 0.313 | 0.313 |
6 | 0.311 | 0.313 |
7 | 0.314 | 0.310 |
8 | 0.313 | 0.309 |
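The .370 figure, and the rest of the OBP column above, can be approximated with a plain odds-ratio update: convert the baseline PA1 OBP to odds, multiply by the ratio of the two "fraction of PA" columns, and convert back. A minimal sketch, using the .324 baseline and the fractions from the earlier table (the function name is just for illustration):

```python
# "Fraction of PA" columns for next-PA lengths 1, 2, ..., 7, 8+.
frac_after_reached = [0.109, 0.164, 0.183, 0.186, 0.165, 0.112, 0.049, 0.032]
frac_after_out = [0.089, 0.158, 0.182, 0.191, 0.174, 0.120, 0.052, 0.034]

PRIOR_OBP = 0.324  # baseline PA1 OBP quoted in the text


def posterior_obp(prior, p_len_given_reached, p_len_given_out):
    """Update P(reached) after observing the length of the next PA."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * (p_len_given_reached / p_len_given_out)
    return post_odds / (1.0 + post_odds)


updated = [posterior_obp(PRIOR_OBP, r, o)
           for r, o in zip(frac_after_reached, frac_after_out)]

for label, obp in zip(["1", "2", "3", "4", "5", "6", "7", "8+"], updated):
    print(f"next PA {label:>2} pitches: expected PA1 OBP {obp:.3f}")
```

The rounded fractions limit the precision, so expect agreement with the observed column to within a few points rather than exactly.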
It seems like this should be a big cause of the observed effect. I used the 2nd/6th and 3rd/7th columns from two tables up to create a process that would “play through” the next 8 PAs starting after an out or a successful PA, deciding on the number of pitches and then whether it was an out or not based on the average values. Then I calculated the expected OBP for PA1 based on the likelihood ratios of each number of total pitches to happen (the same way I got .370 from the odds ratio for a 1-pitch followup PA).
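For anyone who wants to play with the idea, here is a rough sketch of that kind of process. It is not the code used for this post: it computes the pitch-total distribution exactly instead of simulating, it treats the "8+" bucket as exactly 8 pitches, and it ignores inning breaks, base-out states, and batter identity. All of those are simplifying assumptions.

```python
from collections import defaultdict

# Index 0 is a 1-pitch PA, index 7 is "8+" (assumed to be exactly 8 pitches).
# Key True = previous batter reached base, False = previous batter made an out.
FRAC = {True: [0.109, 0.164, 0.183, 0.186, 0.165, 0.112, 0.049, 0.032],
        False: [0.089, 0.158, 0.182, 0.191, 0.174, 0.120, 0.052, 0.034]}
OBP = {True: [0.362, 0.348, 0.284, 0.299, 0.339, 0.367, 0.393, 0.401],
       False: [0.359, 0.343, 0.278, 0.281, 0.323, 0.360, 0.386, 0.405]}
PRIOR_OBP = 0.324
N_FOLLOWING = 8  # batters between a hitter's PA1 and PA2


def total_pitch_dist(pa1_reached, n_pas=N_FOLLOWING):
    """P(total pitches over the next n_pas PAs), given the PA1 result."""
    state = {(pa1_reached, 0): 1.0}  # (previous batter reached?, pitches so far)
    for _ in range(n_pas):
        nxt = defaultdict(float)
        for (prev, total), prob in state.items():
            for i, frac in enumerate(FRAC[prev]):
                p_len = prob * frac          # this PA lasts i + 1 pitches
                on_base = OBP[prev][i]       # chance this batter reaches
                nxt[(True, total + i + 1)] += p_len * on_base
                nxt[(False, total + i + 1)] += p_len * (1.0 - on_base)
        state = nxt
    dist = defaultdict(float)
    for (_, total), prob in state.items():
        dist[total] += prob
    return dict(dist)


DIST_REACHED = total_pitch_dist(True)
DIST_OUT = total_pitch_dist(False)


def model_pa1_obp(total, prior=PRIOR_OBP):
    """Posterior P(PA1 reached) given the intervening pitch total."""
    num = prior * DIST_REACHED.get(total, 0.0)
    return num / (num + (1.0 - prior) * DIST_OUT.get(total, 0.0))


for total in (20, 25, 30, 35, 40):
    print(f"{total} intervening pitches: model PA1 OBP {model_pa1_obp(total):.3f}")
```

Because of the simplifications, the numbers this produces should not be expected to match the model column in the table that follows exactly; the point is only the direction and rough size of the selection effect.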
As it turns out, that effect alone can reproduce the shape and a little over half the spread:
intervening pitches | PA1 OBP (adj) | model PA1 OBP |
<=20 | 0.366 | 0.340 |
21 | 0.351 | 0.336 |
22 | 0.349 | 0.329 |
23 | 0.339 | 0.338 |
24 | 0.343 | 0.332 |
25 | 0.336 | 0.328 |
26 | 0.335 | 0.327 |
27 | 0.335 | 0.328 |
28 | 0.328 | 0.328 |
29 | 0.325 | 0.326 |
30 | 0.320 | 0.326 |
31 | 0.324 | 0.323 |
32 | 0.324 | 0.323 |
33 | 0.318 | 0.321 |
34 | 0.318 | 0.324 |
35 | 0.317 | 0.323 |
36 | 0.312 | 0.317 |
37 | 0.313 | 0.318 |
38 | 0.307 | 0.320 |
39 | 0.320 | 0.310 |
40 | 0.300 | 0.317 |
41-42 | 0.308 | 0.309 |
43+ | 0.320 | 0.317 |
and that simple model is deficient at a number of things (correlations longer than 1 PA, different batters, base-out states, etc). I don’t know everything that’s causing the effect, but I have a good chunk of it, and that reverse pitch count selection bias isn’t something I’ve ever seen mentioned before. It’s also a warning that any kind of analysis involving pitch counts needs to be very careful to avoid walking into this effect.
Perhaps it’s because I’ve been up since before 4AM and need caffeine, but I read through this twice and am still not sure what conclusions you drew from your analysis. Could you please provide a summary of what this all means?
That’s fair. I probably should have had a cleaner results section instead of sprinkling findings in throughout. Let’s make a list of what I found:
1. A 15-point PA1-PA2 TTO penalty and an additional 15-point PA2-PA3 TTO penalty
2. An 8-point “familiarity effect” bonus for PA2 if a player saw 6+ pitches in PA1
3. No evidence that pitch count (as measured by pitches to the other batters) plays a role in the TTOP. Pitchers with higher pitch counts actually performed marginally better (adjusted for quality).
4. Shorter PAs favor batters, and over a given number of PAs, a lower pitch count correlates to a higher wOBA allowed because of this
5. The distribution of pitches in the next PA is quite dependent on the out/safe result of the current PA. 1-pitch PAs (sac bunts excluded) in particular are 20% more common after the previous batter reaches than after he makes an out.
6. The effect in 5 can propagate the whole way through the order: a low pitch count to the next 8 batters selects for a successful result in the current PA, so any analysis using pitch counts has to be very careful to avoid this selection bias.