Required knowledge: you MUST HAVE READ/SKIMMED “DRC+ really isn’t any good at predicting next year’s wOBA for team switchers”, and a non-technical knowledge of what a correlation coefficient means wouldn’t hurt.
In doing the research for the other post, it was baffling to me what BP could have been doing to come up with the claim that DRC+ was a revolutionary advance for team-switchers. It became completely obvious that there was nothing particularly meaningful there with respect to switchers and that it would take a totally absurd way of looking at the data to come to a different conclusion. With that in mind, I clicked some buttons and stumbled into figuring out what they had to be doing wrong. One would assume that any sophisticated practitioner doing a correlation where some season pairs had 600+ PA each and other season pairs had 5 PA each would weight them differently… and one would be wrong.
I decided to check four simple ways of weighting the correlation: unweighted, by year T PA, by year T+1 PA, and by the harmonic mean of year T PA and year T+1 PA.
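For anybody who wants to play along at home, here is a minimal sketch of what those four weightings could look like in Python with NumPy. The function names `weighted_pearson` and `weighting_schemes` are mine, made up for illustration, and this is not necessarily the exact code behind the tables; it is just the standard weighted Pearson formula applied with four different weight vectors.

```python
import numpy as np


def weighted_pearson(x, y, w=None):
    """Weighted Pearson correlation; w=None gives the ordinary unweighted version."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    # Weighted means, covariance, and variances.
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)


def weighting_schemes(pa_t, pa_t1):
    """Return the four weight vectors for a set of season pairs (hypothetical helper)."""
    pa_t = np.asarray(pa_t, dtype=float)
    pa_t1 = np.asarray(pa_t1, dtype=float)
    return {
        "harmonic": 2 * pa_t * pa_t1 / (pa_t + pa_t1),
        "year_t": pa_t,
        "year_t1": pa_t1,
        "unweighted": np.ones_like(pa_t),
    }


# Illustrative PA values only: one lopsided pair, one pair of full seasons.
weights = weighting_schemes([600, 600], [5, 600])
```

The harmonic mean is the stingy one: a 600 PA season paired with a 5 PA season gets a weight of about 9.9, so a pair only counts for a lot when both seasons have real playing time.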
Table 1. Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 400 PAs year T.
|400+ PA||Harmonic||Year T PA||Year T+1 PA||unweighted||N|
The way to read this table is to compare the wOBA and DRC+ correlations for each group of hitters: switch to switch (rows 1 and 2) and same to same (rows 3 and 4). It’s obvious that wOBA should correlate much better for same than switch because it contains the entire park effect, which is maintained in “same” and lost in “switch”, but DRC+ behaves the same way because DRC+ also contains a lot of park factor even though it shouldn’t.
In the 400+ year T PA group, the choice of weighting method is almost completely irrelevant. DRC+ correlates marginally better across the board and it has nothing to do with switch or stay. Let’s add group 2 to the mix and see what we get.
Table 2. Correlation coefficients to year T+1 wOBA% by different weighting methods, minimum 100 PAs year T.
|100+ PA||Harmonic||Year T PA||Year T+1 PA||unweighted||N|
The values change, but DRC+’s slight correlation lead doesn’t, and again, nothing is special about switchers except that they’re overall less reliable. Some of the gaps widen by a point or two, but there’s no real sign of the impending disaster when the low-PA stuff that favors DRC+ comes in. But what a disaster there is….
Table 3. Correlation coefficients to year T+1 wOBA% by different weighting methods, all season pairs.
|1+ PA||Harmonic||Year T PA||Year T+1 PA||unweighted||N|
The two weightings (Harmonic and Year T) that minimize the weight of low-data garbage projections stay saner, and the two methods that don’t (Year T+1 and unweighted) go bonkers and diverge by around what BP reports. If I had to guess why my gap comes out slightly bigger, I have more pitchers in my sample, and regressed DRC+% correlates a bit better. And to repeat yet again, the effect has nothing to do with stay/switch. It’s entirely a mirage based on flooding the sample with bunches of low-data garbage projections based on handfuls of PAs and weighting them equally to pairs of qualified seasons.
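To see why a regressed stat can look great in an unweighted correlation without actually knowing anything extra, here is a toy simulation. Everything in it is an assumption made up for illustration: the talent spread, the per-PA noise, the two playing-time tiers, and the shrinkage constant. The “regressed” stat is just raw wOBA pulled toward the mean based on PA; it is a stand-in for any regressed metric, not DRC+ itself, and no real player data is involved.

```python
import numpy as np


def weighted_pearson(x, y, w):
    """Standard weighted Pearson correlation."""
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)


rng = np.random.default_rng(0)
n = 20000

# ASSUMPTIONS (not from any real data set):
TALENT_SD = 0.030   # spread of true-talent wOBA
SD_PER_PA = 0.5     # noise of a single PA's wOBA value
K = (SD_PER_PA / TALENT_SD) ** 2  # shrinkage constant, in PA

talent = rng.normal(0.320, TALENT_SD, n)
low_tier = rng.random(n) < 0.5  # half the "players" are scrubs in both years


def draw_pa():
    # Scrubs get 5-50 PA, regulars get 400-650 PA.
    return np.where(low_tier, rng.integers(5, 51, n), rng.integers(400, 651, n)).astype(float)


pa_t, pa_t1 = draw_pa(), draw_pa()
woba_t = talent + rng.normal(0.0, SD_PER_PA / np.sqrt(pa_t))
woba_t1 = talent + rng.normal(0.0, SD_PER_PA / np.sqrt(pa_t1))

# Hypothetical regressed stat: shrink year-T wOBA toward the mean by PA / (PA + K).
league_mean = np.average(woba_t, weights=pa_t)
regressed_t = league_mean + (woba_t - league_mean) * pa_t / (pa_t + K)

schemes = {
    "harmonic": 2 * pa_t * pa_t1 / (pa_t + pa_t1),
    "year_t": pa_t,
    "year_t1": pa_t1,
    "unweighted": np.ones(n),
}
# For each weighting: (raw wOBA correlation, regressed-stat correlation) to year T+1 wOBA.
results = {
    name: (weighted_pearson(woba_t, woba_t1, w), weighted_pearson(regressed_t, woba_t1, w))
    for name, w in schemes.items()
}
for name, (raw, reg) in results.items():
    print(f"{name:>10}: raw={raw:.3f} regressed={reg:.3f}")
```

In this setup the regressed stat contains no information that raw wOBA plus PA doesn’t already have. Whatever edge it shows in the unweighted row comes purely from muting the noisy low-PA seasons, which is exactly the kind of edge that equal weighting rewards.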
You might be thinking that that sounds crazy and wondering why I’m confident that’s what really happened. Well, as it turns out (and I didn’t realize this until after the analysis), they actually freaking told us that’s what they did. The caption for the chart is “Table 3: Reliability of Team-Switchers, Year 1 to Year 2 wOBA (2010-2018); Normal Pearson Correlations”. Normal Pearson correlations are unweighted. Mystery confirmed solved.