Hello everybody. Peabody here.

Shame I can’t earn any endorsements for the upcoming movie, because I can so do that voice. At least the original one, and when I’m not cold-ridden like I’ve been this weekend, pretty much confining myself to the room with the wood stove.

So I have looked back at the projections I released two weeks ago, and I did find one major mistake. Yes, there was much criticism of my methods being extremely conservative and not deviating very far from average – criticism which I didn’t necessarily take at full value, because, well, it is generally true. The methods, and the decisions leading to those methods – things like forcing the league totals to conform to last season’s league totals – force the system into a conservative mode. My default assumption is that there was nothing out of the ordinary.

But when I ran followup tests, like the average error of forecast components from the last few years – I found that I was going seriously astray. The process was something like this:

a) run analysis of the player’s performance over the last three years to set a baseline of expected performance. That is essentially just a weighted average of the last 3 seasons, with weights that vary by stat – some are more sensitive to just the most recent season, some to the entire three-year average, and some have little predictability at all.
b) compare that baseline performance with the baselines of players from baseball history. Try to see if there is a consistent deviation from those baselines that can be applied to the current player.

Now, the weights in step A could be something like .523, .233, .150, which are for the hitter’s strikeout component. You’ll notice that they only add up to .896. That difference between the sum of the components and 1 is a measure, a recognition, of regression to the mean – particularly since my components are zero-based to league average. For a highly predictive statistic like batter K, the sum is close to 1. For hitter batting average, the sum is only .620; for pitcher delta-runs, it is barely 0.2.
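To make step A concrete, here is a minimal sketch of that weighted average in Python. The function name and the three season values are made up for illustration, and I am assuming the weights are listed most recent season first. Because the components are zero-based to league average and the weights sum to less than 1, the baseline lands closer to zero than any plain average of the inputs would.

```python
def weighted_baseline(seasons, weights):
    """Weighted sum of zero-based component values.

    seasons: component values relative to league average (0 = average),
             assumed to be ordered most recent season first.
    weights: one weight per season; a sum below 1 pulls the result toward 0.
    """
    if len(seasons) != len(weights):
        raise ValueError("need one weight per season")
    return sum(w * s for w, s in zip(weights, seasons))


# Strikeout weights quoted in the text; the ordering is an assumption.
K_WEIGHTS = (0.523, 0.233, 0.150)

# Hypothetical hitter: +12, +8, +5 on the strikeout component over three years.
trail = (12.0, 8.0, 5.0)
baseline = weighted_baseline(trail, K_WEIGHTS)
print(round(baseline, 2))
```

A perfectly average player (all zeros) stays at zero, which is why the regression only shows up for players away from the mean.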

The second step goes something like Baseline+delta*x, where delta is the difference between comparison players and their baselines, and x is an indicator of how useful those adjustments are. They go as high as 1, for speed and power, and are pretty near zero for things like those pitcher delta-runs.

The trouble is that I calculated the x component in a way that repeated the regression to the mean, essentially (baseline+delta)*x. The RTM was being double counted.
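A small sketch of the two formulas side by side, in Python, shows why this matters more for extreme players. The function names and the sample numbers are hypothetical; only the two expressions come from the description above. Algebraically the gap between them is baseline*(1-x), so it vanishes at a baseline of zero and grows with distance from average.

```python
def intended(baseline, delta, x):
    # Only the historical adjustment is scaled by x.
    return baseline + delta * x


def double_counted(baseline, delta, x):
    # The already-regressed baseline gets scaled by x a second time.
    return (baseline + delta) * x


def lost_to_double_counting(baseline, delta, x):
    # Works out to baseline * (1 - x).
    return intended(baseline, delta, x) - double_counted(baseline, delta, x)


x = 0.8       # hypothetical usefulness factor
delta = 1.0   # hypothetical adjustment from comparison players
for baseline in (0.0, 2.0, 10.0):
    gap = lost_to_double_counting(baseline, delta, x)
    print(baseline, round(gap, 3))
```

With these made-up inputs the average player loses nothing, while the player at +10 loses about 2 points of his component.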

For an average player, the difference was essentially meaningless. But the more extreme they were, in any facet that I measured, then the bigger the effect. So Mike Trout, above average across the board, went from

                    BA    OBA   SLG   EQA  EQR  WARP  cPOW  cSPD  cSO  cBB  cBA
Mike Trout  was   0.302 0.386 0.510  .316  110   7.3    11     4    0    5   12
            now   0.306 0.406 0.530  .332  123   8.9    12     7   -1    8   15

(these are from the ‘all hitters’ section, straight from the computer, without regressing to league norms; the numbers on the projection pages will be a little lower).

Speed was dramatically affected, in part because the most extreme players are so much farther from the average. Trout went from 26 SB to 40; Billy Hamilton went from 43 to 72. Hamilton’s power component dropped from -9 before to -11 now (sometimes the R-T-M works in your favor). Miguel Cabrera went from 30 HR to 36.

Fortunately, the pitchers weren’t similarly affected; the double-counting coding error didn’t happen in that directory. I did take advantage of my analysis to update the weights, which made for some differences. And the overdone regression to means had infected the fielding analysis as well, so that teams with good fielding weren’t getting enough credit for it, which did feed back on the pitcher ratings.

The effect on teams was dependent on having extreme players. Those that did, benefitted by perhaps a win, maybe two. It did let a little more spread into the standings, with peak wins inching up from 91 to 93 and min wins dropping from 67 to 66.

So a quick look at the changes on the team level since 1/24, not all of which come from my code changes:

AL East: was TB 90 Bos 86 NY 85 Tor 78 Bal 77
         now    90     89    86     79     81

The Orioles’ gain is mostly from me jumping the gun and sending A.J. Burnett their way, as he represents a big upgrade over their assorted fifth starter contenders. There was also a component for opponent quality that wasn’t kicking in – while the teams in the AL East were being judged harshly because of their ferocious schedule (playing other AL East teams), they weren’t receiving the compensating break – that their record isn’t an unbiased assessment of quality when they are NOT playing in the AL East.

AL Central: was DET 91 Cle 85 CWS 79 KC 77 Min 72
            now     89     82     78    77     71

And that quality change I just spoke of kicks the AL Central in the teeth. Kansas City does well to stand pat with their 77-win forecast – the addition of Bruce Chen helps a little – while everyone else drops 1-3 games.

AL West: was OAK 88 TEX 87 LAA 84 Sea 83 Hou 70
         now     91     85     86     81     67

Fixing the RTMs hurt Houston. Seattle was especially hurt by the changes in fielding, as they will have a lot of positional uncertainty – even the presumptive addition of an overrated Nelson Cruz doesn’t save them from a drop. Houston was also hurt by that, but Oakland did just fine. Trout alone benefitted by 15 runs from fixing the RTM error, and the Angels gained two games.

NL East: was WAS 87 ATL 85 NY 78 Mia 75 PHI 72
         now     88     84    77     73     73

The Nationals gain a game on the Braves, based on the changes I made, because I don’t believe there’s been any player movement outside the bullpens.

NL Central: was STL 90 PIT 83 CIN 80 MIL 77 CHC 67
            now     93     83     78     80     66

The Brewers added Matt Garza and Francisco Rodriguez, both pretty nice pickups, and Mark Reynolds makes their first base situation a little less desperate…but I am surprised at how they’ve switched places with the Reds. I promise, I’m not making any deliberate moves to hold the Reds back, but they keep on slipping.

NL West: was LA 88 SF 85 SD 83 Ari 78 Col 71
         now    89    85    81     78     72

Not much change here, with the most notable one being the Padres’ loss of Luebke for the season. I don’t see Arroyo doing much but adding depth – he’s no better than the mostly Randall Delgado innings he replaces – and ditto for Maholm and the Dodgers.

