Hello everybody. Peabody here.
Shame I can’t earn any endorsements for the upcoming movie, because I can so do that voice. At least the original one, and when I’m not cold-ridden like I’ve been this weekend, pretty much confining myself to the room with the wood stove.
So I have looked back at the projections I released two weeks ago, and I did find one major mistake. Yes, there was much criticism of my methods being extremely conservative and not deviating very far from average – criticism which I didn’t necessarily take at full value, because, well, it is generally true. The methods, and the decisions leading to those methods – things like forcing the league totals to conform to last season’s league totals – force the system into a conservative mode. My default assumption is that there was nothing out of the ordinary.
But when I ran follow-up tests – like the average error of forecast components from the last few years – I found that I was going seriously astray. The process was something like this:
a) run analysis of the player’s performance over the last three years to set a baseline of expected performance. That is essentially just a weighted average of the last 3 seasons, with weights that vary by stat – some are more sensitive to just the most recent season, some to the entire three-year average, and some have little predictability at all.
b) compare that baseline performance with the baselines of players from baseball history, and try to see if there is a consistent deviation from those baselines that can be applied to the current player.
Now, the weights in step A could be something like .523, .233, .150, which are for the hitter’s strikeout component. You’ll notice that they only add up to .896. That difference between the sum of the components and 1 is a measure, a recognition, of regression to the mean – particularly since my components are zero-based to league average. For a highly predictive statistic like batter K, the sum is close to 1. For hitter batting average, the sum is only .620; for pitcher delta-runs, it is barely 0.2.
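Step A can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample player are made up, and I am assuming the weights are listed most recent season first. Note that the three strikeout weights as typed above sum to .906 rather than .896, so one digit in the post is probably off; the sketch uses them exactly as quoted.

```python
# Hypothetical sketch of step A; names and sample inputs are illustrative only.

# Hitter strikeout weights as quoted in the text.
# Assumption: listed most recent season first.
K_WEIGHTS = (0.523, 0.233, 0.150)


def baseline(components, weights):
    """Weighted sum of zero-based seasonal components (league average = 0)."""
    return sum(w * c for w, c in zip(weights, components))


def retained_share(weights):
    """Sum of the weights; 1 minus this is the implied regression to the mean."""
    return sum(weights)


# Made-up player: +10 in the strikeout component in each of the last 3 seasons.
print(round(baseline((10, 10, 10), K_WEIGHTS), 2))  # 9.06
print(round(retained_share(K_WEIGHTS), 3))          # 0.906
```

Because the components are measured from league average, weights that sum to less than 1 pull every player toward zero, which is the regression to the mean built into the baseline.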
The second step goes something like Baseline+delta*x, where delta is the difference between comparison players and their baselines, and x is an indicator of how useful those adjustments are. They go as high as 1, for speed and power, and are pretty near zero for things like those pitcher delta-runs.
The trouble is that I calculated the x component in a way that repeated the regression to the mean, essentially (baseline+delta)*x. The RTM was being double counted.
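A small Python sketch of the two formulas makes the effect of the bug easier to see. The function names and the sample numbers here are hypothetical, chosen only to illustrate; they are not taken from the actual projection code.

```python
# Hypothetical sketch of the double-counting error; all numbers are made up.


def intended(baseline, delta, x):
    """Baseline + delta*x: only the comparison adjustment is scaled by x."""
    return baseline + delta * x


def buggy(baseline, delta, x):
    """(Baseline + delta)*x: the already-regressed baseline is shrunk again."""
    return (baseline + delta) * x


x = 0.5  # illustrative usefulness factor

# Average player: the baseline component is 0, so both forms agree.
print(intended(0.0, 2.0, x), buggy(0.0, 2.0, x))    # 1.0 1.0

# Extreme player: the bug removes baseline * (1 - x) from the projection.
print(intended(12.0, 2.0, x), buggy(12.0, 2.0, x))  # 13.0 7.0
```

The gap between the two forms is baseline*(1-x), so it vanishes for a league-average component and grows in step with how far the player sits from average.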
For an average player, the difference was essentially meaningless. But the more extreme they were, in any facet that I measured, then the bigger the effect. So Mike Trout, above average across the board, went from
                    BA    OBA    SLG   EQA  EQR  WARP  cPOW  cSPD  cSO  cBB  cBA
Mike Trout, was  0.302  0.386  0.510  .316  110   7.3    11     4    0    5   12
Mike Trout, now  0.306  0.406  0.530  .332  123   8.9    12     7   -1    8   15
(these are from the ‘all hitters’ section, straight from the computer, without regressing to league norms; the numbers on the projection pages will be a little lower).
Speed was dramatically affected, in part because the most extreme players are so much farther from the average. Trout went from 26 SB to 40; Billy Hamilton went from 43 to 72. Hamilton’s power dropped from a -9 component before to -11 now (sometimes the R-T-M works in your favor). Miguel Cabrera went from 30 HR to 36.
Fortunately, the pitchers weren’t similarly affected; the double-counting coding error didn’t happen in that directory. I did take advantage of my analysis to update the weights, which made for some differences. And the overdone regression to means had infected the fielding analysis as well, so that teams with good fielding weren’t getting enough credit for it, which did feed back on the pitcher ratings.
The effect on teams was dependent on having extreme players. Those that did benefitted by perhaps a win, maybe two. It did let a little more spread into the standings, with peak wins inching up from 91 to 93 and min wins dropping from 67 to 66.
So a quick look at the changes on the team level since 1/24, not all of which come from my code changes:
AL East:
       TB  Bos   NY  Tor  Bal
was    90   86   85   78   77
now    90   89   86   79   81
The Orioles’ gain is mostly from me jumping the gun and sending A.J. Burnett their way, as he represents a big upgrade over their assorted fifth starter contenders. There was also a component for opponent quality that wasn’t kicking in – while the teams in the AL East were being judged harshly because of their ferocious schedule (playing other AL East teams), they weren’t receiving the compensating break – that their record isn’t an unbiased assessment of quality when they are NOT playing in the AL East.
AL Central:
      DET  Cle  CWS   KC  Min
was    91   85   79   77   72
now    89   82   78   77   71
And that quality change I just spoke of kicks the AL Central in the teeth. Kansas City does well to stand pat with their 77-win forecast – the addition of Bruce Chen helps a little – while everyone else drops 1-3 games.
AL West:
      OAK  TEX  LAA  Sea  Hou
was    88   87   84   83   70
now    91   85   86   81   67
Fixing the RTMs hurt Houston. Seattle was especially hurt by the changes in fielding, as they will have a lot of positional uncertainty – even the presumptive addition of an overrated Nelson Cruz doesn’t save them from a drop. Houston was also hurt by that, but Oakland did just fine. Trout alone benefitted by 15 runs from fixing the RTM error, and the Angels gained two games.
NL East:
      WAS  ATL   NY  Mia  PHI
was    87   85   78   75   72
now    88   84   77   73   73
The Nationals gain a game on the Braves, and that comes from the changes I made, because I don’t believe there’s been any player movement outside the bullpens.
NL Central:
      STL  PIT  CIN  MIL  CHC
was    90   83   80   77   67
now    93   83   78   80   66
The Brewers added Matt Garza and Francisco Rodriguez, both pretty nice pickups, and Mark Reynolds makes their first base situation a little less desperate…but I am surprised at how they’ve switched places with the Reds. I promise, I’m not making any deliberate moves to hold the Reds back, but they keep on slipping.
NL West:
       LA   SF   SD  Ari  Col
was    88   85   83   78   71
now    89   85   81   78   72
Not much change here, with the most notable one being the Padres’ loss of Luebke for the season. I don’t see Arroyo doing much but adding depth – he’s no better than the mostly Randall Delgado innings he replaces – and ditto for Maholm and the Dodgers.
While everything on this site is free, a donation through Paypal to help offset costs would be greatly appreciated. -Clay
If you are trying to reach me, drop me an email. Same address as the webpage, but replace ".com" with "@gmail.com".