Updated projections, with a new pitching routine
I know I haven’t written anything for a week, but I’ve been hard at work on improvements to the forecast algorithm, particularly the pitching side. Through December and January, I was able to incorporate the component scores into the hitter forecasts and produce an improvement over the whole stat-line approach I had been using. I’ve been trying to do the same thing for pitchers, and just this morning cracked the ‘prior performance’ barrier. While I’m still working on the improvements, I felt good enough about them to incorporate them into a new model run. While the changes were dramatic for some pitchers, the effects on teams weren’t so large – the new method does not shake up the standings. But new standings, new depth charts, and new projections are on-line.
Like the old system, the projection is based on a Marcel-like baseline. Where it differs from Marcel is that the different statistics have different weighted averages, and I use the translated data throughout the process. Strikeouts are very heavily weighted towards the most recent season – roughly a 5-2-1, rounding off, with a small (~15%) regression-to-the-mean component. Walks and groundball rates are also highly slanted, though not as much as Ks. At the other extreme, hit rates are weighted almost evenly across seasons (1.2 – 1.1 – 1), with an 85% regression to the mean, which is why stats like FIP work. Once the baselines are calculated for everybody, I go through a similar-player search, see how those similar players deviated from their baselines in the following year(s), and apply those deviations to the players. Once all this is done, I run the player through the translation routine backwards to get his stats back into an expected-2012 performance baseline.
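As a rough illustration of the baseline step only, here is a minimal Python sketch using the season weights and regression percentages quoted above. The function name, the sample rates, and the league means are made up for the example, and the sketch ignores playing time and the similar-player adjustment, so it should not be read as the actual routine.

```python
def baseline_rate(rates, weights, league_mean, regression):
    """Weighted average of per-season rates (most recent season first),
    then pulled toward the league mean by the `regression` fraction.

    This is an assumed form of a Marcel-like baseline, not the site's code.
    """
    if len(rates) != len(weights):
        raise ValueError("need one weight per season")
    weighted = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
    return (1.0 - regression) * weighted + regression * league_mean


# Hypothetical pitcher: strikeouts per nine of 9.0, 7.0, 6.0 (most recent first).
# Weights 5-2-1 and ~15% regression come from the text; the league mean is invented.
k_rate = baseline_rate([9.0, 7.0, 6.0], [5, 2, 1], league_mean=7.0, regression=0.15)

# Hypothetical hit rates on balls in play, with near-even weights and 85% regression.
hit_rate = baseline_rate([0.320, 0.290, 0.300], [1.2, 1.1, 1.0],
                         league_mean=0.300, regression=0.85)

print(round(k_rate, 3), round(hit_rate, 4))
```

With these inputs the strikeout baseline stays close to the most recent season, while the hit rate ends up almost exactly at the league mean, which is the contrast the paragraph describes.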
I’m testing the new projection system against the set of all pitchers who had 50 major league innings in 2011, who pitched for only one team in 2011, and who had a major league appearance in 2010. My note says that is 437 pitchers. I’m only looking at five top-level stats for judgment – hits, walks, strikeouts, home runs, and runs allowed. The projection is normalized to the actual innings pitched in 2011, and I just look at the resulting errors, tabulated.
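A minimal Python sketch of that test follows, assuming the error measure is a root-mean-square error taken per pitcher for each stat and then summed across the five stats. The dictionary keys and the two sample pitchers are invented for illustration.

```python
import math

STATS = ("H", "HR", "BB", "SO", "R")


def scale_to_innings(projection, actual_ip):
    """Rescale a projected stat line to the innings actually pitched."""
    factor = actual_ip / projection["IP"]
    return {s: projection[s] * factor for s in STATS}


def rmse_by_stat(pairs):
    """pairs: list of (projection, actual) dicts holding IP and the five stats."""
    if not pairs:
        raise ValueError("need at least one pitcher")
    totals = {s: 0.0 for s in STATS}
    for proj, actual in pairs:
        scaled = scale_to_innings(proj, actual["IP"])
        for s in STATS:
            totals[s] += (scaled[s] - actual[s]) ** 2
    errors = {s: math.sqrt(totals[s] / len(pairs)) for s in STATS}
    errors["Sum"] = sum(errors[s] for s in STATS)
    return errors


# Two made-up pitchers, only to show the mechanics.
pairs = [
    ({"IP": 100, "H": 90, "HR": 10, "BB": 30, "SO": 80, "R": 45},
     {"IP": 200, "H": 186, "HR": 18, "BB": 60, "SO": 164, "R": 90}),
    ({"IP": 60, "H": 60, "HR": 6, "BB": 20, "SO": 50, "R": 30},
     {"IP": 60, "H": 68, "HR": 4, "BB": 20, "SO": 46, "R": 30}),
]
errors = rmse_by_stat(pairs)
print({k: round(v, 2) for k, v in errors.items()})
```

Scaling the projection to actual innings means the comparison judges the projected rates rather than the playing-time guess.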
Here’s the root-mean-square error you get from just using the player’s 2008-10 (major league) stats as your 2011 projection:
Hits 14.03 HR 4.09 BB 8.74 SO 12.72 R 11.48 Sum= 51.06
Same thing, but using his translated stats for 2008-10 as the projection:
Hits 13.76 HR 3.57 BB 8.01 SO 14.26 R 10.66 Sum= 50.26
Lower is better, so this gives us the not terribly surprising result that using reasonably adjusted minor league data in addition to major league data is better than major league data alone. Incidentally, if I use the luck-free runs allowed instead of actual runs – that would be calculated runs, using a normal number of H/BIP and HR/FB – the run error would drop almost a run, to 9.85.
Here are the results of the program I’d been using to make projections for the past two months:
Hits 11.66 HR 3.27 BB 7.62 SO 13.03 R 9.88 Sum= 45.46
And here are the results I’m getting from the new version, as of 11:00 PM Sunday:
Hits 11.48 HR 3.33 BB 7.59 SO 13.16 R 9.47 Sum= 45.03
I’m more than a little annoyed at seeing the strikeout numbers trend backwards; on the other hand, the improvements everywhere else suggest that I’ve got a blind spot – a hole in my swing, as it were – probably a calculation error that should lead to a nifty improvement once I track it down.
In case you were wondering about over-fitting, I am also checking the routines against 2009 and 2010 pitchers, who are not part of the test set. The improvements there are about 3/4 the size of the 2011 improvements, which suggests some mild overfitting, but not enough for me to get worked up over. At least not yet.
8 Responses to Updated projections, with a new pitching routine
Do you adjust for changes in role (SP to RP and vice versa)?
Yes, but not as elaborately as possible. The driving routine simply makes a forecast for a pitcher based on recent usage and what his comparables did; for someone like Bard or Sale, who has been a pure reliever, that projection may very well be for something like 50 games, and 0 starts.
When I put those numbers into a spreadsheet – which is where I set player positions and playing time for the majors – adjustments are made to pitchers whose role is different from their projection. Being a starter reduces Ks and walks, increases H (slightly), and increases R. R will go up by about half a run a game, but will vary a little from one pitcher to another. If the machine projection was for a starter, then no changes are needed; if he was projected as a reliever, the full change is taken; if he was forecast at half-and-half, then the changes are added at half the full value.
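The proportional part of that adjustment can be sketched in a few lines of Python. This assumes a pitcher being slotted as a starter, rates expressed per nine innings, and a simple linear scale between "no change" and "full change". Only the half-run increase in R comes from the description above; the other deltas are hypothetical placeholders, and the function and key names are invented.

```python
def adjust_for_role(line, projected_start_share, full_change):
    """Shift a projected line toward a starting role.

    projected_start_share: 1.0 if the machine projection was a pure starter,
        0.0 if it was a pure reliever, 0.5 for half-and-half.
    full_change: per-stat deltas applied in full to a pure reliever.
    """
    if not 0.0 <= projected_start_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    fraction = 1.0 - projected_start_share
    return {stat: value + fraction * full_change.get(stat, 0.0)
            for stat, value in line.items()}


# R9 delta of +0.5 is from the text; the K9, BB9, and H9 deltas are placeholders.
FULL_CHANGE = {"K9": -1.0, "BB9": -0.3, "H9": 0.2, "R9": 0.5}

reliever_line = {"K9": 9.0, "BB9": 3.0, "H9": 8.0, "R9": 3.0}
as_starter = adjust_for_role(reliever_line, 0.0, FULL_CHANGE)
half_and_half = adjust_for_role(reliever_line, 0.5, FULL_CHANGE)
print(as_starter["R9"], half_and_half["R9"])
```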
It is an area I’d eventually like to make more rigorous.
[…] Davenport Projections: Rangers Still Awesome February 13, 2012 Clay Davenport updated his new forecast system today, with a lengthy explanation I had to read two or three times just to […]
FYI — I think you have a Juan Nicasio duplicate:
http://www.claydavenport.com/cgi-bin/playersearchminor.sh?search_name=nicasio
Fixed. He’s one of those players who came up under a different name, and I had both names floating around the database.
Cliff Lee’s WHIP of 1.00 seems a bit optimistic. He’s only had a WHIP that low once in his career.
Thanks for doing these, by the way, and for making them public.
Agreed, with two comments:
1) I found a bug in the code, converting TBF into IP, after known events like H/BB/SO are accounted for. It was using the default (“translated”) value instead of the 2011 value, which I am using as the forecast for 2012. That changes Lee’s IP from 224 to 217, and raises WHIP from 1.00 to 1.03.
2) Using 2011 as the guide means a lower offense level than at any other point in Lee’s career, and a league WHIP closer to 1.3 rather than 1.4. His WHIPs since mid-2009, *normalized to a league average of 1.31*, are 1.073 (Phi 2009), .919 (Sea 2010), 1.029 (Tex 2010), and 1.027 (last year)…I certainly won’t feel bad about a 1.03 projection.
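The arithmetic behind point 1 can be checked quickly, assuming the projected hits and walks stayed the same and only the innings changed. The baserunner total here is backed out from the rounded 1.00 WHIP, so it is approximate.

```python
def whip(hits_plus_walks, innings):
    """Walks plus hits per inning pitched."""
    return hits_plus_walks / innings


# Approximate baserunners implied by a 1.00 WHIP over 224 innings.
baserunners = 1.00 * 224
corrected = whip(baserunners, 217)
print(round(corrected, 2))
```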
Hey Clay,
Great stuff. We’d love to include these in our Consensus Projections. Give me a shout at the email provided if you have any concerns. We just launched with an initial group of sources yesterday (click on “edit sources”).
Cheers,
Dave