I know I haven’t written anything for a week, but I’ve been hard at work. For the last week I’ve been working on improvements to the forecast algorithm, particularly the pitching side. Through December and January, I was able to incorporate the component scores into the hitter forecasts, and produce an improvement over the whole stat-line approach I had been using. I’ve been trying to do the same thing for pitchers, and just this morning cracked the ‘prior performance’ barrier. While I’m still working on the improvements, I felt good enough about them to incorporate them into a new model run. While the changes were dramatic for some pitchers, the effects on teams weren’t so large – the new method does not shake up the standings. But new standings, new depth charts, and new projections are on-line.

Like the old system, the projection is based on a Marcel-like baseline. Where it differs from Marcel is that the different statistics have different weighted averages, and I use the translated data throughout the process. Strikeouts are very heavily weighted towards the most recent season – roughly a 5-2-1, rounding off, with a small (~15%) regression-to-the-mean component. Walks and groundball rates are also highly slanted, though not as much as Ks. At the other extreme, hit rates have essentially no recency weighting across seasons (1.2 – 1.1 – 1), and an 85% regression to the mean, which is why stats like FIP work. Once the baselines are calculated for everybody, I go through a similar-player search, see how those similar players deviated from their baselines in the following year(s), and apply those deviations to the player. Once all this is done, I run the player through the translation routine backwards to get his stats back into an expected-2012 performance baseline.
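A minimal sketch of the baseline step only, using the approximate weights and regression amounts quoted above. The function name, the sample rates, and the league averages are hypothetical. The simple linear blend toward the league mean is an assumption about how the regression is applied, and the sketch ignores playing time, the similar-player adjustment, and the translation step.

```python
def marcel_style_baseline(rates, weights, regression, league_rate):
    """Weighted average of per-season rates (most recent season first),
    then a linear blend toward the league rate.

    ASSUMPTION: the regression is a plain blend; the post does not say
    exactly how the real system applies it.
    """
    if len(rates) != len(weights):
        raise ValueError("need exactly one weight per season")
    weighted = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
    return (1.0 - regression) * weighted + regression * league_rate


# Hypothetical pitcher, strikeouts per 9 innings, most recent season first.
# Roughly 5-2-1 weights with about 15% regression, as described in the post.
k_baseline = marcel_style_baseline([9.0, 7.0, 6.0], (5, 2, 1), 0.15, league_rate=7.0)

# Hypothetical hit rates on balls in play: nearly flat weights (1.2-1.1-1)
# and 85% regression, so the result sits very close to the league rate.
hit_baseline = marcel_style_baseline(
    [0.270, 0.300, 0.310], (1.2, 1.1, 1.0), 0.85, league_rate=0.295
)

print(round(k_baseline, 3), round(hit_baseline, 4))
```

With these made-up inputs the strikeout baseline stays close to the most recent season, while the hit-rate baseline is pulled almost all the way to the league figure, which is the contrast the two sets of weights are meant to produce.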

I’m testing the new projection system against the set of all pitchers who had 50 major league innings in 2011, who pitched for only one team in 2011, and who had a major league appearance in 2010. My note says that is 437 pitchers. I’m only looking at five top-level stats for judgment – hits, walks, strikeouts, home runs, and runs allowed. The projection is normalized to the actual innings pitched in 2011, and I just look at the resulting errors, tabulated.
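The test described above can be sketched as follows: scale each projection to the pitcher's actual innings, then take the root-mean-square error per stat across all pitchers. The function names and the two sample pitchers are hypothetical, and proportional scaling by innings is an assumption about what "normalized" means here.

```python
import math

STATS = ("H", "HR", "BB", "SO", "R")


def scale_to_innings(projected, projected_ip, actual_ip):
    """Rescale a projected stat line to the innings actually pitched.
    ASSUMPTION: every counting stat scales in proportion to innings."""
    factor = actual_ip / projected_ip
    return {stat: projected[stat] * factor for stat in STATS}


def rmse_by_stat(pairs):
    """pairs: list of (scaled_projection, actual) stat dictionaries."""
    n = len(pairs)
    return {
        stat: math.sqrt(sum((proj[stat] - act[stat]) ** 2 for proj, act in pairs) / n)
        for stat in STATS
    }


# Two made-up pitchers, for illustration only.
proj_a = scale_to_innings({"H": 180, "HR": 20, "BB": 50, "SO": 160, "R": 80}, 200, 100)
act_a = {"H": 93, "HR": 10, "BB": 21, "SO": 80, "R": 40}
proj_b = scale_to_innings({"H": 100, "HR": 10, "BB": 30, "SO": 70, "R": 50}, 100, 100)
act_b = {"H": 96, "HR": 10, "BB": 33, "SO": 70, "R": 50}

errors = rmse_by_stat([(proj_a, act_a), (proj_b, act_b)])
total = sum(errors.values())  # the same kind of "Sum" reported with each stat line
print(errors, round(total, 2))
```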

Here’s the root-mean-square error you get from just using the player’s 2008-10 (major league) stats as your 2011 projection:

Hits 14.03     HR  4.09     BB  8.74     SO  12.72     R  11.48       Sum= 51.06

Same thing, but using his translated stats for 2008-10 as the projection:

Hits 13.76     HR  3.57     BB  8.01     SO  14.26     R  10.66      Sum= 50.26

Lower is better, so this gives us the not terribly surprising result that using reasonably adjusted minor league data in addition to major league data is better than major league data alone. Incidentally, if I use the luck-free runs allowed instead of actual runs – that would be calculated runs, using a normal number of H/BIP and HR/FB – the run error would drop almost a run, to 9.85.

Here are the results of the program I’d been using to make projections for the past two months:

Hits 11.66     HR  3.27     BB  7.62     SO  13.03     R  9.88      Sum= 45.46

And here are the results I’m getting from the new version, as of 11:00 PM Sunday:

Hits 11.48     HR  3.33     BB  7.59     SO  13.16     R  9.47      Sum= 45.03

I’m more than a little annoyed at seeing the strikeout numbers trend backwards; on the other hand, the improvements everywhere else suggest that I’ve got a blind spot – a hole in my swing, as it were – probably a calculation error that should lead to a nifty improvement once I track it down.

In case you were wondering about over-fitting, I am also checking the routines against 2009 and 2010 pitchers, who are not part of the test set. The improvements there are about 3/4 the size of the 2011 improvements, which suggests some mild overfitting, but not enough for me to get worked up over. At least not yet.





8 Responses to Updated projections, with a new pitching routine

  1. Anon says:

    Do you adjust for changes in role (SP to RP and vice versa)?

    • Anonymous says:

      Yes, but not as elaborately as possible. The driving routine simply makes a forecast for a pitcher based on recent usage and what his comparables did; for someone like Bard or Sale, who has been a pure reliever, that projection may very well be for something like 50 games, and 0 starts.

      When I put those numbers into a spreadsheet – which is where I set player positions and playing time for the majors – then there will be adjustments made to pitchers whose role is different from their projection. Being a starter reduces Ks and walks, increases H (slightly), and increases R. R will go up by about a half a run a game, but will vary a little from one pitcher to another. If the machine projection was for a starter, then no changes are needed; if he was projected as a reliever, the full change is taken; if he was forecast at half-and-half, then the changes will be added at half the full value.

      It is an area I’d eventually like to make more rigorous.
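A rough sketch of the role adjustment described in this reply, for the runs component only and for a pitcher who will be used as a starter. The function and parameter names are hypothetical. The reply gives three cases (projected as a starter, as a reliever, or half-and-half); interpolating linearly between them is an assumption, and the 0.5 default stands in for "about a half a run a game", which the reply says varies by pitcher.

```python
def starter_adjusted_runs(projected_runs_per_game, projected_start_share,
                          full_change=0.5):
    """Adjust a runs-per-game projection for a pitcher moving into a starting role.

    projected_start_share: 1.0 if the machine projection was already for a
    starter, 0.0 if it was for a pure reliever, 0.5 for half-and-half.
    ASSUMPTION: values in between are interpolated linearly.
    """
    if not 0.0 <= projected_start_share <= 1.0:
        raise ValueError("projected_start_share must be between 0 and 1")
    return projected_runs_per_game + (1.0 - projected_start_share) * full_change


print(starter_adjusted_runs(3.50, 1.0))  # already projected as a starter: unchanged
print(starter_adjusted_runs(3.50, 0.0))  # projected as a reliever: full change
print(starter_adjusted_runs(3.50, 0.5))  # half-and-half: half the change
```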

  2. […] Davenport Projections: Rangers Still Awesome February 13, 2012 Clay Davenport updated his new forecast system today, with a lengthy explanation I had to read two or three times just to […]

    • clayd says:

      Fixed. He’s one of those players who came up under a different name, and I had both names floating around the database.

  3. Cliffly, The Adverb says:

    Cliff Lee’s WHIP of 1.00 seems a bit optimistic. He’s only had a WHIP that low once in his career.

    Thanks for doing these, by the way, and for making them public.

    • clayd says:

      Agreed, with two comments:

      1) I found a bug in the code, converting TBF into IP, after known events like H/BB/SO are accounted for. It was using the default (“translated”) value instead of the 2011 value, which I am using as the forecast for 2012. That changes Lee’s IP from 224 to 217, and raises WHIP from 1.00 to 1.03.

      2) Using 2011 as the guide means a lower offense level than at any other point in Lee’s career, and a league WHIP closer to 1.3 rather than 1.4. His WHIPs since mid-2009, *normalized to a league average of 1.31*, are 1.073 (Phi 2009), .919 (Sea 2010), 1.029 (Tex 2010), and 1.027 (last year)…I certainly won’t feel bad about a 1.03 projection.
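The arithmetic in point 1 can be checked quickly. The sketch below assumes the hits and walks in the projection were unchanged by the fix, so only the innings move, and it recovers hits plus walks from the rounded 1.00 WHIP, so the result is approximate. The `normalize_whip` helper is hypothetical: simple ratio scaling is one plausible way to put WHIPs on a common league baseline, not necessarily the method used for the figures in point 2.

```python
def whip(baserunners, innings):
    """Walks plus hits per inning pitched."""
    return baserunners / innings


old_ip, new_ip = 224, 217

# Approximate H + BB implied by a 1.00 WHIP over 224 innings.
baserunners = 1.00 * old_ip

# Same baserunners over fewer innings gives a higher WHIP.
new_whip = whip(baserunners, new_ip)
print(round(new_whip, 2))


def normalize_whip(raw_whip, league_whip, target_league_whip=1.31):
    """ASSUMPTION: scale a WHIP by the ratio of a target league average
    to the league average of the season it was recorded in."""
    return raw_whip * target_league_whip / league_whip
```

Dropping seven innings while holding the baserunners fixed is enough to account for the move from 1.00 to 1.03.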

  4. Dave says:

    Hey Clay,

    Great stuff. We’d love to include these in our Consensus Projections. Give me a shout at the email provided if you have any concerns. We just launched with an initial group of sources yesterday (click on “edit sources”).


