Not that it shows on the site, but it’s been a pretty busy week for me with baseball ideas.
The first priority has been to clean up and expand the DT pages (“By League” on the subject line). I’ve gotten a year key added in, and it does seem to be working for the years that have split data – from 2005 to 2011. I am getting things set up to have access to the full season DTs for all leagues going back to the Seventies.
Those files will include at least some winter leagues. The Arizona Fall League just wrapped up, and I’ve got their data entered in – it should come through on the league pages shortly. The Central and Pacific leagues in Japan wrapped up their regular seasons near the end of October, and I’ve gotten all of their data entered – that task certainly is much easier now that NPB has gotten their English-language site back up. And I’ve downloaded Cuban data for the last few years, but haven’t been able to process it yet – something I’m eager to do with recent attention on Yoennis Cespedes.
I’ve also been going through my programs and cleaning them up. Some of this code is going on 25 years old now, and it is full of blocks with no comments explaining what they do – and my ability to just remember things like that isn’t what it was back then. There are blocks that don’t do anything any more but are still there, blocks that do something only to have it redone a different way immediately afterwards, meaningless variable names, things that are just plain sloppy. I know that I did it that way back when because a) I didn’t necessarily know any better, and b) I was usually in a big hurry to move on to the next thing. I look at some of this code now, especially in relation to the standards I have at work with NOAA, and it’s kind of embarrassing, even if I am the only one who sees it.
One big clean-up was the way the program processes “peak” translations. In a normal translation, you are adjusting a (usually minor-league) performance into an estimated stat line for the major leagues. A peak translation is similar, except that you are building an estimated stat line for the major leagues of some future year, when the player is at his peak – an estimate of how good he can be, rather than how good he is now. The primary determinant of that is still current ability level plus age, but the articles that Rany posted recently at BP, some looks I was taking at Bryce Harper, and stats I ran to validate league difficulty levels all got me thinking of other approaches to the problem. One big change is to yank all the future code out and move it towards the end of the program, letting the regular DT play all the way out before trying to adjust it; I had always jumped in early on, and sometimes I wound up with conflicts between the ‘normal’ and ‘peak’ DT levels. I also changed the adjustments from a system that was primarily multiplicative – “power increases 20%” – to one that is primarily additive – “power improves by 45 points”. That enormously simplified problems I had with overestimating players who had monster minor league seasons. Those changes should be apparent in the stats pages very soon.
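To make the multiplicative-versus-additive difference concrete, here is a minimal Python sketch. It is not the actual DT code: the function names are made up for illustration, and I am assuming “power” is expressed in points (thousandths) of something like isolated power. The 20% and 45-point figures are just the examples quoted above, not real age adjustments.

```python
# Illustrative only: hypothetical helpers, not the real DT program.
# Assumption: "power" is measured in points (thousandths), e.g. 150 = .150.

def peak_multiplicative(current_power, pct_gain=0.20):
    """Old-style adjustment: the gain scales with the current level."""
    return current_power * (1.0 + pct_gain)

def peak_additive(current_power, point_gain=45):
    """New-style adjustment: every player gets the same gain in points."""
    return current_power + point_gain

if __name__ == "__main__":
    # A typical prospect and one coming off a monster minor league season
    # (both power levels are made-up numbers).
    for label, power in [("typical", 150), ("monster", 300)]:
        mult = peak_multiplicative(power)
        add = peak_additive(power)
        print(f"{label}: now {power}, multiplicative peak {mult:.0f}, "
              f"additive peak {add}")
```

Under the multiplicative version the monster season gets twice the boost of the typical one (60 points against 30), which is where the overestimates came from; under the additive version both players gain the same 45 points.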
I’ve also been validating projection systems from the 2011 season. While I’m pleased with how my system (which ran with some of Nate Silver’s ideas on PECOTA, threw out some of them, replaced them with some of my own tools and approaches, resulting in a chimeric Sildavenverport monster) graded, I was also pretty shocked at just how little difference even the most complex systems made when compared to an ordinary three-year average.
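For anyone who wants to try the same kind of check, here is a rough Python sketch of a three-year-average baseline and one way to grade a projection against it. The assumptions are mine: an unweighted mean of the prior three seasons of a rate stat, and root-mean-square error as the grading measure. The validation above may well have used playing-time weights or a different error measure, and all the names here are hypothetical.

```python
from math import sqrt

def three_year_average(seasons):
    """Unweighted mean of the most recent three seasons of a rate stat.

    `seasons` is ordered oldest to newest; fewer than three entries
    simply averages whatever is there.
    """
    last_three = seasons[-3:]
    return sum(last_three) / len(last_three)

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Made-up histories for two hypothetical players, oldest season first.
    histories = [[0.260, 0.280, 0.270], [0.310, 0.290, 0.300]]
    actual_next = [0.275, 0.295]  # made-up "what really happened" values
    baseline = [three_year_average(h) for h in histories]
    print("baseline RMSE:", rmse(baseline, actual_next))
```

Running any projection system's numbers through the same `rmse` call gives a direct comparison against the baseline.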