Kind of upstaged by today’s events, but I uploaded the 2014 Cuban stats to the site this weekend. Check under the “DTs by League” tab, and then change League to Cuban Serie Nacional.
I was working on a longer post to detail the fairly major change I made to the DT procedure for Cubans, but decided I kind of have to go now. So, short form and I’ll try to fill in the details at a later time.
So, first, a general word on the Cuban Serie Nacional, their top-level league. The league consists of 16 teams, one for each of the country’s 15 provinces, plus one for the city of Havana. Until a few years ago it was one for each of the nation’s 14 provinces, plus two for Havana, but then the province of La Habana got split in two. They actually played with 17 teams for one year before axing one of Havana city’s two teams. Players generally play for their home region; there is little movement between teams.
The normal schedule length for the SN is 90 games, which allows for home-and-away three-game series against each of the other 15 teams. The season runs roughly from early November through March. In years with a World Baseball Classic, they have played just 45, sacrificing half the season to get their best players in front of the world. This past year (by that I mean the 2013/14 season, not the one that is currently being played) they played a 45-game schedule for all 16 teams, and then followed that with another 45 games between just the top 8 teams from the first half. It appears that the top 8 teams were allowed to pick up some players from the bottom 8. I’m honestly not sure how to handle the two halves. The stats listed under the link are just for the first half; the stats for the second half are listed here, under the “CB2” label. The second half had a slightly higher quality rating than the first half, which is reflected in the translations and is part of why I didn’t want to just run them all together.
The quality rating for the league came in at .60, almost exactly halfway between my ratings for the high A leagues (.551) and AA (.642). The second half, with just the stronger teams (and those supplemented), rated as essentially AA (.63). That is stronger than I have rated the league in the past, but it is backed up by the performances of multiple players.
On the first run of the stats, I used the same DT method as last year, but took a close look at all the Cuban players who played outside of Cuba. For the first time, this included several players from the just-completed Cuban season, as a few Cuban players were allowed to play in Mexico and Japan. I compared the translations I made for their last three seasons in Cuba with those of their first three seasons after Cuba. I ignored players who had fewer than 200 total PA on either side of the transition, and I ignored players who had a layoff of three years or more between their Cuban and American playing days.
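The two screening rules above can be written down as a small filter. This is only a sketch of the stated criteria, not the actual code behind the DTs; the `PlayerSample` record and its field names are hypothetical, invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerSample:
    # Hypothetical record; field names are assumptions made for this sketch.
    name: str
    cuba_pa: int       # total PA over the last three seasons in Cuba
    abroad_pa: int     # total PA over the first three seasons after Cuba
    layoff_years: int  # gap between the last Cuban season and the first one abroad


MIN_PA = 200     # fewer than 200 PA on either side: ignored
MAX_LAYOFF = 3   # a layoff of three years or more: ignored


def qualifies(p: PlayerSample) -> bool:
    """Return True if the player passes both screening rules from the post."""
    enough_pa = p.cuba_pa >= MIN_PA and p.abroad_pa >= MIN_PA
    short_layoff = p.layoff_years < MAX_LAYOFF
    return enough_pa and short_layoff
```

A player with 450 PA in Cuba, 600 PA abroad and a one-year gap would pass; one with 150 PA abroad, or a three-year gap, would not.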
I found that I was generally too pessimistic, especially in one particular category:
| | EQA | POW | SPD | Krt | Wrt | BABIP |
|---|---|---|---|---|---|---|
| Average of Cuba DT | .226 | 2 | 1 | -1 | -1 | -13 |
| Average non-Cuba DT | .246 | 3 | -1 | 0 | -4 | -2 |
The component scores reflect how many runs above or below average a player is based on a particular aspect of his performance. The Power score, for instance, reflects how many runs better than an average player he would be if his power (home runs and some doubles) were the only thing that was different between him and an average player over the course of 600 plate appearances. The SPD score is based on steals, triples, and doubles; the Krt is all about strikeout rates; the Wrt is all about walks (and hit by pitch). The BABIP is about singles, and that is where my translations went wildly wrong in many cases. Yoenis Cespedes came to the US with a -17 BABIP, but he’s really been +2.
And it wasn’t simply that the translation program was too harsh; the BABIP numbers just didn’t match up. For the 30 players I tested, the correlation between Cuba BABIP and American BABIP was just .16. By contrast, POW had an .89 correlation, and SPD .77. EQA was at .55. The procedure was just a complete mess on this particular statistic.
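For readers who want to run the same kind of check on their own before-and-after samples, here is a minimal Pearson correlation in plain Python. It assumes the correlations quoted above are ordinary Pearson coefficients, which the post does not state; the function name and inputs are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Feeding it each player's Cuban component score as `xs` and his score after leaving Cuba as `ys` would give one number per component, comparable to the .16, .89 and .77 figures.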
I found, with testing, that I could make a much better estimate of the Cuban players’ American BABIP scores by looking at their other (Cuban) statistics. The regression equation that came out was
BABIP = -2 + 1.1*SPD + .30*POW + .49*Wrt + .20*Krt
Take Cespedes, for instance. The average of his last three years in Cuba was a .244 EQA, with component scores of POW 9, SPD 3, Krt 2, Wrt -2, and BABIP -17. Apply the function above, and we get a new projected BABIP of +4.
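As a quick check of the arithmetic, the regression can be applied directly to the component scores quoted for Cespedes. The function name is made up for this sketch. Note that plugging in the rounded scores gives about 3.4 rather than 4; the gap presumably comes from the displayed scores being rounded, though that is a guess.

```python
def projected_babip(spd, pow_, wrt, krt):
    """Regression from the post: estimate the post-Cuba BABIP score
    from a player's other Cuban component scores."""
    return -2 + 1.1 * spd + 0.30 * pow_ + 0.49 * wrt + 0.20 * krt


# Cespedes' rounded three-year Cuban averages, as quoted in the text
cespedes = projected_babip(spd=3, pow_=9, wrt=-2, krt=2)
print(round(cespedes, 2))  # about 3.4 from the rounded inputs
```

Speed carries by far the largest weight per run, so a fast player with a poor Cuban BABIP gets the biggest upward correction.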
Working that back into the statistics, the new translation for Cespedes looks like this:
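| AB | H | DB | TP | HR | BB | SO | R | RBI | SB | CS | Out | BA | OBP | SLG | EqA | EqR | POW | SPD | KRt | WRt | BIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 593 | 136 | 32 | 3 | 24 | 49 | 100 | 93 | 79 | 9 | 2 | 849 | .230 | .291 | .416 | .244 | 68 | 9 | 3 | 2 | -2 | -17 |
| 593 | 171 | 28 | 3 | 24 | 49 | 100 | 100 | 86 | 9 | 2 | 785 | .289 | .344 | .469 | .278 | 88 | 8 | 1 | 2 | -2 | 4 |
| 591 | 165 | 31 | 6 | 27 | 46 | 115 | 96 | 108 | 11 | 7 | 1202 | .280 | .335 | .491 | .278 | 89 | 13 | 2 | -2 | -2 | 2 |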
The first line is the old way of doing the DT, for Cespedes’ 2009-2011 seasons. The second line is the revised way, and the third line is his combined 2012-2014 totals. (All have been adjusted to 650 PA, with the exception of the Outs column.) Clearly, a much better fit.
It wasn’t just Cespedes who benefitted. The correlation between projected and actual BABIP improved from .16 to .43. The correlation between projected and actual EQA improved from .55 to .63. It just works better, and I guess I’d have to be crazy not to use it.
While everything on this site is free, a donation through Paypal to help offset costs would be greatly appreciated. -Clay