Cuba! – claydavenport.com

Kind of upstaged by today’s events, but I uploaded the 2014 Cuban stats to the site this weekend. Check under the “DTs by League” tab, and then change League to Cuban Serie Nacional.

I was working on a longer post to detail the fairly major change I made to the DT procedure for Cubans, but decided I kind of have to go now. So, short form and I’ll try to fill in the details at a later time.

So, first, a general word on the Cuban Serie Nacional, their top-level league. The league consists of 16 teams, one for each of the country’s 15 provinces, plus one for the city of Havana. Until a few years ago it was one for each of the nation’s 14 provinces, plus two for Havana, but then the province of La Habana got split in two. They actually played with 17 teams for one year before axing one of Havana city’s two teams. Players generally play for their home region; there is little movement between teams.

The normal schedule length for the SN is 90 games, which allows for home-and-away three-game series against each of the other 15 teams. The season runs roughly early November through March. In years with a World Baseball Classic, they have played just 45, sacrificing half the season to get their best players in front of the world. This past year – by that I mean the 2013/14 season, not the one that is currently being played – they played a 45-game schedule for all 16 teams, and then followed that with another 45 games between just the top 8 teams from the first half. It appears that the top 8 were allowed to take on some players from the bottom 8 teams. I’m honestly not sure how to handle the two halves. The stats listed under the link are just for the first half; the stats for the second half are listed here, under the “CB2” label. The second half had a slightly higher quality rating than the first half, which is reflected in the translations and is part of why I didn’t want to just run them all together.

The quality rating for the league came in at .60, essentially halfway between my ratings for the high A leagues (.551) and AA (.642). The second half, with just the supplemented stronger teams, rated as essentially AA (.63). That is stronger than I have rated the league in the past, but it is backed up by the performances of multiple players.

On the first run of the stats, I used the same DT method as last year, but took a close look at all the Cuban players who played outside of Cuba. For the first time, this included several players from the just-completed Cuban season, as a few Cuban players were allowed to play in Mexico and Japan. I compared the translations I made for their last three seasons in Cuba with those of their first three seasons after Cuba. I ignored players who had fewer than 200 total PA on either side of the transition, and I ignored players who had a layoff of three years or more between their Cuban and American playing days.
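The two screens above can be sketched as a simple filter. This is only an illustration of the stated rules, not the code actually used for the study; the `PlayerSample` fields and the sample players are hypothetical and made up for the example.

```python
from dataclasses import dataclass

MIN_PA = 200          # minimum total PA required on each side of the move
MAX_LAYOFF_YEARS = 3  # a layoff of this many years or more disqualifies a player


@dataclass
class PlayerSample:
    """Hypothetical record for one player; the field names are assumptions."""
    name: str
    cuba_pa: int       # total PA over the last three Cuban seasons
    abroad_pa: int     # total PA over the first three seasons after Cuba
    layoff_years: int  # years between the last Cuban season and the first abroad


def qualifies(p: PlayerSample) -> bool:
    """Return True if the player passes both screens described in the text."""
    enough_pa = p.cuba_pa >= MIN_PA and p.abroad_pa >= MIN_PA
    short_layoff = p.layoff_years < MAX_LAYOFF_YEARS
    return enough_pa and short_layoff


# Made-up players, purely to exercise the filter.
sample = [
    PlayerSample("Player A", 1500, 1200, 1),
    PlayerSample("Player B", 1400, 150, 0),  # too few PA abroad
    PlayerSample("Player C", 900, 800, 3),   # three-year layoff
]
kept = [p.name for p in sample if qualifies(p)]
print(kept)
```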

I found that I was generally too pessimistic, especially in one particular category:

                        EQA   POW   SPD   Krt   Wrt   BABIP
Average of Cuba DT     .226     2     1    -1    -1     -13
Average non-Cuba DT    .246     3    -1     0    -4      -2
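To make the size of each miss explicit, here is a small sketch that subtracts the Cuban averages from the non-Cuban ones, using only the numbers in the table. The variable names are mine. Note that EQA is on a different scale from the run-based component scores, so it is left out when looking for the biggest gap.

```python
columns = ["EQA", "POW", "SPD", "Krt", "Wrt", "BABIP"]
cuba_dt = [0.226, 2, 1, -1, -1, -13]     # average of Cuba DT
non_cuba_dt = [0.246, 3, -1, 0, -4, -2]  # average non-Cuba DT

# Positive gap = the Cuban translation undersold the player in that column.
gaps = {
    col: round(actual - projected, 3)
    for col, projected, actual in zip(columns, cuba_dt, non_cuba_dt)
}

# Largest miss among the run-based component scores (EQA excluded).
components = [c for c in columns if c != "EQA"]
worst = max(components, key=lambda c: abs(gaps[c]))

print(gaps)
print(worst)
```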

The component scores reflect how many runs above or below average a player is based on a particular aspect of his performance. The Power score, for instance, reflects how many runs better than an average player he would be if his power – home runs and some doubles – were the only thing that was different between him and an average player over the course of 600 plate appearances. The SPD score is based on steals, triples, and doubles; the Krt is all about strikeout rates; the Wrt is all about walks (and hit by pitch). The BABIP is about singles, and that is where my translations went wildly wrong in many cases. Yoenis Cespedes came to the US with a -17 BABIP, but he’s really been +2.

And it wasn’t simply that the translation program was too harsh – the BABIP numbers just didn’t match up. For the 30 players I tested, the correlation between Cuba BABIP and American BABIP was just .16. By contrast, POW had an .89 correlation, and SPD .77. EQA was at .55. The procedure was just a complete mess on this particular statistic.
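For readers who want to run this kind of check themselves, here is a minimal sketch of a Pearson correlation between two lists of component scores. I am assuming the ordinary Pearson coefficient is what is meant by "correlation" here. The two short lists are invented toy values, not the 30-player sample.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented toy scores, one pair per player: (translated Cuban score, score abroad).
cuba_scores = [-8, 4, 0, -3, 6]
abroad_scores = [1, 3, -2, 5, 0]
print(round(pearson(cuba_scores, abroad_scores), 2))
```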

I found, with testing, that I could make a much better estimate of the Cuban players’ American BABIP scores by looking at their other (Cuban) statistics. The regression equation that came out was

BABIP = -2 + 1.1*SPD + .30*POW + .49*Wrt + .20*Krt

Take Cespedes, for instance. The average of his last three years in Cuba was a .244 EQA, with component scores of POW 9, SPD 3, Krt 2, Wrt -2, and BABIP -17. Apply the function above, and we get a new projected BABIP of +4.
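As a worked check, the regression can be written as a one-line function; the name `projected_babip` is just for illustration. Plugging in the rounded component scores quoted for Cespedes gives about 3.4, in line with the +4 in the text; the small difference presumably comes from using unrounded inputs, which is my assumption.

```python
def projected_babip(spd, pow_, wrt, krt):
    """Projected American BABIP score from the other Cuban component scores."""
    return -2 + 1.1 * spd + 0.30 * pow_ + 0.49 * wrt + 0.20 * krt


# Cespedes, average of his last three Cuban seasons (rounded scores from the text).
cespedes = projected_babip(spd=3, pow_=9, wrt=-2, krt=2)
print(round(cespedes, 2))  # 3.42
```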

Working that back into the statistics, the new translation for Cespedes looks like this:

 AB  H   DB  TP  HR  BB  SO  R  RBI  SB  CS  Out  BA   OBP  SLG   EqA EqR POW SPD KRt WRt BIP  
593 136  32   3  24  49 100  93  79   9   2  849 .230 .291 .416  .244  68   9   3   2  -2 -17
593 171  28   3  24  49 100 100  86   9   2  785 .289 .344 .469  .278  88   8   1   2  -2   4
591 165  31   6  27  46 115  96 108  11   7 1202 .280 .335 .491  .278  89  13   2  -2  -2   2

The first line is the old way of doing the DT, for Cespedes’ 2009-2011 seasons. The second line is the revised way, and the third line is his combined 2012-2014 totals. (All have been adjusted to 650 PA, with the exception of the Outs column). Clearly – a much better fit.

It wasn’t just Cespedes who benefitted. The correlation between projected and actual BABIP improved from .16 to .43. The correlation between projected and actual EQA improved from .55 to .63. It just works better, and I guess I’d have to be crazy not to use it.