claydavenport.com

Cuba!

By clayd On December 18, 2014

Kind of upstaged by today’s events, but I uploaded the 2014 Cuban stats to the site this weekend. Check under the “DTs by League” tab, and then change League to Cuban Serie Nacional.

I was working on a longer post to detail the fairly major change I made to the DT procedure for Cubans, but decided I kind of have to go now. So, short form and I’ll try to fill in the details at a later time.

So, first, a general word on the Cuban Serie Nacional, their top level league. The league consists of 16 teams, one for each of the country’s 15 provinces, plus one for the city of Havana. Until a few years ago it was one for each of the nation’s 14 provinces, plus two for Havana, but then the province of La Habana got split in two. They actually played with 17 teams for one year before axing one of Havana city’s two teams. Players generally play for their home region; there is little movement between teams.

The normal schedule length for the SN is 90 games, which allows for home-and-away three-game series against each of the other 15 teams. The season runs roughly early November through March. In years with a World Baseball Classic, they have played just 45, sacrificing half the season to get their best players in front of the world. This past year – by that I mean the 2013/14 season, not the one that is currently being played – they played a 45-game schedule for all 16 teams, and then followed that with another 45 games between just the top 8 teams from the first half. It appears that some level of taking players from the bottom 8 teams was permitted. I’m honestly not sure how to handle the two halves. The stats listed under the link are just for the first half; the stats for the second half are listed here, under the “CB2” label. The second half had a slightly higher quality rating than the first half, which is reflected in the translations and is part of why I didn’t just want to run them all together.

The quality rating for the league came in at .60, about halfway between my ratings for the high A leagues (.551) and AA (.642). The second half, with just the supplemented stronger teams, rated as essentially AA (.63). That is stronger than I have rated the league in the past, but it is backed up by the performances of multiple players.

On the first run of the stats, I used the same DT method as last year, but took a close look at all the Cuban players who played outside of Cuba. For the first time, this included several players from the just-completed Cuban season, as a few Cuban players were allowed to play in Mexico and Japan. I compared the translation I made for their last three seasons in Cuba with those of their first three seasons after Cuba; I ignored players who had less than 200 total PA on either side of the transition; I ignored players who had a three-year or more layoff between their Cuban and American playing days.

I found that I was generally too pessimistic, especially in one particular category:

                          EQA    POW    SPD    Krt    Wrt    BABIP
Average of Cuba DT        .226     2      1     -1     -1     -13
Average non-Cuba DT       .246     3     -1      0     -4      -2

The component scores reflect how many runs above or below average a player is based on a particular aspect of his performance. The Power score, for instance, reflects how many runs better than an average player he would be if his power – home runs and some doubles – were the only thing that was different between him and an average player over the course of 600 plate appearances. The SPD score is based on steals, triples, and doubles; the Krt is all about strikeout rates; the Wrt is all about walks (and hit by pitch). The BABIP is about singles, and that is where my translations went wildly wrong in many cases. Yoenis Cespedes came to the US with a -17 BABIP, but he’s really been +2.

And it wasn’t simply that the translation program was too harsh – the BABIP numbers just didn’t match up. For the 30 players I tested, the correlation between Cuba BABIP and American BABIP was just .16. By contrast, POW had an .89 correlation, and SPD .77. EQA was at .55. The procedure was just a complete mess on this particular statistic.

I found, with testing, that I could make a much better estimate of the Cuban players’ American BABIP scores by looking at their other (Cuban) statistics. The regression equation that came out was

BABIP = -2 + 1.1*SPD + .30*POW + .49*Wrt + .20*Krt

Take Cespedes, for instance. The average of his last three years in Cuba was a .244 EQA, with component scores of POW 9, SPD 3, Krt 2, Wrt -2, and BABIP -17. Apply the function above, and we get a new projected BABIP of +4.
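As a rough check on the arithmetic, the regression can be written as a small function. This is only a sketch: the function and argument names are illustrative, and the inputs are the rounded component scores quoted for Cespedes, so the result lands near the +4 in the text rather than exactly on it.

```python
def projected_babip(spd, pow_, wrt, krt):
    """Regression estimate of the American BABIP component from Cuban components.

    Coefficients are the ones quoted in the post; the function name is illustrative.
    """
    return -2 + 1.1 * spd + 0.30 * pow_ + 0.49 * wrt + 0.20 * krt


# Cespedes' rounded three-year Cuban averages: POW 9, SPD 3, Krt 2, Wrt -2
cespedes = projected_babip(spd=3, pow_=9, wrt=-2, krt=2)
print(round(cespedes, 2))
```

With the rounded inputs this comes out around 3.4; the +4 presumably reflects unrounded component scores. Either way it is a long way from the -17 the old method produced.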

Work that back into the statistics, and the new translation for Cespedes looks like

 AB  H   DB  TP  HR  BB  SO  R  RBI  SB  CS  Out  BA   OBP  SLG   EqA EqR POW SPD KRt WRt BIP  
593 136  32   3  24  49 100  93  79   9   2  849 .230 .291 .416  .244  68   9   3   2  -2 -17
593 171  28   3  24  49 100 100  86   9   2  785 .289 .344 .469  .278  88   8   1   2  -2   4
591 165  31   6  27  46 115  96 108  11   7 1202 .280 .335 .491  .278  89  13   2  -2  -2   2

The first line is the old way of doing the DT, for Cespedes’ 2009-2011 seasons. The second line is the revised way, and the third line is his combined 2012-2014 totals. (All have been adjusted to 650 PA, with the exception of the Outs column.) Clearly – a much better fit.

It wasn’t just Cespedes who benefitted. The correlation between projected and actual BABIP improved from .16 to .43. The correlation between projected and actual EQA improved from .55 to .63. It just works better, and I guess I’d have to be crazy not to use it.

Starting to project for 2015

By clayd On November 1, 2014

I’ve rerun all the player cards with a 2015 projection, so you should be able to see individuals up now.

Not that they won’t change between now and April, as I have any number of things to work through that are not included in this version. As a for instance, the default setting for league offense for 2015 is for the AL and NL to be the same as in 2014.

When I ran the cards for 2014, that default meant that I used the 2013 averages. That wasn’t so bad in the NL – league offense came in at 4.01 runs per 9 innings in 2014, compared to 4.04 in 2013 – but in the AL, offense had a far more substantial drop from 4.31 to 4.15. My forecast for the majors as a whole was high by almost 500 runs, 2.4%; but about 350 of that came from the AL, against only 150 in the NL.

Stepping back for a historical perspective:

That’s runs per nine innings in the AL on a yearly basis (blue dashed lines) and as a 5-year moving average (solid red line). You’ll note that the 5-year average right now is on a solidly down and linear trend. It has fallen by 13, 9, 9, 9, and 6 points over the last five years. Should that trend continue, then next year’s 5-year average should fall another 9 points to 4.27 – which means that next year’s RPG would need to be about 4.01. The individual year trends also run to about 10 points per year, suggesting a 4.05. Absent some action by the league, it is hard to see the offense not dropping at least a little further next year. I’m thinking that something like 4.08 would be a better forecast than the 4.15 I would use by default.

Same deal in the NL:

Next year the last 4.40 RPG will scroll off the 5-year average, which is going to depress it even without further drops in the one-year average. Repeating last year’s 4.01 will cut the 5-yr average by another 7 points, to 4.11. Continuing the 10-point per year trend line would mean a one-year forecast of 3.87. You’ll note that the NL has operated with an apparent floor of about 3.8 runs, that only the 1968 season has penetrated since the dead ball era, so we are approaching the offensive levels at which the league has historically stepped in to make changes. I’m not as certain that the NL will decline further (as compared to the AL), so I think I will just let last year’s average roll forward.
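The moving-average arithmetic here is simple enough to sketch. When a new season enters a 5-year window and the oldest one scrolls off, the average moves by one fifth of the difference between the two. The function names below are my own; the 4.40 and 4.01 are the NL figures quoted above.

```python
def moving_average_shift(new_value, dropped_value, window=5):
    """Change in a rolling average when new_value replaces dropped_value."""
    return (new_value - dropped_value) / window


def value_needed(dropped_value, target_shift, window=5):
    """Single-season value required to move the rolling average by target_shift."""
    return dropped_value + window * target_shift


# NL: a 4.40 season scrolls off, and last year's 4.01 is repeated
shift = moving_average_shift(new_value=4.01, dropped_value=4.40)
print(f"{shift * 100:+.1f} points")
```

From these rounded inputs the shift is a little under 8 points, roughly matching the drop described above. Run the other way, `value_needed` shows why a 9-point fall in the 5-year average requires a season about 45 points below the one it replaces.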

Champion!

By clayd On October 6, 2014

Kind of a nice thing to read. My pre-season projections seem to have topped the field, at least in terms of root-mean-squared-error.

Tango

Jackie Robinson Day

By clayd On April 15, 2014

A year ago at this time, I was recuperating from brain surgery. And I had the idea that, for the anniversary of Jackie Robinson’s first major league game, I would run the translation process for Jackie Robinson’s 1946 season.

As it happened, I didn’t get home from the hospital until April 12, which (combined with my fatigue level) meant I didn’t get a post ready by April 15. I did get it together shortly thereafter…can’t really remember now, but it was maybe a week later.

But I didn’t post it. What would be better, I thought, was not just Jackie in isolation, but the whole International League for 1946.

And a little while later I had that, with a little help from the stats posted at baseball-reference.com. But I still didn’t post. Even better, I said to myself, would be the whole AAA for 1946, to be able to place Jackie in a reasonable prospect position. Sure I hadn’t planned on it, but after seeing what they had at b-ref I thought it would be pretty easy.

And it was. Still a little later on, I had the IL, PCL, and AA for 1946 all drawn up. It was nowhere close to Robinson’s debut date anymore, and I wasn’t coming up with any hook to wrap around it for a good post. So instead of posting anything, I just kept going, working my way through the AAA of 1947. And 1948. And so on.

By then we were through with the entire baseball season, and I still hadn’t posted anything about it. I had completed doing all the AAA teams, right up to 1980 where I had everything, and then started on b-ref’s list of Japanese teams.

So now, a year later, my head’s healed, and it’s another Jackie Robinson Day, and if I haven’t buried the lead enough already, I’ve got translations for all AAA teams and players going back to 1946 posted on the site. And the Japanese Central and Pacific Leagues, along with all of their players, are translated back to those leagues’ debut in 1950. The links will be found under the “DTs by League” tab. The link for the league that started it all is here:

https://claydavenport.com/stats/webpages/1946/1946pageINTyearALL.shtml

In addition, the DTs links for all players should reflect their AAA stats.

There’s still some work to do with them. I still haven’t finished getting all of the fielding stats added, so lots of players are listed at the position “DH” – that’s the default when no fielding data is found. The park and difficulty factors are not as complete as they are for recent years…nature of the beast, I’m afraid. And there is, of course, no split data for minor league seasons before 2005.

Year Team         Lge  AB  H   DB  TP  HR  BB  SO  R  RBI  SB  CS  Out  BA   OBP  SLG   EqA EqR POW SPD KRt WRt BIP
1946 Montreal____ Int 470 137  33   5   3  67  42  90  52  33  11  349 .291 .392 .402  .284  75 -14  10  18   6   4 
1947 Brooklyn____ NL  600 168  37   3  16  68  54 134  50  39   0  445 .280 .372 .432  .289 100  -2  14  17   2  -4

Robinson’s 1946 DT shows a clearly above average hitter, one with excellent contact skills and outstanding speed. He has a bigtime number of doubles – interestingly enough, his minor league double count would be 42 when projected to the same 600 AB as his 1947 major league line – but a distinct lack of home runs. With the exception of the home runs, his lines are extremely close (and his 1948 would be equally similar).
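The doubles comparison is a straight proportional scaling, which can be checked in a couple of lines. The helper name is illustrative; the 33 doubles and 470 AB come from the 1946 Montreal line above.

```python
def scale_to_ab(stat, actual_ab, target_ab=600):
    """Scale a counting stat linearly to a different number of at-bats."""
    return stat * target_ab / actual_ab


# Robinson's 1946 Montreal DT: 33 doubles in 470 AB
doubles_per_600 = scale_to_ab(33, 470)
print(round(doubles_per_600))  # 42
```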

Updates and changes by 4/12

By clayd On April 13, 2014

Not quite two weeks into the season, and already a number of expected starters have dropped off the radar for the season.

Players who changed teams: Eduardo Nunez from NYY to MIN, the Mariners lose Carlos Triunfel to the Dodgers, Mike Fontenot goes from the Nationals to the Rays, the Cubs release Mitch Maier, the Brewers release Joe Thurston, Henry Blanco retires from the Diamondbacks, and the Giants waive Roger Kieschnick and the Diamondbacks pick him up; Nunez was the only one of the group I expected to have more than token playing time.

Among pitchers, Pedro Beato goes from the Reds to the Braves, Michael Brady travels from Miami to Anaheim, Preston Guilmet is traded from Cleveland to Baltimore, and Brian Omogrosso was released by the White Sox. I don’t think any of the pitchers were projected for more than 20 innings.

Other adjustments:
Braves: Dan Uggla’s .146 EQA has me raising the chances of him being replaced, by either Pastornicky or La Stella; BJ Upton’s .125 won’t hold off Jordan Schafer. Ian Thomas, Gus Schlosser, and Pedro Beato pick up bullpen garbage time. The Marlins have to deal with Jacob Turner’s shoulder injury…I’ve taken 10 starts off him for now, but it’s still TBD how much time he’ll miss. The Mets had a more drastic shakeup of the pitching staff. With Parnell out, Valverde takes over as closer, and Torres moves to a setup slot that pretty much removes him from starter contention. It also looks like Duda has moved ahead of Davis in the first base race. Nothing changed for the Phillies except a little shaking out of the back end of the bullpen. The Nationals lose a month of Wilson Ramos; it also looks like Ryan Zimmerman will get time at 1B, which likely works out to plus time for Danny Espinosa.

Orioles: Small changes. Boosted Delmon Young at Nolan Reimold’s expense, raised the innings for guys currently in the pen like Britton, Stinson, and Meek; took Suk-min Yoon out of starter contention down the line after a disastrous first start in Norfolk. For Boston, the injuries to Middlebrooks and Victorino don’t really change their projections, but Bradley’s early hitting raises his future – that comes off my projections for Nava, Gomes, and Carp. Ryan Roberts slots in as a 3B backup. For the Yankees, Yangervis Solarte soaks up most of the PA I’d given to Nunez. Scott Sizemore is off to a hot start, which could push out one of Brian Roberts or Kelly Johnson. No big changes for the Jays, while in Tampa Matt Moore’s injury really shakes up the rotation. I’ll bring in Erik Bedard for a dozen starts, add Cesar Ramos for a few, enhance Matt Andriese and Nate Karns, and shore up Jake Odorizzi’s job security.

White Sox: Avisail Garcia’s shoulder injury forces the Sox into the outfield I’d have started with (never been a Garcia fan). The forecasts for all of them were a little muddy, with four fairly equal players for three spots; Jordan Danks is less likely to force any of Eaton/Viciedo/De Aza out of a job. I guessed wrong about Nate Jones getting the closer job out of camp, so there’s a pretty strong bullpen shuffle there. Lindstrom’s not clearly better than any of Downs, Webb, or Jones, but the guy who has the job now has a clear advantage for playing time over any contender. No substantive changes for the Indians. The Carlos Santana experiment has lasted this long, so it gets a little bump up. Twins: Scaled back Buxton’s arrival, as his wrist injury isn’t healing rapidly…plus, it’s a wrist injury. Terrible thing for a ballplayer. Switched Chris Colabello in for things that were Parmelee. For the Tigers, the only thing I’ve got is some worry about Joe Nathan. I don’t have even that much to change for the Royals.

The A’s needed a lot of work, led by the demotion of Jim Johnson from closer. He could straighten himself out and regain the role, so he retains a share of the saves, but only as part of an even spread. The first base/DH spot is breaking a little differently than I envisioned, although more Callaspo/less Barton is a pretty good call for the team. In LA, the Angels will be without Josh Hamilton for a quarter of the season after he hurt his thumb sliding head-first into first. That should mean a lot more JB Shuck, as well as more Cowbell…er, excuse me, Cowgill…thrown in as well. I also did a little reshuffling in the bullpen, with Burnett being slow to return and Brian Moran lost for the season. The Rangers’ rotation remains an injury-riddled mess, with Scott Baker added as another option. With the Mariners, I shorted Logan Morrison a bit, as we get to see how the OF/1B/DH shuffle arranges itself. Nothing has changed yet for the Astros.

Arizona is already starting to question their starting pitching choices in the wake of Patrick Corbin’s injury, as Randall Delgado will move to relief while Josh Collmenter gets another shot at starting. Archie Bradley will be up there sooner or later. The Dodgers were pretty much set – though I did change second base from “Leans Dee Gordon” to “Safe Dee Gordon”. For the Giants, I enhanced Michael Morse as a clearer LF starter. With the Padres, I didn’t even have that. All I’ve got for the Rockies is to clarify Charlie Blackmon as the primary center fielder.

Chicago’s Cubs are another team getting into the closer shuffle, as Jose Veras is demoted. I also have to go stronger on Emilio Bonifacio than I wanted to. The Cardinals are still just as set as they were coming in – I’m just amazed at how much minor league talent they still have in the wings. With the Brewers, Henderson loses the closer fight with Francisco Rodriguez, among many changes in their bullpen from my forecast. For Cincinnati, I’m getting extremely worried about Mat Latos’ condition, so he gets a big drop in starts. On the plus side, Aroldis Chapman seems to be coming along on the short side of initial estimates, so his PT actually gets raised a bit. The biggest change I made for the Pirates is to bump up Gregory Polanco’s time, because the way he’s hammering AAA they won’t be able to keep him down much longer.

Updated Projections, 2/8

By clayd On February 10, 2014

Hello everybody. Peabody here.

Shame I can’t earn any endorsements for the upcoming movie, because I can so do that voice. At least the original one, and when I’m not cold-ridden like I’ve been this weekend, pretty much confining myself to the room with the wood stove.

So I have looked back at the projections I released two weeks ago, and I did find one major mistake. Yes, there was much criticism of my methods being extremely conservative and not deviating very far from average – criticism which I didn’t necessarily take at full value, because, well, it is generally true. The methods, and the decisions leading to those methods – things like forcing the league totals to conform to last season’s league totals – force the system into a conservative mode. My default assumption is that there was nothing out of the ordinary.

But when I ran followup tests – like the average error of forecast components from the last few years – I found that I was going seriously astray. The process was something like this:

a) run analysis of the player’s performance over the last three years to set a baseline of expected performance. That is essentially just a weighted average of the last 3 seasons, with weights that vary by stat – some are more sensitive to just the most recent season, some to the entire three-year average, and some have little predictability at all.
b) compare that baseline performance with the baselines of players from baseball history, and try to see if there is a consistent deviation from those baselines that can be applied to the current player.

Now, the weights in step A could be something like .523, .233, .150, which are for the hitter’s strikeout component. You’ll notice that they only add up to .896. That difference between the sum of the components and 1 is a measure, a recognition, of regression to the mean – particularly since my components are zero-based to league average. For a highly predictive statistic like batter K, the sum is close to 1. For hitter batting average, the sum is only .620; for pitcher delta-runs, it is barely 0.2.
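A minimal sketch of step A, assuming the weights run from the most recent season backward (the post does not say which weight goes with which year) and using a made-up hitter. It shows how weights that sum to less than 1 shrink a steady performance toward the league-average zero.

```python
def baseline(scores, weights):
    """Weighted sum of the last three seasons' component scores.

    Components are zero-based to league average, so weights that sum to less
    than 1 pull the baseline toward zero (regression to the mean).
    """
    return sum(w * s for w, s in zip(weights, scores))


K_WEIGHTS = (0.523, 0.233, 0.150)   # hitter strikeout weights quoted above
retained = sum(K_WEIGHTS)           # share of a steady performance that is kept

# Hypothetical hitter who was +10 on the strikeout component three years running
steady = baseline((10, 10, 10), K_WEIGHTS)
print(round(retained, 3), round(steady, 2))
```

Note that the three weights as printed sum to .906 rather than .896, so one of the quoted figures is presumably off by a hundredth; the point about the shortfall from 1 is unaffected.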

The second step goes something like Baseline+delta*x, where delta is the difference between comparison players and their baselines, and x is an indicator of how useful those adjustments are. They go as high as 1, for speed and power, and are pretty near zero for things like those pitcher delta-runs.

The trouble is that I calculated the x component in a way that repeated the regression to the mean, essentially (baseline+delta)*x. The RTM was being double counted.
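The two forms can be put side by side. This is a sketch with hypothetical numbers (x = 0.8, a +1 adjustment from the comparables), not the actual projection code; the function names are mine.

```python
def project_intended(baseline, delta, x):
    """Baseline plus the comparables' adjustment, scaled by its usefulness x."""
    return baseline + delta * x


def project_buggy(baseline, delta, x):
    """The double-counting form: x also shrinks the already-regressed baseline."""
    return (baseline + delta) * x


# Hypothetical values: x = 0.8, comparables suggest +1 over baseline
gaps = {}
for base in (0.0, 5.0, 20.0):
    good = project_intended(base, 1.0, 0.8)
    bad = project_buggy(base, 1.0, 0.8)
    gaps[base] = good - bad
    print(base, round(gaps[base], 2))
```

The gap between the two works out to baseline times (1 - x), so it vanishes for a league-average baseline of zero and grows in proportion to how far the player sits from average.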

For an average player, the difference was essentially meaningless. But the more extreme they were, in any facet that I measured, then the bigger the effect. So Mike Trout, above average across the board, went from

                        BA    OBA   SLG   EQA  EQR  WARP cPOW cSPD cSO cBB cBA
Mike Trout   before   0.302 0.386 0.510  .316  110   7.3   11    4   0   5  12
             after    0.306 0.406 0.530  .332  123   8.9   12    7  -1   8  15

(these are from the ‘all hitters’ section, straight from the computer, without regressing to league norms; the numbers on the projection pages will be a little lower).

Speed was dramatically affected, in part because the most extreme players are so much farther from the average. Trout went from 26 SB to 40; Billy Hamilton went from 43 to 72. His power dropped from a -9 component before to -11 now (sometimes the R-T-M works in your favor). Miguel Cabrera went from 30 HR to 36.

Fortunately, the pitchers weren’t similarly affected; the double-counting coding error didn’t happen in that directory. I did take advantage of my analysis to update the weights, which made for some differences. And the overdone regression to means had infected the fielding analysis as well, so that teams with good fielding weren’t getting enough credit for it, which did feed back on the pitcher ratings.

The effect on teams was dependent on having extreme players. Those that did benefitted by perhaps a win, maybe two. It did let a little more spread into the standings, with peak wins inching up from 91 to 93 and min wins dropping from 67 to 66.

So a quick look at the changes on the team level since 1/24, not all of which come from my code changes:

AL East:    was  TB  90  Bos 86  NY  85  Tor 78  Bal 77
            now      90      89      86      79      81

The Orioles’ gain is mostly from me jumping the gun and sending A.J. Burnett their way, as he represents a big upgrade over their assorted fifth starter contenders. There was also a component for opponent quality that wasn’t kicking in – while the teams in the AL East were being judged harshly because of their ferocious schedule (playing other AL East teams), they weren’t receiving the compensating break – that their record isn’t an unbiased assessment of quality when they are NOT playing in the AL East.

AL Central: was  DET 91  Cle 85  CWS 79  KC  77  Min 72
            now      89      82      78      77      71

And that quality change I just spoke of kicks the AL Central in the teeth. Kansas City does well to stand pat with their 77-win forecast – the addition of Bruce Chen helps a little – while everyone else drops 1-3 games.

AL West:    was  OAK 88  TEX 87  LAA 84  Sea 83  Hou 70
            now      91      85      86      81      67

Fixing the RTMs hurt Houston. Seattle was especially hurt by the changes in fielding, as they will have a lot of positional uncertainty – even the presumptive addition of an overrated Nelson Cruz doesn’t save them from a drop. Houston was also hurt by that, but Oakland did just fine. Trout alone benefitted by 15 runs from fixing the RTM error, and the Angels gained two games.

NL East:    was  WAS 87  ATL 85  NY  78  Mia 75  PHI 72
            now      88      84      77      73      73

The Nationals gain a game on the Braves, based on the changes I made, because I don’t believe there’s been any player movement outside the bullpens.

NL Central: was  STL 90  PIT 83  CIN 80  MIL 77  CHC 67
            now      93      83      78      80      66

The Brewers added Matt Garza and Francisco Rodriguez, both pretty nice pickups, and Mark Reynolds makes their first base situation a little less desperate…but I am surprised at how they’ve switched places with the Reds. I promise, I’m not making any deliberate moves to hold the Reds back, but they keep on slipping.

NL West:    was  LA  88  SF  85  SD  83  Ari 78  Col 71
            now      89      85      81      78      72

Not much change here, with the most notable one being the Padres’ loss of Luebke for the season. I don’t see Arroyo doing much but adding depth – he’s no better than the mostly Randall Delgado innings he replaces – and ditto for Maholm and the Dodgers.

Chris Davis’ power projection

By clayd On February 9, 2014

Davis’ component-power score is only projected to be a +35 (he says “only”), after putting up a surprising 50 last year.

His three-year line for power going into 2014 is 29, 27, 50. I did a quick search for players who

1) had 250 PA in each season of a four-season span;
2) were 26-30 in the fourth season (Davis will be 28 this year);
3) averaged at least a +15 power score over the first two years;
4) were at least 15 runs better in the third year than in each of the first two years.
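The four criteria can be sketched as a single predicate. This is my reading of the search, not the actual query: in particular it takes criterion 4 to compare the third-year power score against each of the first two, which is consistent with every row in the list that follows. The function and parameter names are illustrative.

```python
def qualifies(pa, age4, pow1, pow2, pow3,
              min_pa=250, min_avg=15, min_jump=15):
    """Apply the four search criteria to one player's four-season span.

    pa is a sequence of four plate-appearance totals; pow1..pow3 are the
    power scores for the first three seasons.
    """
    return (
        len(pa) == 4 and all(p >= min_pa for p in pa)      # 1) playing time
        and 26 <= age4 <= 30                                # 2) age in season four
        and (pow1 + pow2) / 2 >= min_avg                    # 3) established power
        and pow3 - max(pow1, pow2) >= min_jump              # 4) third-year jump
    )


# Davis: power scores 29, 27, 50, age 28 in 2014; the PA values are placeholders
davis = qualifies(pa=(500, 500, 500, 500), age4=28, pow1=29, pow2=27, pow3=50)
print(davis)
```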

That gave us this list:

               yr4  age4 pow1 pow2 pow3 pow4 pow4-3
Jack Cust      2008   29   9   26   43   41   -2
Juan Diaz      2001   27  20   12   39   31   -8
Jim Gentile    1962   28  19   26   51   25  -26
Willie Horton  1969   26  22   26   41   21  -20
Todd Hundley   1997   28  19   20   43   36   -7
Adam LaRoche   2007   27  17   15   33   14  -19
Joey Meyer     1988   26  22   21   43   17  -26
Jai Miller     2012   27  20   19   38   16  -22
Kevin Mitchell 1990   28  16   18   54   32  -22
Mike Napoli    2009   27  22   15   39   19  -20
Dave Nicholson 1969   29  19   17   37   20  -17
Gene Oliver    1962   27  20   14   35   10  -25
David Ortiz    2004   28  17   17   32   34    2
Carlos Pena    2008   30  24   14   52   31  -21
Mark Reynolds  2010   26  20   22   39   34   -5
Tony Solaita   1976   29  22   19   54   15  -39
Gorman Thomas  1979   28  22   28   48   55    7
Jason Thompson 1982   27  12   21   42   30  -12
Jim Wynn       1968   26  15   16   35   29   -6

Only 2 out of 19 players (David Ortiz and Gorman Thomas) managed to up their power score yet again in the fourth year. The mean change from year3 is -15; the median change is -20. After averaging 42 in the boom year, they averaged a (still respectable, and still better than the first two years) 27 in the fourth year.
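The summary figures can be recomputed from the pow3 and pow4 columns of the list. This sketch works only from the rounded scores as printed, so small differences from figures computed on unrounded scores are to be expected.

```python
from statistics import mean, median

# (pow3, pow4) pairs copied from the list above, in the same order
PAIRS = [
    (43, 41), (39, 31), (51, 25), (41, 21), (43, 36), (33, 14), (43, 17),
    (38, 16), (54, 32), (39, 19), (37, 20), (35, 10), (32, 34), (52, 31),
    (39, 34), (54, 15), (48, 55), (42, 30), (35, 29),
]

changes = [p4 - p3 for p3, p4 in PAIRS]
improved = sum(1 for c in changes if c > 0)

print(len(PAIRS), improved)                      # players, and how many gained
print(round(mean(changes), 1), median(changes))  # mean and median change
print(round(mean(p3 for p3, _ in PAIRS)), round(mean(p4 for _, p4 in PAIRS)))
```

The counts, the mean change of about -15, and the 42 and 27 averages all match. The median of the rounded changes comes out at -19 rather than -20, presumably a rounding difference.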

…and replies to the first draft

By clayd On January 28, 2014

mark says:

Show your projections from last year.

On the Projections page, there are links to the 2012 and 2013 projections. They are from the saved spreadsheets that I have from the dates given, and run through the same csv-to-webpage script I used to make the current pages.

tangotiger says:
To help people understand how the #1 team is forecasted to “average” 91 wins, can you also show the averages for #1 through #30? That is, take the highest win total for each of your simulations (regardless of team), and show us that average. Then do the same for the second highest and so on.

Andy says:
He has no team winning more than 91 games… very likely.. lol

Tom Sheffield says:
It’s still way too early for projections like this but I do find great fault with 91 wins being the best record in baseball this year. The AL East looks about right standings wise.

There’s an issue here that I find hard to explain.

It is almost certainly NOT the case that the best record in baseball will only amount to 91 wins. In fact, if you look at the playoff chances page, you’ll see that the AL East says this

Average wins by position in AL East: 95.2 87.7 81.9 76.1 68.5

indicating that it will take 95 wins, on average, to win the division – even though no team in the division, on average, gets above 90. Every division, in fact, takes 94-95 wins to finish first. WTF? Teams don’t win _on average_. The winning team will be the one who combines a good projection AND beats their projection. If the past three years are any indication, the average team is going to be 5 games off these projections – and a couple of teams will miss by 20. In the odds page, I play the season out a million times. In the real world, it will only play once, and how you perform relative to your projection determines your final standing.
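A toy simulation makes the same point. This is a deliberately simplified sketch, not the odds-page code: each season is just the projected win total plus independent normal noise, the spread of 6 wins is an assumed value, and the function name is mine. The projections are the AL East figures from the first run.

```python
import random


def average_wins_by_position(projections, sd, n_sims=20000, seed=1):
    """Average win total of the 1st, 2nd, ... place finisher over many seasons.

    Each simulated season is the projection plus independent normal noise,
    a deliberate simplification of a real game-by-game simulation.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(projections)
    for _ in range(n_sims):
        season = sorted((p + rng.gauss(0, sd) for p in projections), reverse=True)
        for place, wins in enumerate(season):
            totals[place] += wins
    return [t / n_sims for t in totals]


# AL East projected wins from the first run; sd=6 is an assumed spread
al_east = [90, 86, 85, 78, 77]
by_place = average_wins_by_position(al_east, sd=6)
print([round(w, 1) for w in by_place])
```

Even in this crude model the average first-place total sits above the best single projection and the average last-place total sits below the worst one, because the division winner is whichever team both projects well and beats its projection.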

There is no doubt in my mind that the best teams will be better than their projection, and the worst teams will be worse. Last year, the six first place teams averaged 8.7 wins better than their projection. Only the Tigers were able to underperform their projection and still win their division.

The six second place teams were +6.5.
The third place teams averaged -0.2…basically zero. Just meeting your projection is a recipe for mediocrity.

The fourth place teams averaged -3.
The last place teams averaged -10.

Whether the projection error comes from mis-estimating the real quality, or just random luck, or a mid-season tradeoff of talent from the weak to the strong that exaggerates the difference…there will be errors, and they have as much to do with deciding the winners as real talent. I’m sorry if that sounds like a copout.

David Lowe says:
You might want to tweak your software. The Royals aren’t going to be 9 games worse than they were last year, bro.
arttieTHE1manparty says:
Insane! How does the computer project the Royals to get worse??? With that defense and relief corps? No way…

Any projection is going to upset fans of various teams, especially if the projection comes in lower than they think is deserved.

With the Royals, the big concern for me is the pitching. I expect Shields to come back about a half run in ERA, and I don’t see quality replacements for Santana and Chen, who surprisingly put up over 400 IP @ 3.50 ERA. Two things I will concede – there is some evidence, looking at the last two years of projections, that I under-count defense…or rather, that teams with good(bad) defense don’t get their runs allowed moved down(up) enough. The Royals and Orioles are two teams who might be suffering from that bias…if it is real. It didn’t show up in the 2011 data with nearly the same effect as in 2012-13.

Now, Guthrie at a 5.00-ish ERA. I’m perfectly comfortable with that projection. He was 20 runs above average in the DR component – my way of saying he gave up 20 runs less than expected, based on his other stats. He doesn’t have a history of putting up that kind of number, and even if he did, that component score heavily, heavily trends towards zero in future years. The issue I have with the projection, in retrospect, is that there’s no way he gets 30 starts with that level of performance. It’s not as though there’s a ton of depth there, though, so it’s not going to make a big difference, but future iterations are liable to come up a couple of wins for them. It IS a process to run these stats, and this was just an opener.

JR says:
Sorry, but if you think the Reds will be under .500 your computer has a bad virus.

I predict that Cincinnati fans will become thoroughly sick of the phrase “you can’t steal first base” this season.

First Projections for 2014

By clayd On January 26, 2014

My first run (that I’m willing to talk about) of projections for the coming season is now up on the 2014 Projected Standings tab. They have also been used to create a new Playoff Chances Report. And, of course, the individual projections that go into them are available, again on the Projected Standings page.

American League
East	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
Tampa Bay	90	72	698	618	45.8	19.1	65.0
Boston	86	76	723	680	22.8	19.2	42.0
NY Yankees	85	77	683	646	21.6	18.8	40.4
Toronto	78	84	720	749	5.9	7.8	13.7
Baltimore	77	85	693	733	3.9	5.5	9.4

Central	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
Detroit	91	71	711	618	60.1	14.9	75.0
Cleveland	85	77	717	682	24.1	19.7	43.8
Chicago WS	79	83	682	701	8.2	9.9	18.1
Kansas City	77	85	680	712	5.9	7.5	13.4
Minnesota	72	90	669	752	1.7	2.4	4.1

West	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
Oakland	88	74	723	655	35.9	20.7	56.5
Texas	87	75	731	676	30.6	20.7	51.3
LA Angels	84	78	712	685	17.5	16.9	34.4
Seattle	83	79	707	690	15.2	15.7	30.9
Houston	70	92	676	781	0.8	1.2	2.0

National League
East	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
Washington	87	75	661	612	46.2	17.0	63.2
Atlanta	85	77	673	641	34.3	18.4	52.7
NY Mets	78	84	639	666	10.7	10.3	21.0
Miami	75	87	616	670	5.5	5.9	11.4
Philadelphia	72	90	615	690	3.3	3.7	7.0

Central	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
St Louis	90	72	698	619	58.0	17.4	75.3
Pittsburgh	83	79	660	639	21.7	21.4	43.0
Cincinnati	80	82	633	640	12.3	15.5	27.8
Milwaukee	77	85	654	690	7.4	10.8	18.3
Chicago Cubs	67	95	598	721	0.6	1.2	1.8

West	Won	Lost	Runs	Runs A	Champ	Wild Card	Net Playoff
LA Dodgers	88	74	649	593	40.3	21.8	62.1
San Francisco	85	77	659	624	27.6	22.0	49.7
San Diego	83	79	670	648	22.3	20.6	43.0
Arizona	78	84	651	676	8.4	11.5	19.8
Colorado	71	91	655	748	1.4	2.5	4.0

To build these projections, I:

1) Run a computerized projection scheme, using the last three years of player performance compared against a database of all players’ four year performances. The algorithm attempts to find the most similar players, in terms of age, position, build, and performance, and the top 20 players are noted on the individual player cards.

2) Take those performances, and enter them into a very large spreadsheet, where I fill in expected playing times for all of the players. Every team, every position has to equal 100%. There have to be 162 pitching starts. Generally speaking, a) no position player gets more than 90%, and pitchers are mostly capped at 32 starts; b) rookie starters don’t get more than 80%; c) players I don’t think can hold the job all year certainly get less; d) the playing time estimates from the computer tend to carry a lot of weight. I normally set a sure starter to the 5% playing time level that first passes their projected PA, while innings are usually held under the computer’s values.
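
The hard constraints in step 2 (every position sums to 100%, 162 starts per team) and the softer caps lend themselves to a mechanical check. Here is a minimal sketch in Python. The data layout, the function name, and the choice to apply the 90% cap to a player’s share summed over all positions are assumptions made for illustration, not a description of the actual spreadsheet.

```python
def check_playing_time(position_shares, starts, tol=1e-6):
    """Return a list of problems found in one team's playing-time sheet.

    position_shares: {position: {player: fraction of that position's time}}
    starts:          {pitcher: projected starts}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    problems = []

    # Hard rule: every position has to add up to 100%.
    for position, shares in position_shares.items():
        total = sum(shares.values())
        if abs(total - 1.0) > tol:
            problems.append(f"{position}: shares sum to {total:.2f}, not 1.00")

    # Soft rule: no position player above 90% (summed over all positions).
    per_player = {}
    for shares in position_shares.values():
        for player, share in shares.items():
            per_player[player] = per_player.get(player, 0.0) + share
    for player, share in per_player.items():
        if share > 0.90 + tol:
            problems.append(f"{player}: {share:.0%} playing time exceeds 90%")

    # Hard rule: exactly 162 pitching starts.
    total_starts = sum(starts.values())
    if total_starts != 162:
        problems.append(f"starts sum to {total_starts}, not 162")

    # Soft rule: starters are mostly capped at 32 starts.
    for pitcher, n in starts.items():
        if n > 32:
            problems.append(f"{pitcher}: {n} starts exceeds the usual cap of 32")

    return problems
```

An empty list means the sheet passes. Since the caps are described as “generally speaking”, the cap messages are better read as warnings than as errors.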

All of the statistics in the spreadsheet get rebalanced and weighted. Players on teams with high OBAs will get more plate appearances. Defense trickles back into pitchers hits (and runs) allowed. The league as a whole has to come out equal to the league totals of last year.
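
The last requirement, that the league as a whole comes out equal to last year’s league totals, can be illustrated with plain proportional scaling. This is only a sketch under that assumption; the real rebalancing also moves plate appearances and defense around, which this does not attempt. The team names and run totals are invented for the example.

```python
def rescale_to_total(projected, target_total):
    """Scale every value by one common factor so the sum hits target_total."""
    current_total = sum(projected.values())
    if current_total <= 0:
        raise ValueError("projected values must sum to a positive number")
    factor = target_total / current_total
    return {name: value * factor for name, value in projected.items()}


# Hypothetical three-team "league" whose raw projections add up to 2100 runs,
# rescaled to a hypothetical prior-year league total of 2000 runs.
raw = {"Team X": 700.0, "Team Y": 650.0, "Team Z": 750.0}
balanced = rescale_to_total(raw, 2000.0)
```

Because every team is multiplied by the same factor, the ordering of the teams and the ratios between them are unchanged; only the league-wide level moves.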

Current free agents won’t show up here – no team, no projected playing time. Their projections are still available on the “All hitters” and “All pitchers” downloads.

Getting to some of the players takes a deep depth chart. I’ve prepared some that you can find under the 2014 Spring tab, under “dts”. Every team has three files in there. One is a dt file, which contains the translated statistics, 2009-13, with the computer-only 2014 projection, for all hitters in that team’s system; another is a pdt file, which does the same for pitchers. The “orgdt” file just has the 2014 projections for all players on the team, sorted by position and projected WARP, like the one here for the Nationals. Kind of works as a very deep depth chart for all teams, although I can’t swear that there aren’t players showing up on the wrong team (especially for players who have been released – there’s a decent chance they still show up for their old teams). That’s just for these depth charts – I am reasonably certain that every player used in the major league projections is actually a member of their team. The one exception might be Matt Garza, who I have already written into the Milwaukee rotation.

A Guaranteed Hall of Fame

By clayd On January 25, 2014

Looking back on the Hall of Fame issues that came up, I think quite a few of the problems would disappear if they would just have a real election.

What, you say they already have one? No, they do not. Maybe I’m being overly pedantic, but an election, to me, is a way of choosing people to fill a position that must be filled. In particular, it has to result in a winner. The Hall of Fame selection process does not ensure a winner; it is more akin to the process of passing a piece of legislation than to the process of selecting a legislator.

The Baseball Hall of Fame has a pretty basic conflict. The Hall itself – and the community that founded it – desires, and needs, to have induction ceremonies held every July, and induction ceremonies without inductees are just bad for business. This argues for making voting easier, to ensure that we don’t have a repeat of 2013, when no one was selected.

On the other hand, they have given the keys of election to a group – the BBWAA – which seemingly takes more pride in denying entrance to the unworthy than welcoming the worthy. The procedures they have adopted also are intended to exclude all but the best.

Looking at things from a large, historical perspective, we see that major league baseball recognizes 2425 team-seasons in major league history – 1256 in the NL, 1048 in the AL, 85 in the 19th century American Association, 16 in the Federal League, 12 in the Union Association, and 8 in the Players’ League. Personally, I’d include all the teams in the National Association of 1871-75 as well, which would bump us up another 50, getting us to 2475.

There have also been 211 players elected to the Hall of Fame – not counting managers and Negro League players. I’d also include a few players from the NA days who were inducted as “pioneers”, but whose playing career demonstrates at least some worthiness (George Wright and Al Spalding for sure; Candy Cummings is more questionable). I’d also add to the list of players some obvious selections (based on their play) who have been denied entrance for moral failings of one kind or another – let us say Joe Jackson, Pete Rose, Mark McGwire, Barry Bonds, Sammy Sosa, and Roger Clemens. That is 220 players, 2475 teams, or a player for every 11.25 teams in history.

That was the most expansive definition. If I wanted to be stricter, I could just look at the 211 players selected to the Hall. And I could throw out the NA teams, and all the third leagues, and probably the first three years of the AA, when its quality level was way, way below the NL of the day. That produces a narrower list of 2358 teams. Ratios vary from 10.72 (using the largest number of players and smallest number of teams) to 11.73 (the reverse). To be less precise – there’s been a Hall of Fame player selected for every 11 or 12 teams in history.

Since there are currently 30 teams playing in the majors every year, it means that if you simply accepted the existing ratio as a guide, then we should be creating around 2.5 new Hall of Famers every year just to keep up.
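
For anyone who wants to check the arithmetic, here it is as a short Python sketch. Every number comes from the paragraphs above; only the variable names are mine.

```python
# Team-season counts quoted in the text.
recognized = 1256 + 1048 + 85 + 16 + 12 + 8   # NL, AL, AA, FL, UA, PL
teams_wide = recognized + 50                   # plus the National Association
teams_narrow = 2358                            # the stricter count

players_narrow = 211                           # elected players only
players_wide = 220                             # plus pioneers and the excluded


def teams_per_player(teams, players):
    """Team-seasons in history per Hall of Fame player."""
    return teams / players


low = teams_per_player(teams_narrow, players_wide)       # most players, fewest teams
expansive = teams_per_player(teams_wide, players_wide)   # the expansive definition
high = teams_per_player(teams_wide, players_narrow)      # fewest players, most teams

# With 30 teams a year, the yearly "keep up" pace is 30 divided by the ratio.
for label, ratio in [("low", low), ("expansive", expansive), ("high", high)]:
    print(f"{label}: {ratio:.2f} teams per player, {30 / ratio:.2f} inductees per year")
```

The three ratios come out to 10.72, 11.25, and 11.73, and the implied pace runs from about 2.6 to 2.8 new Hall of Famers a year. Rounding the ratio up to 12 gives the 2.5 used above.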

So my proposal to the Hall of Fame committee is this – make it a real election. The top vote getter each year gets in, regardless of the vote count. The second-place finisher gets in, assuming a 50%+1 approval. The third (or more) person goes in if they can pull a 75% approval.
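
The rule is simple enough to write down as code. A minimal Python sketch follows; the function name and the sample ballot are hypothetical, “50%+1” is read as a strict majority of ballots cast, and ties are not handled because the proposal doesn’t say what to do with them.

```python
def select_inductees(votes, ballots):
    """Apply the three-tier rule to one year's results.

    votes:   {player: number of votes received}
    ballots: total number of ballots cast that year
    """
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    inducted = []
    for rank, (name, count) in enumerate(ranked, start=1):
        if rank == 1:
            qualifies = True                        # top vote getter always gets in
        elif rank == 2:
            qualifies = 2 * count > ballots         # 50% + 1: a strict majority
        else:
            qualifies = 4 * count >= 3 * ballots    # third or lower: 75% approval
        if qualifies:
            inducted.append(name)
    return inducted


# Hypothetical 100-ballot year: A leads with only 39 votes and still gets in.
weak_year = select_inductees({"A": 39, "B": 37, "C": 20}, ballots=100)
```

Integer comparisons (`2 * count > ballots`) are used instead of percentages so that a player sitting exactly on a threshold isn’t decided by floating-point rounding.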

In all of Hall voting, there have only been two players who have finished first or second in the voting without currently being in the Hall of Fame – Craig Biggio (1st in 2013, and near-certain to crack the threshold at some later date) and Jack Morris (who finished 2nd in 2013). Even in third, there are only a few cases – Jeff Bagwell in 2012-13, Tony Oliva in 1988, and Gil Hodges four times in the 70s. I don’t think the Hall would be in any way diminished by these inclusions.

How would the last 25 years’ elections have worked following my rules? I’m going to make the naive assumption that votes for other players would not have changed due to players that I’ve removed from the ballot by inducting them before their time.
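
In code, that naive assumption amounts to taking each year’s real results as given, striking anyone already inducted under the new rules, and re-ranking whoever is left. Here is a minimal Python sketch of that bookkeeping; the function name and the two-year ballot are invented for illustration and are not real vote totals.

```python
def run_elections(ballots_by_year):
    """Apply the proposed rule year by year, skipping anyone already inducted.

    ballots_by_year: {year: [(player, share of the vote from 0 to 1), ...]}
    The shares are taken as fixed, per the naive assumption.
    """
    inducted = set()
    classes = {}
    for year in sorted(ballots_by_year):
        # Drop players we have already put in, then rank whoever is left.
        remaining = [(n, s) for n, s in ballots_by_year[year] if n not in inducted]
        remaining.sort(key=lambda pair: pair[1], reverse=True)

        new_class = []
        for rank, (name, share) in enumerate(remaining, start=1):
            if rank == 1:
                qualifies = True            # top remaining finisher always gets in
            elif rank == 2:
                qualifies = share > 0.50    # second place needs a majority
            else:
                qualifies = share >= 0.75   # everyone else needs 75%
            if qualifies:
                new_class.append(name)

        inducted.update(new_class)
        classes[year] = new_class
    return classes


# Hypothetical ballots: B misses 75% in year 1 but goes in as a strong second,
# so in year 2 he is skipped and C becomes the top remaining finisher.
example = run_elections({
    1: [("A", 0.80), ("B", 0.60), ("C", 0.40)],
    2: [("B", 0.70), ("C", 0.45), ("D", 0.30)],
})
```

Here `example` comes out as `{1: ["A", "B"], 2: ["C"]}`, the same skip-and-promote pattern that shows up in the year-by-year list.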

1990 – Real inductees Jim Palmer and Joe Morgan are selected.

1991 – Real inductees Rod Carew, Gaylord Perry, and Fergie Jenkins.

1992 – Tom Seaver and Rollie Fingers are selected.

1993 – Reggie Jackson selected by the Hall, and then we have our first change. Phil Niekro finished second with 65.7% of the vote, and we put him in now rather than making him wait until 1997.

1994 – The real Hall tabs Steve Carlton, and we concur. But we will also honor Orlando Cepeda, who picked up 73.5% while finishing second, and won’t make him wait until a Veterans Committee meeting in 1999.

1995 – Mike Schmidt is selected. Phil Niekro was second, but we already have him, which means the “second-place” finisher was Don Sutton. 57% puts him in the Hall now instead of 1998.

1996 – No one is elected by the real Hall. Niekro was first, so we skip him; that makes Tony Perez #1, so in he goes without waiting four more years. Sutton is next, skip him, and that brings up Steve Garvey…but he only has 37% of the vote. Perez is our only inductee this year.

1997 – The real Hall chose Niekro, followed by Sutton and Perez, all of whom we’ve already honored. The top remaining recipient, and our winner, even though he only had 39% of the vote, is Ron Santo. We salute him in 1997, instead of making him wait until the afterlife (died 2010, inducted into the Hall in 2012).

1998 – The Hall selected Don Sutton. We skip him, and then skip Perez, and Santo, and then it’s welcome to the Hall of Fame, Jim Rice. We’re already under 50%, so he’s all alone, but he doesn’t have to wait another decade until 2009.

1999 – The Hall has a strong first-year class, and names Nolan Ryan, George Brett, and Robin Yount. We don’t have to change a thing.

2000 – Carlton Fisk is selected by the Hall, and we’re fine with that. Perez and Rice were next, and we already have them; our second place finisher is Gary Carter, but he is just under 50% and so will have to wait.

2001 – The Hall gives Dave Winfield and Kirby Puckett over 75%, so they are in.

2002 – Ozzie Smith is really elected. Gary Carter is second, and now has over 50% of the vote, so he gets in a year earlier than reality.

2003 – Eddie Murray finished first, and was genuinely elected, and Carter was also elected. Since we already have Carter in, our second-place finisher is Bruce Sutter, who qualifies with 54% approval. He’s in three years early.

2004 – The real Hall names Paul Molitor and Dennis Eckersley.

2005 – The real Hall names Wade Boggs and Ryne Sandberg.

2006 – Sutter was the only real inductee that year. Ignoring him, and second-place finisher Rice, our top recipient is Rich Gossage. And our second place finisher is Andre Dawson, and 61% makes him a qualifying second-placer. Gossage goes in for us now instead of 2008, and Dawson moves up from 2010.

2007 – The Hall really does name two, Cal Ripken and Tony Gwynn, so our work is unneeded.

2008 – Gossage was the Hall’s real choice. We’re going to go past him, and Rice, and Dawson, and find ourselves a nice shiny Bert Blyleven. The next finisher would be Lee Smith, but he’s under 50%; so Bert has the podium to himself now instead of waiting until 2011.

2009 – Rickey Henderson is taken in reality, as was Jim Rice. Our second place finisher (after skipping Rice, Dawson, and Blyleven) would again be Lee Smith, but again he’s under 50% and is not inducted.

2010 – Reality elects Dawson, but we’ve had him in for four years already. Next was Blyleven, also in already. Our top finisher in 2010 is Roberto Alomar, so he goes in a year ahead of time. Our second place finisher is Jack Morris, and he does receive more than 50% of the vote, so he goes in, too. Morris is the first person we’ve inducted who has not made the actual Hall. However, like Cepeda and Santo, similarly rejected by the BBWAA, he’s a near-cinch for a future Veterans Committee.

2011 – Reality selected Alomar and Blyleven, but we have beaten reality to the punch. Barry Larkin is our inductee. Morris would have gotten in again, but skipping over him means that, for the third time, our second place finisher is Lee Smith. And for the third time, he is under 50%.

2012 – The Hall really chose Larkin; we’ll ignore him, and then ignore Morris. Our number 1 becomes Jeff Bagwell. Our number two, again, is Lee Smith; but this time he picked up 50.6% of the vote. He’s in!

2013 – No one was selected by the BBWAA this year. Craig Biggio was on top of the list, though, so he is in. We can ignore Morris, and Bagwell in third, to get down to Mike Piazza. He’s our second-place man, and he’s got 58% support, so in he goes.

2014 – Just as in real life, Greg Maddux, Tom Glavine, and Frank Thomas.

So to summarize – this way guarantees that there will be someone to honor at Cooperstown each year. Players who aren’t selected in their first year tend to get in a couple of years earlier this way. Virtually all players who meet our rules but not the BBWAA 75% rule eventually get named to the Hall anyway. We’d have saved Orlando Cepeda and Ron Santo from the Veterans Committee. We would have inducted Jack Morris and Lee Smith, who have (definitely, probably) missed out from the BBWAA. We’ve already gotten to Bagwell, Biggio, and Piazza, who should all be eventual winners.

Clay
