Kind of upstaged by today’s events, but I uploaded the 2014 Cuban stats to the site this weekend. Check under the “DTs by League” tab, and then change League to Cuban Serie Nacional.
I was working on a longer post to detail the fairly major change I made to the DT procedure for Cubans, but decided I kind of have to go now. So, short form and I’ll try to fill in the details at a later time.
So, first, a general word on the Cuban Serie Nacional, their top level league. The league consists of 16 teams, one for each of the country’s 15 provinces, plus one for the city of Havana. Until a few years ago it was one for each of the nation’s 14 provinces, plus two for Havana, but then the province of La Habana got split in two. They actually played with 17 teams for one year before axing one of Havana city’s two teams. Players generally play for their home region; there is little movement between teams.
The normal schedule length for the SN is 90 games, which allows for home-and-away three-game series against each of the other 15 teams. The season runs roughly early November through March. In years with a World Baseball Classic, they have played just 45, sacrificing half the season to get their best players in front of the world. This past year – by that I mean the 2013/14 season, not the one that is currently being played – they played a 45-game schedule for all 16 teams, and then followed that with another 45 games between just the top 8 teams from the first half. It appears that some level of taking players from the bottom 8 teams was permitted. I’m honestly not sure how to handle the two halves. The stats listed under the link are just for the first half; the stats for the second half are listed here, under the “CB2” label. The second half had a slightly higher quality rating than the first half, which is reflected in the translations and part of why I didn’t just want to run them all together.
The quality rating for the league came in at .60, almost exactly halfway between my ratings for the high A leagues (.551) and AA (.642). The second half, with just the supplemented stronger teams, rated as essentially AA (.63). That is stronger than I have rated the league in the past, but backed up by the performances of multiple players.
On the first run of the stats, I used the same DT method as last year, but took a close look at all the Cuban players who played outside of Cuba. For the first time, this included several players from the just-completed Cuban season, as a few Cuban players were allowed to play in Mexico and Japan. I compared the translation I made for their last three seasons in Cuba with those of their first three seasons after Cuba; I ignored players who had less than 200 total PA on either side of the transition; I ignored players who had a three-year or more layoff between their Cuban and American playing days.
I found that I was generally too pessimistic, especially in one particular category:

                      EQA   POW  SPD  Krt  Wrt  BABIP
Average of Cuba DT    .226    2    1   -1   -1    -13
Average non-Cuba DT   .246    3   -1    0   -4     -2
The component scores reflect how many runs above or below average a player is based on a particular aspect of his performance. The Power score, for instance, reflects how many runs better than an average player he would be if his power – home runs and some doubles – were the only thing that was different between him and an average player over the course of 600 plate appearances. The SPD score is based on steals, triples, and doubles; the Krt is all about strikeout rates; the Wrt is all about walks (and hit by pitch). The BABIP is about singles, and that is where my translations went wildly wrong in many cases. Yoenis Cespedes came to the US with a -17 BABIP, but he’s really been +2.
And it wasn’t simply that the translation program was too harsh – the BABIP numbers just didn’t match up. For the 30 players I tested, the correlation between Cuba BABIP and American BABIP was just .16. By contrast, POW had an .89 correlation, and SPD .77. EQA was at .55. The procedure was just a complete mess on this particular statistic.
I found, with testing, that I could make a much better estimate of the Cuban players’ American BABIP scores by looking at their other (Cuban) statistics. The regression equation that came out was
BABIP = -2 + 1.1*SPD + .30*POW + .49*Wrt + .20*Krt
Take Cespedes, for instance. The average of his last three years in Cuba was a .244 EQA, with component scores of POW 9, SPD 3, Krt 2, Wrt -2, and BABIP -17. Apply the function above, and we get a new projected BABIP of +4.
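As a quick check on the arithmetic, here is a minimal sketch of that regression applied to the rounded Cespedes components quoted above. The function name `project_babip` is only an illustrative label, not anything from the actual DT code. With the rounded inputs it comes out near +3.4; the +4 in the text presumably reflects unrounded component scores, which is an assumption.

```python
def project_babip(spd, pow_, wrt, krt):
    """Regression estimate of a Cuban player's post-Cuba BABIP component.

    Coefficients are the ones quoted in the post; inputs are the player's
    Cuban component scores (runs above or below average).
    """
    return -2 + 1.1 * spd + 0.30 * pow_ + 0.49 * wrt + 0.20 * krt


# Cespedes' rounded three-year Cuban components: POW 9, SPD 3, Krt 2, Wrt -2
cespedes = project_babip(spd=3, pow_=9, wrt=-2, krt=2)
print(round(cespedes, 2))
```

Note that the old BABIP score itself (-17) is not an input at all; the estimate leans entirely on the other four components.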
Work that back into the statistics, and the new translation for Cespedes looks like
 AB   H DB TP HR BB  SO   R RBI SB CS  Out   BA  OBP  SLG  EqA EqR POW SPD KRt WRt BIP
593 136 32  3 24 49 100  93  79  9  2  849 .230 .291 .416 .244  68   9   3   2  -2 -17
593 171 28  3 24 49 100 100  86  9  2  785 .289 .344 .469 .278  88   8   1   2  -2   4
591 165 31  6 27 46 115  96 108 11  7 1202 .280 .335 .491 .278  89  13   2  -2  -2   2
The first line is the old way of doing the DT, for Cespedes’ 2009-2011 seasons. The second line is the revised way, and the third line is his combined 2012-2014 totals. (All have been adjusted to 650 PA, with the exception of the Outs column). Clearly – a much better fit.
It wasn’t just Cespedes who benefitted. The correlation between projected and actual BABIP improved from .16 to .43. The correlation between projected and actual EQA improved from .55 to .63. It just works better, and I guess I’d have to be crazy not to use it.
I’ve rerun all the player cards with a 2015 projection, so you should be able to see individuals up now.
Not that they won’t change between now and April, as I have any number of things to work through that are not included in this version. As a for instance, the default setting for league offense for 2015 is for the AL and NL to be the same as in 2014.
When I ran the cards for 2014, that default meant that I used the 2013 averages. That wasn’t so bad in the NL – league offense came in at 4.01 runs per 9 innings in 2014, compared to 4.04 in 2013 – but in the AL, offense had a far more substantial drop from 4.31 to 4.15. My forecast for the majors as a whole was high by almost 500 runs, 2.4%; but about 350 of that came from the AL, against only 150 in the NL.
Stepping back for a historical perspective:
That’s runs per nine innings in the AL on a yearly basis (blue dashed lines) and as a 5-year moving average (solid red line). You’ll note that the 5-year average right now is on a solidly down and linear trend. It has fallen by 13, 9, 9, 9, and 6 points over the last five years. Should that trend continue, then next year’s 5-year average should fall another 9 points to 4.27 – which means that next year’s RPG would need to be about 4.01. The individual year trends also run to about 10 points per year, suggesting a 4.05. Absent some action by the league, it is hard to see the offense not dropping at least a little further next year. I’m thinking that something like 4.08 would be a better forecast than the 4.15 I would use by default.
Same deal in the NL:
Next year the last 4.40 RPG will scroll off the 5-year average, which is going to depress it even without further drops in the one-year average. Repeating last year’s 4.01 will cut the 5-yr average by another 7 points, to 4.11. Continuing the 10-point per year trend line would mean a one-year forecast of 3.87. You’ll note that the NL has operated with an apparent floor of about 3.8 runs, that only the 1968 season has penetrated since the dead ball era, so we are approaching the offensive levels at which the league has historically stepped in to make changes. I’m not as certain that the NL will decline further (as compared to the AL), so I think I will just let last year’s average roll forward.
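The moving-average arithmetic in these two sections boils down to one line: when a season scrolls off a 5-year window and a new one comes in, the average moves by (new - old) / 5. A minimal sketch using the NL figures above; the function names are illustrative only.

```python
def moving_average_shift(dropped, added, window=5):
    """Change in a moving average when `dropped` leaves the window and `added` enters."""
    return (added - dropped) / window


def value_needed(dropped, target_shift, window=5):
    """One-year value required to move the moving average by `target_shift`."""
    return dropped + window * target_shift


# NL: the 4.40 season scrolls off and last year's 4.01 repeats.
nl_shift = moving_average_shift(dropped=4.40, added=4.01)
print(round(nl_shift, 3))
```

That works out to a drop of a little under 8 points, in line with the roughly 7-point cut described above once rounding in the published averages is allowed for. `value_needed` runs the same relation backwards, which is how a target for the 5-year average turns into a required one-year figure.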
Kind of a nice thing to read. My pre-season projections seem to have topped the field, at least in terms of root-mean-squared-error.
A year ago at this time, I was recuperating from brain surgery. And I had the idea that, for the anniversary of Jackie Robinson’s first major league game, I would run the translation process for Jackie Robinson’s 1946 season.
As it happened, I didn’t get home from the hospital until April 12, which (combined with my fatigue level) meant I didn’t get a post ready by April 15. I did get it together shortly thereafter…can’t really remember now, but it was maybe a week later.
But I didn’t post it. What would be better, I thought, was not just Jackie in isolation, but the whole International League for 1946.
And a little while later I had that, with a little help from the stats posted at baseball-reference.com. But I still didn’t post. Even better, I said to myself, would be the whole AAA for 1946, to be able to place Jackie in a reasonable prospect position. Sure I hadn’t planned on it, but after seeing what they had at b-ref I thought it would be pretty easy.
And it was. Still a little later on, I had the IL, PCL, and AA for 1946 all drawn up. It was nowhere close to Robinson’s debut date anymore, and I wasn’t coming up with any hook to wrap around it for a good post. So instead of posting anything, I just kept going, working my way through the AAA of 1947. And 1948. And so on.
By then we were through with the entire baseball season, and I still hadn’t posted anything about it. I had completed doing all the AAA teams, right up to 1980 where I had everything, and then started on b-ref’s list of Japanese teams.
So now, a year later, my head’s healed, and it’s another Jackie Robinson Day, and if I haven’t buried the lead enough already, I’ve got translations for all AAA teams and players going back to 1946 posted on the site. And the Japanese Central and Pacific Leagues, along with all of their players, are translated back to those leagues’ debut in 1950. The links will be found under the “DTs by League” tab. The link for the league that started it all is here:
In addition, the DTs links for all players should reflect their AAA stats.
There’s still some work to do with them. I still haven’t finished getting all of the fielding stats added, so lots of players are listed at the position “DH” – that’s the default when no fielding data is found. The park and difficulty factors are not as complete as they are for recent years…nature of the beast, I’m afraid. And there is, of course, no split data for minor league seasons before 2005.
Year Team         Lge  AB   H DB TP HR BB  SO   R RBI SB CS  Out   BA  OBP  SLG  EqA EqR POW SPD KRt WRt BIP
1946 Montreal     Int 470 137 33  5  3 67  42  90  52 33 11  349 .291 .392 .402 .284  75 -14  10  18   6   4
1947 Brooklyn     NL  600 168 37  3 16 68  54 134  50 39  0  445 .280 .372 .432 .289 100  -2  14  17   2  -4
Robinson’s 1946 DT shows a clearly above average hitter, one with excellent contact skills and outstanding speed. He has a bigtime number of doubles – interestingly enough, his minor league double count would be 42 when projected to the same 600 AB as his 1947 major league line – but a distinct lack of home runs. With the exception of the home runs, his lines are extremely close (and his 1948 would be equally similar).
Not quite two weeks into the season, and already a number of expected starters have dropped off the radar for the season.
Players who changed teams: Eduardo Nunez from NYY to MIN, the Mariners lose Carlos Triunfel to the Dodgers, Mike Fontenot goes from the Nationals to the Rays, the Cubs release Mitch Maier, the Brewers release Joe Thurston, Henry Blanco retires from the Diamondbacks, and the Giants waive Roger Kieschnick and the Diamondbacks pick him up; Nunez was the only one of the group I expected to have more than token playing time.
Among pitchers, Pedro Beato goes from the Reds to the Braves, Michael Brady travels from Miami to Anaheim, Preston Guilmet is traded from Cleveland to Baltimore, and Brian Omogrosso was released by the White Sox. I don’t think any of the pitchers were projected for more than 20 innings.
Braves: Dan Uggla’s .146 EQA has me raising the chances of him being replaced, by either Pastornicky or La Stella; BJ Upton’s .125 won’t hold off Jordan Schafer. Ian Thomas, Gus Schlosser, and Pedro Beato pick up bullpen garbage time. The Marlins have to deal with Jacob Turner’s shoulder injury…I’ve taken 10 starts off him for now, but it’s still TBD how much time he’ll miss. The Mets had a more drastic shakeup of the pitching staff. With Parnell out, Valverde takes over as closer, and Torres moves to a setup slot that pretty much removes him from starter contention. It also looks like Duda has moved ahead of Davis in the first base race. Nothing changed for the Phillies except a little shaking out of the back end of the bullpen. The Nationals lose a month of Wilson Ramos; it also looks like Ryan Zimmerman will get time at 1B, which likely works out to plus time for Danny Espinosa.
Orioles: Small changes. Boosted Delmon Young at Nolan Reimold’s expense, raised the innings for guys currently in the pen like Britton, Stinson, and Meek; took Suk-min Yoon out of starter contention down the line after a disastrous first start in Norfolk. For Boston, the injuries to Middlebrooks and Victorino don’t really change their projections, but Bradley’s early hitting raises his future – that comes off my projections for Nava, Gomes, and Carp. Ryan Roberts slots in as a 3B backup. For the Yankees, Yangervis Solarte soaks up most of the PA I’d given to Nunez. Scott Sizemore is off to a hot start, which could push out one of Brian Roberts or Kelly Johnson. No big changes for the Jays, while in Tampa Matt Moore’s injury really shakes up the rotation. I’ll bring in Erik Bedard for a dozen starts, add Cesar Ramos for a few, enhance Matt Andriese and Nate Karns, and shore up Jake Odorizzi’s job security.
White Sox: Avisail Garcia’s shoulder injury forces the Sox into the outfield I’d have started with (never been a Garcia fan). The forecasts for all of them were a little muddy, with four fairly equal players for three spots; Jordan Danks is less likely to force any of Eaton/Viciedo/De Aza out of a job. I guessed wrong about Nate Jones getting the closer job out of camp, so there’s a pretty strong bullpen shuffle there. Lindstrom’s not clearly better than any of Downs, Webb, or Jones, but the guy who has the job now has a clear advantage for playing time over any contender. No substantive changes for the Indians. The Carlos Santana experiment has lasted this long, so it gets a little bump up. Twins: Scaled back Buxton’s arrival, as his wrist injury isn’t healing rapidly…plus, it’s a wrist injury. Terrible thing for a ballplayer. Switched Chris Colabello in for things that were Parmelee. For the Tigers, the only thing I’ve got is some worry about Joe Nathan. I don’t have even that much to change for the Royals.
The A’s needed a lot of work, led by the demotion of Jim Johnson from closer. He could straighten himself out and regain the role, so he retains a share of the saves, but only as part of an even spread. The first base/DH spot is breaking a little differently than I envisioned, although more Callaspo/less Barton is a pretty good call for the team. In LA, the Angels will be without Josh Hamilton for a quarter of the season after he hurt his thumb sliding head-first into first. That should mean a lot more JB Shuck, as well as more Cowbell…er, excuse me, Cowgill…thrown in as well. I also did a little reshuffling in the bullpen, with Burnett being slow to return and Brian Moran lost for the season. The Rangers’ rotation remains an injury-riddled mess, with Scott Baker added as another option. With the Mariners, I shorted Logan Morrison a bit, as we get to see how the OF/1B/DH shuffle arranges itself. Nothing has changed yet for the Astros.
Arizona is already starting to question their starting pitching choices in the wake of Patrick Corbin’s injury, as Randall Delgado will move to relief while Josh Collmenter gets another shot at starting. Archie Bradley will be up there sooner or later. The Dodgers were pretty much set – though I did change second base from “Leans Dee Gordon” to “Safe Dee Gordon”. For the Giants, I enhanced Michael Morse as a clearer LF starter. With the Padres, I didn’t even have that. All I’ve got for the Rockies is to clarify Charlie Blackmon as the primary center fielder.
Chicago’s Cubs are another team getting into the closer shuffle, as Jose Veras is demoted. I also have to go stronger on Emilio Bonifacio than I wanted to. The Cardinals are still just as set as they were coming in – I’m just amazed at how much minor league talent they still have in the wings. With the Brewers, Henderson loses the closer fight with Francisco Rodriguez, among many changes in their bullpen from my forecast. For Cincinnati, I’m getting extremely worried about Mat Latos’ condition, so he gets a big drop in starts. On the plus side, Aroldis Chapman seems to be coming along on the short side of initial estimates, so his PT actually gets raised a bit. The biggest change I made for the Pirates is to bump up Gregory Polanco’s time, because the way he’s hammering AAA they won’t be able to keep him down much longer.
Hello everybody. Peabody here.
Shame I can’t earn any endorsements for the upcoming movie, because I can so do that voice. At least the original one, and when I’m not cold-ridden like I’ve been this weekend, pretty much confining myself to the room with the wood stove.
So I have looked back at the projections I released two weeks ago, and I did find one major mistake. Yes, there was much criticism of my methods being extremely conservative and not deviating very far from average – criticism which I didn’t necessarily take at full value, because, well, it is generally true. The methods, and the decisions leading to those methods – things like forcing the league totals to conform to last season’s league totals – force the system into a conservative mode. My default assumption is that there was nothing out of the ordinary.
But when I ran followup tests, like the average error of forecast components from the last few years – I found that I was going seriously astray. The process was something like this:
a) run analysis of the player’s performance over the last three years to set a baseline of expected performance. That is essentially just a weighted average of the last 3 seasons, with weights that vary by stat – some are more sensitive to just the most recent season, some to the entire three-year average, and some have little predictability at all.
b) compare that baseline performance with the baselines of players from baseball history. try to see if there is a consistent deviation from those baselines that can be applied to the current player.
Now, the weights in step A could be something like .523, .233, .150, which are for the hitter’s strikeout component. You’ll notice that they only add up to .896. That difference between the sum of the components and 1 is a measure, a recognition, of regression to the mean – particularly since my components are zero-based to league average. For a highly predictive statistic like batter K, the sum is close to 1. For hitter batting average, the sum is only .620; for pitcher delta-runs, it is barely 0.2.
The second step goes something like Baseline+delta*x, where delta is the difference between comparison players and their baselines, and x is an indicator of how useful those adjustments are. They go as high as 1, for speed and power, and are pretty near zero for things like those pitcher delta-runs.
The trouble is that I calculated the x component in a way that repeated the regression to the mean, essentially (baseline+delta)*x. The RTM was being double counted.
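To make the bug concrete, here is a rough sketch of the two formulas side by side. The function names and the sample inputs are made up for illustration; only the strikeout weights and the two formulas come from the description above. (As printed, those three weights sum to .906 rather than .896, so one digit is presumably off somewhere.)

```python
K_WEIGHTS = (0.523, 0.233, 0.150)  # hitter strikeout weights, most recent season first


def baseline(last_three, weights):
    """Step (a): weighted sum of the last three seasons' component scores.

    The weights sum to less than 1, which is where regression to the
    (zero-based) league mean enters.
    """
    return sum(w * score for w, score in zip(weights, last_three))


def intended(base, delta, x):
    """Step (b) as described: only the comparison-player adjustment is scaled."""
    return base + delta * x


def double_counted(base, delta, x):
    """The buggy form: x also shrinks the already-regressed baseline."""
    return (base + delta) * x


# Hypothetical inputs: an average player and an extreme one, with x = 0.8.
for base in (0.0, 20.0):
    print(base, intended(base, 2.0, 0.8), double_counted(base, 2.0, 0.8))
```

The gap between the two forms is base * (1 - x): nothing for a player whose baseline is zero, and larger the further the baseline sits from league average.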
For an average player, the difference was essentially meaningless. But the more extreme they were, in any facet that I measured, then the bigger the effect. So Mike Trout, above average across the board, went from
             BA    OBA   SLG   EQA  EQR WARP cPOW cSPD cSO cBB cBA
Mike Trout   0.302 0.386 0.510 .316 110  7.3   11    4   0   5  12
             0.306 0.406 0.530 .332 123  8.9   12    7  -1   8  15
(these are from the ‘all hitters’ section, straight from the computer, without regressing to league norms; the numbers on the projection pages will be a little lower).
Speed was dramatically affected, in part because the most extreme players are so much farther from the average. Trout went from 26 SB to 40; Billy Hamilton went from 43 to 72. His power dropped from a -9 component before to -11 now (sometimes the R-T-M works in your favor). Miguel Cabrera went from 30 HR to 36.
Fortunately, the pitchers weren’t similarly affected; the double-counting coding error didn’t happen in that directory. I did take advantage of my analysis to update the weights, which made for some differences. And the overdone regression to means had infected the fielding analysis as well, so that teams with good fielding weren’t getting enough credit for it, which did feed back on the pitcher ratings.
The effect on teams was dependent on having extreme players. Those that did benefitted by perhaps a win, maybe two. It did let a little more spread into the standings, with peak wins inching up from 91 to 93 and min wins dropping from 67 to 66.
So a quick look at the changes on the team level since 1/24, not all of which come from my code changes:
AL East:
was  TB 90  Bos 86  NY 85  Tor 78  Bal 77
now     90      89     86      79      81
The Orioles’ gain is mostly from me jumping the gun and sending A.J. Burnett their way, as he represents a big upgrade over their assorted fifth starter contenders. There was also a component for opponent quality that wasn’t kicking in – while the teams in the AL East were being judged harshly because of their ferocious schedule (playing other AL East teams), they weren’t receiving the compensating break – that their record isn’t an unbiased assessment of quality when they are NOT playing in the AL East.
AL Central:
was  DET 91  Cle 85  CWS 79  KC 77  Min 72
now      89      82      78     77      71
And that quality change I just spoke of kicks the AL Central in the teeth. Kansas City does well to stand pat with their 77-win forecast – the addition of Bruce Chen helps a little – while everyone else drops 1-3 games.
AL West:
was  OAK 88  TEX 87  LAA 84  Sea 83  Hou 70
now      91      85      86      81      67
Fixing the RTMs hurt Houston. Seattle was especially hurt by the changes in fielding, as they will have a lot of positional uncertainty – even the presumptive addition of an overrated Nelson Cruz doesn’t save them from a drop. Houston was also hurt by that, but Oakland did just fine. Trout alone benefitted by 15 runs from fixing the RTM error, and the Angels gained two games.
NL East:
was  WAS 87  ATL 85  NY 78  Mia 75  PHI 72
now      88      84     77      73      73
The Nationals gain a game on the Braves, based on the changes I made, because I don’t believe there’s been any player movement outside the bullpens.
NL Central:
was  STL 90  PIT 83  CIN 80  MIL 77  CHC 67
now      93      83      78      80      66
The Brewers added Matt Garza and Francisco Rodriguez, both pretty nice pickups, and Mark Reynolds makes their first base situation a little less desperate…but I am surprised at how they’ve switched places with the Reds. I promise, I’m not making any deliberate moves to hold the Reds back, but they keep on slipping.
NL West:
was  LA 88  SF 85  SD 83  Ari 78  Col 71
now     89     85     81      78      72
Not much change here, with the most notable one being the Padres’ loss of Luebke for the season. I don’t see Arroyo doing much but adding depth – he’s no better than the mostly Randall Delgado innings he replaces – and ditto for Maholm and the Dodgers.
Davis’ component-power score is only projected to be a +35 (he says “only”), after putting up a surprising 50 last year.
His three-year line for power going into 2014 is 29, 27, 50. I did a quick search for players who
1) had at least 250 PA in each season of a four-season span;
2) were 26-30 in the fourth season (Davis will be 28 this year);
3) averaged at least a +15 power in the first two years;
4) were at least 15 runs better in the third year than in each of the first two years.
That gave us this list:
                yr4  age4 pow1 pow2 pow3 pow4 pow4-3
Jack Cust       2008   29    9   26   43   41     -2
Juan Diaz       2001   27   20   12   39   31     -8
Jim Gentile     1962   28   19   26   51   25    -26
Willie Horton   1969   26   22   26   41   21    -20
Todd Hundley    1997   28   19   20   43   36     -7
Adam LaRoche    2007   27   17   15   33   14    -19
Joey Meyer      1988   26   22   21   43   17    -26
Jai Miller      2012   27   20   19   38   16    -22
Kevin Mitchell  1990   28   16   18   54   32    -22
Mike Napoli     2009   27   22   15   39   19    -20
Dave Nicholson  1969   29   19   17   37   20    -17
Gene Oliver     1962   27   20   14   35   10    -25
David Ortiz     2004   28   17   17   32   34      2
Carlos Pena     2008   30   24   14   52   31    -21
Mark Reynolds   2010   26   20   22   39   34     -5
Tony Solaita    1976   29   22   19   54   15    -39
Gorman Thomas   1979   28   22   28   48   55      7
Jason Thompson  1982   27   12   21   42   30    -12
Jim Wynn        1968   26   15   16   35   29     -6
Only 2 out of 19 players (David Ortiz and Gorman Thomas) managed to up their power score yet again in the fourth year. The mean change from year3 is -15; the median change is -20. After averaging 42 in the boom year, they averaged a (still respectable, and still better than the first two years) 27 in the fourth year.
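For anyone who wants to re-run those summary numbers, here is a small sketch that works directly from the rounded table values. The `qualifies` helper only covers the two power criteria (3 and 4), since PA and age are not in the table, and its name is purely illustrative. One caveat: the median of the printed, rounded changes comes out at -19, a hair off the -20 quoted, presumably because the quoted figure was taken from unrounded scores.

```python
from statistics import mean, median

# (name, pow1, pow2, pow3, pow4), copied from the table above
COMPS = [
    ("Jack Cust", 9, 26, 43, 41), ("Juan Diaz", 20, 12, 39, 31),
    ("Jim Gentile", 19, 26, 51, 25), ("Willie Horton", 22, 26, 41, 21),
    ("Todd Hundley", 19, 20, 43, 36), ("Adam LaRoche", 17, 15, 33, 14),
    ("Joey Meyer", 22, 21, 43, 17), ("Jai Miller", 20, 19, 38, 16),
    ("Kevin Mitchell", 16, 18, 54, 32), ("Mike Napoli", 22, 15, 39, 19),
    ("Dave Nicholson", 19, 17, 37, 20), ("Gene Oliver", 20, 14, 35, 10),
    ("David Ortiz", 17, 17, 32, 34), ("Carlos Pena", 24, 14, 52, 31),
    ("Mark Reynolds", 20, 22, 39, 34), ("Tony Solaita", 22, 19, 54, 15),
    ("Gorman Thomas", 22, 28, 48, 55), ("Jason Thompson", 12, 21, 42, 30),
    ("Jim Wynn", 15, 16, 35, 29),
]


def qualifies(pow1, pow2, pow3):
    """Criteria 3 and 4: +15 average over years 1-2, then a 15-run jump in year 3."""
    return (pow1 + pow2) / 2 >= 15 and pow3 - max(pow1, pow2) >= 15


changes = [p4 - p3 for _, _, _, p3, p4 in COMPS]
improved = [name for name, _, _, p3, p4 in COMPS if p4 > p3]

print(len(COMPS), round(mean(changes), 1), median(changes))
print(improved)
print(qualifies(29, 27, 50))  # Davis' 29, 27, 50 line going into 2014
```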
Show your projections from last year.
On the Projections page, there are links to the 2012 and 2013 projections. They are from the saved spreadsheets that I have from the dates given, and run through the same csv-to-webpage script I used to make the current pages.
To help people understand how the #1 team is forecasted to “average” 91 wins, can you also show the averages for #1 through #30? That is, take the highest win total for each of your simulations (regardless of team), and show us that average. Then do the same for the second highest and so on.
He has no team winning more than 91 games… very likely.. lol
Tom Sheffield says:
It’s still way too early for projections like this but I do find great fault with 91 wins being the best record in baseball this year. The AL East looks about right standings wise.
There’s an issue here that I find hard to explain.
It is almost certainly NOT the case that the best record in baseball will only amount to 91 wins. In fact, if you looked at the playoff chances page, you’ll see that the AL East says this
Average wins by position in AL East: 95.2 87.7 81.9 76.1 68.5
indicating that it will take 95 wins, on average, to win the division – even though no team in the division, on average, gets above 90. Every division, in fact, takes 94-95 wins to finish first. WTF? Teams don’t win _on average_. The winning team will be the one who combines a good projection AND beats their projection. If the past three years are any indication, the average team is going to be 5 games off these projections – and a couple of teams will miss by 20. In the odds page, I play the season out a million times. In the real world, it will only play once, and how you perform relative to your projection determines your final standing.
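A toy simulation shows the effect. This is only a sketch under loud assumptions, not the method behind the odds page: each team's season is drawn independently from a normal distribution around its projection, the 6-win standard deviation is a guess loosely based on the 5-game average miss mentioned above, and the five projections are the earlier AL East figures listed elsewhere on this page (90, 86, 85, 78, 77). The function name is illustrative.

```python
import random


def average_wins_by_position(projections, sd, trials=20000, seed=1):
    """Average win total of the 1st, 2nd, ... place finisher over many simulated seasons."""
    rng = random.Random(seed)
    totals = [0.0] * len(projections)
    for _ in range(trials):
        # One simulated season: every team misses its projection by a random amount.
        season = sorted((rng.gauss(p, sd) for p in projections), reverse=True)
        for place, wins in enumerate(season):
            totals[place] += wins
    return [t / trials for t in totals]


AL_EAST = [90, 86, 85, 78, 77]  # projected wins, best to worst
by_position = average_wins_by_position(AL_EAST, sd=6.0)
print([round(w, 1) for w in by_position])
```

Even in this crude version, the average first-place total lands above the best single projection and the average last-place total below the worst one, because the division winner is whichever team happened to beat its number in that particular run.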
There is no doubt in my mind that the best teams will be better than their projection, and the worst teams will be worse. Last year, the six first place teams averaged 8.7 wins better than their projection. Only the Tigers were able to underperform their projection and still win their division.
The six second place teams were +6.5.
The third place teams averaged -0.2…basically zero. Just meeting your projection is a recipe for mediocrity.
The fourth place teams averaged -3.
The last place teams averaged -10.
Whether the projection error comes from mis-estimating the real quality, or just random luck, or a mid-season tradeoff of talent from the weak to the strong that exaggerates the difference…there will be errors, and they have as much to do with deciding the winners as real talent. I’m sorry if that sounds like a copout.
David Lowe says:
You might want to tweak your software. The Royals aren’t going to be 9 games worse than they were last year, bro.
Insane! How does the computer project the Royals to get worse??? With that defense and relief corps? No way…
Any projection is going to upset fans of various teams, especially if the projection comes in lower than they think is deserved.
With the Royals, the big concern for me is the pitching. I expect Shields to come back about a half run in ERA, and I don’t see quality replacements for Santana and Chen, who surprisingly put up over 400 IP @ 3.50 ERA. Two things I will concede – there is some evidence, looking at the last two years of projections, that I under-count defense…or rather, that teams with good(bad) defense don’t get their runs allowed moved down(up) enough. The Royals and Orioles are two teams who might be suffering from that bias…if it is real. It didn’t show up in the 2011 data with nearly the same effect as in 2012-13.
Now, Guthrie at a 5.00-ish ERA. I’m perfectly comfortable with that projection. He was 20 runs above average in the DR component – my way of saying he gave up 20 runs less than expected, based on his other stats. He doesn’t have a history of putting up that kind of number, and even if he did, that component score heavily, heavily trends towards zero in future years. The issue I have with the projection, in retrospect, is that there’s no way he gets 30 starts with that level of performance. It’s not as though there’s a ton of depth there, though, so it’s not going to make a big difference, but future iterations are liable to come up a couple of wins for them. It IS a process to run these stats, and this was just an opener.
Sorry, but if you think the Reds will be under .500 your computer has a bad virus.
I predict that Cincinnati fans will become thoroughly sick of the phrase “you can’t steal first base” this season.
My first run (that I’m willing to talk about) of projections for the coming season is now up on the 2014 Projected Standings tab. They have also been used to create a new Playoff Chances Report. And, of course, the individual projections that go into them are available, again on the Projected Standings page.
[Projected standings tables for the six divisions (East, Central, and West in each league). Columns: Won, Lost, Runs, Runs A, Champ, Wild Card, Net Playoff. The table rows were not preserved.]
To build these projections, I:
1) Run a computerized projection scheme, using the last three years of player performance compared against a database of all players’ four year performances. The algorithm attempts to find the most similar players, in terms of age, position, build, and performance, and the top 20 players are noted on the individual player cards.
2) Take those performances, and enter them into a very large spreadsheet, where I fill in expected playing times for all of the players. Every team, every position has to equal 100%. There have to be 162 pitching starts. Generally speaking, a) no position player gets more than 90%, and pitchers are mostly capped at 32 starts; b) rookie starters don’t get more than 80%; c) players I don’t think can hold the job all year certainly get less; d) the playing time estimates from the computer tend to carry a lot of weight. I normally set a sure starter to the 5% playing time level that first passes their projected PA, while innings are usually held under the computer’s values.
All of the statistics in the spreadsheet get rebalanced and weighted. Players on teams with high OBAs will get more plate appearances. Defense trickles back into pitchers hits (and runs) allowed. The league as a whole has to come out equal to the league totals of last year.
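The two hard constraints in step 2 (every position sums to 100%, and the starts sum to 162) are the kind of thing a few lines of code can police. The sketch below is only an illustration of that check; the data layout, the function name, and the placeholder player labels are all hypothetical and have nothing to do with the actual spreadsheet.

```python
def check_playing_time(position_shares, starts, tol=1e-6):
    """Return a list of problems with one team's playing-time sheet.

    position_shares: {position: {player: share of playing time, 0 to 1}}
    starts: {pitcher: projected games started}
    """
    problems = []
    for pos, shares in position_shares.items():
        total = sum(shares.values())
        if abs(total - 1.0) > tol:
            problems.append(f"{pos}: shares sum to {total:.0%}, not 100%")
    total_starts = sum(starts.values())
    if total_starts != 162:
        problems.append(f"pitching starts sum to {total_starts}, not 162")
    return problems


# Placeholder team: a 90% starter plus a backup at two positions, six starting pitchers.
shares = {"SS": {"Starter": 0.90, "Backup": 0.10},
          "CF": {"Starter": 0.80, "Backup": 0.15}}  # CF is 5% short on purpose
starts = {"SP1": 32, "SP2": 32, "SP3": 32, "SP4": 32, "SP5": 30, "SP6": 4}
print(check_playing_time(shares, starts))
```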
Current free agents won’t show up here – no team, no projected playing time. Their projections are still available on the “All hitters” and “All pitchers” downloads.
Getting to some of the players takes a deep depth chart. I’ve prepared some that you can find under the 2014 Spring tab, under “dts”. Every team has three files in there. One is a dt file, which contains the translated statistics, 2009-13, with the computer-only 2014 projection, for all hitters in that team’s system; another is a pdt file, which does the same for pitchers. The “orgdt” file just has the 2014 projections for all players on the team, sorted by position and projected WARP, like the one here for the Nationals. Kind of works as a very deep depth chart for all teams, although I can’t swear that there aren’t players showing up on the wrong team (especially for players who have been released – there’s a decent chance they still show up for their old teams). That’s just for these depth charts – I am reasonably certain that every player used in the major league projections is actually a member of their team. The one exception might be Matt Garza, who I have already written into the Milwaukee rotation.
Looking back on the Hall of Fame issues that came up, I think quite a few of the problems would disappear if they would just have a real election.
What, you say they already have one? No, they do not. Maybe I’m being overly pedantic, but an election, to me, is a way of choosing people to fill a position that must be filled. In particular, it has to result in a winner. The Hall of Fame selection process does not ensure a winner; it is more akin to the process of passing a piece of legislation than to the process of selecting a legislator.
The Baseball Hall of Fame has a pretty basic conflict. The Hall itself – and the community that founded it – desires, and needs, to have induction ceremonies held every July, and induction ceremonies without inductees are just bad for business. This argues for making voting easier, to ensure that we don’t have a repeat of 2013, when no one was selected.
On the other hand, they have given the keys of election to a group – the BBWAA – which seemingly takes more pride in denying entrance to the unworthy than in welcoming the worthy. The procedures they have adopted are also intended to exclude all but the best.
Looking at things from a large, historical perspective, we see that Major League Baseball recognizes 2425 team-seasons in major league history – 1256 in the NL, 1048 in the AL, 85 in the 19th century American Association, 16 in the Federal League, 12 in the Union Association, and 8 in the Players’ League. Personally, I’d include all the teams in the National Association of 1871-75 as well, which would bump us up another 50, getting us to 2475.
There have also been 211 players elected to the Hall of Fame – not counting managers and Negro League players. I’d also include a few players from the NA days who were inducted as “pioneers”, but whose playing career demonstrates at least some worthiness (George Wright and Al Spalding for sure; Candy Cummings is more questionable). I’d also add to the list of players some obvious selections (based on their play) who have been denied entrance for moral failings of one kind or another – let us say Joe Jackson, Pete Rose, Mark McGwire, Barry Bonds, Sammy Sosa, and Roger Clemens. That is 220 players, 2475 teams, or a player for every 11.25 teams in history.
That was the most expansive definition. If I wanted to be stricter, I could just look at the 211 players selected to the Hall. And I could throw out the NA teams, and all the third leagues, and probably the first three years of the AA, when its quality level was way, way below the NL of the day. That produces a narrower list of 2358 teams. Ratios vary from 10.72 (using the largest number of players and smallest number of teams) to 11.73 (the reverse). To be less precise – there’s been a Hall of Fame player selected for every 11 or 12 teams in history.
Since there are currently 30 teams playing in the majors every year, it means that if you simply accepted the existing ratio as a guide, then we should be creating around 2.5 new Hall of Famers every year just to keep up.
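The arithmetic behind those ratios is easy to check. Here is a small Python sketch of it; the variable names are mine, and the narrow count of 2358 teams is taken as given from the text rather than derived.

```python
# Team-seasons recognized by MLB, as listed above.
recognized = {"NL": 1256, "AL": 1048, "AA": 85, "FL": 16, "UA": 12, "PL": 8}
na_teams = 50          # National Association, 1871-75
narrow_teams = 2358    # taken as given from the text

wide_teams = sum(recognized.values()) + na_teams

players_strict = 211                   # players actually elected
players_expansive = 211 + 3 + 6        # plus 3 NA pioneers and 6 excluded stars


def teams_per_player(teams, players):
    """Team-seasons per Hall of Fame player."""
    return teams / players


def inductees_per_year(ratio, current_teams=30):
    """New Hall of Famers per year needed to hold a given ratio."""
    return current_teams / ratio


for label, teams, players in [
    ("expansive", wide_teams, players_expansive),
    ("lowest ratio", narrow_teams, players_expansive),
    ("highest ratio", wide_teams, players_strict),
]:
    ratio = teams_per_player(teams, players)
    print(f"{label}: {ratio:.2f} teams per player, "
          f"{inductees_per_year(ratio):.2f} inductees per year")
```

With the exact ratios the pace works out to somewhere between about 2.6 and 2.8 inductees a year; rounding up to 12 teams per player gives the flat 2.5.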
So my proposal to the Hall of Fame committee is this – make it a real election. The top vote getter each year gets in, regardless of the vote count. The second-place finisher gets in, assuming a 50%+1 approval. The third (or more) person goes in if they can pull a 75% approval.
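To make the rule concrete, here is a minimal Python sketch of it. The function name, the data layout, and the sample ballot are hypothetical; I read “50%+1” as strictly more than half, and ties are not handled. The optional `already_in` argument is there so that earlier inductees can be skipped when replaying past ballots.

```python
def elect(results, already_in=frozenset()):
    """Apply the proposed rule to one year's ballot.

    results: list of (name, pct) pairs, pct being the share of ballots
             naming that player (0-100).
    already_in: names to skip because they were inducted earlier.
    Returns the list of inductees, in order of finish.
    """
    remaining = sorted(
        (r for r in results if r[0] not in already_in),
        key=lambda r: r[1],
        reverse=True,
    )
    inducted = []
    for place, (name, pct) in enumerate(remaining, start=1):
        if place == 1:
            inducted.append(name)          # top vote-getter always gets in
        elif place == 2 and pct > 50.0:
            inducted.append(name)          # runner-up needs a majority
        elif place >= 3 and pct >= 75.0:
            inducted.append(name)          # everyone else needs 75%
    return inducted


# Made-up ballot, purely for illustration.
ballot = [("Player A", 68.0), ("Player B", 58.0), ("Player C", 37.0)]
print(elect(ballot))
print(elect(ballot, already_in={"Player A"}))
```

In the second call Player B moves up to first and goes in, while Player C, now second, falls short of a majority and waits.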
In all of Hall voting, there have only been two players who have finished first or second in the voting without currently being in the Hall of Fame – Craig Biggio (1st in 2013, and near-certain to crack the threshold at some later date) and Jack Morris (who finished 2nd in 2013). Even in third, there are only a few cases – Jeff Bagwell in 2012-13, Tony Oliva in 1988, and Gil Hodges four times in the 70s. I don’t think the Hall would be in any way diminished by these inclusions.
How would the last 25 years of elections have worked following my rules? I’m going to make the naive assumption that votes for other players would not have changed due to players that I’ve removed from the ballot by inducting them before their time.
1994 – The real Hall tabs Steve Carlton, and we concur. But we will also honor Orlando Cepeda, who picked up 73.5% while finishing second, and won’t make him wait until a Veterans Committee meeting in 1999.
1996 – No one is elected by the real Hall. Niekro was first, so we skip him; that makes Tony Perez #1, so in he goes without waiting four more years. Sutton is next, skip him, and that brings up Steve Garvey…but he only has 37% of the vote. Perez is our only inductee this year.
1997 – The real Hall chose Niekro, followed by Sutton and Perez, all of whom we’ve already honored. The top remaining recipient, and our winner, even though he only had 39% of the vote, is Ron Santo. We salute him in 1997, instead of making him wait until the afterlife (died 2010, inducted into the Hall in 2012).
1998 – The Hall selected Don Sutton. We skip him, and then skip Perez, and Santo, and then it’s welcome to the Hall of Fame, Jim Rice. We’re already under 50%, so he’s all alone, but he doesn’t have to wait another decade until 2009.
2000 – Carlton Fisk is selected by the Hall, and we’re fine with that. Perez and Rice were next, and we already have them; our second place finisher is Gary Carter, but he is just under 50% and so will have to wait.
2002 – Ozzie Smith is really elected. Gary Carter is second, and now has over 50% of the vote, so he gets in a year earlier than reality.
2003 – Eddie Murray finished first, and was genuinely elected, and Carter was also elected. Since we already have Carter in, our second-place finisher is Bruce Sutter, who qualifies with 54% approval. He’s in three years early.
2006 – Sutter was the only real inductee that year. Ignoring him, and second-place finisher Rice, our top recipient is Rich Gossage. And our second place finisher is Andre Dawson, and 61% makes him a qualifying second-placer. Gossage goes in for us now instead of 2008, and Dawson moves up from 2010.
2008 – Gossage was the Hall’s real choice. We’re going to go past him, and Rice, and Dawson, and find ourselves a nice shiny Bert Blyleven. The next finisher would be Lee Smith, but he’s under 50%; so Bert has the podium to himself now instead of waiting until 2011.
2009 – Rickey Henderson is taken in reality, as is Jim Rice. Our second place finisher (after skipping Rice, Dawson, and Blyleven) would again be Lee Smith, but again he’s under 50% and is not inducted.
2010 – Reality elects Dawson, but we’ve had him in for four years already. Next was Blyleven, also in already. Our top finisher in 2010 is Roberto Alomar, so he goes in a year ahead of time. Our second place finisher is Jack Morris, and he does clear 50% of the vote, so he goes in, too. Morris is the first person we’ve inducted who has not made the actual Hall. However, like Cepeda and Santo, similarly rejected by the BBWAA, he’s a near-cinch for a future Veterans Committee.
2011 – Reality selected Alomar and Blyleven, but we have beaten reality to the punch. Barry Larkin is our inductee. Morris would have gotten in again, but skipping over him means that, for the third time, our second place finisher is Lee Smith. And for the third time, he is under 50%.
2012 – The Hall really chose Larkin; we’ll ignore him, and then ignore Morris. Our number 1 becomes Jeff Bagwell. Our number two, again, is Lee Smith; but this time he picked up 50.6% of the vote. He’s in!
2013 – No one was selected by the BBWAA this year. Craig Biggio was on top of the list, though, so he is in. We can ignore Morris, and Bagwell in third, to get down to Mike Piazza. He’s our second-place man, and he’s got 58% support, so in he goes.
So to summarize – this way guarantees that there will be someone to honor at Cooperstown each year. Players who aren’t selected in their first year tend to get in a couple of years earlier this way. Virtually all players who meet our rules but not the BBWAA 75% rule eventually get named to the Hall anyway. We’d have saved Orlando Cepeda and Ron Santo from the Veterans Committee. We would have inducted Jack Morris and Lee Smith, who have (definitely, probably) missed out from the BBWAA. We’ve already gotten to Bagwell, Biggio, and Piazza, who should all be eventual winners.
While everything on this site is free, a donation through Paypal to help offset costs would be greatly appreciated. -Clay