Modeling/Predicting Home Runs in
Major League Baseball
Tyler Kelemen
What goes into predicting a given player’s home run total for an upcoming season? How much of this can actually be modeled rather than left to chance or injury? I am most interested in determining which metrics best predict home run totals for 2017. You may have heard the phrase “pitcher’s league” in the last few years. The phrase carries the notion that baseball is becoming more of an offensive struggle, with pitching now dominating America’s pastime. What if I told you that last year was the second-most prolific home run hitting season in the history of baseball? To investigate further, I first predicted 2016 home run totals from all 32 metrics from the 2015 season to get a better idea of which variables influence home runs. I restricted the sample to players who qualified in both the 2015 and 2016 seasons. A qualifying player is one who has at least 502 plate appearances in a season; 89 players qualified in both. From there, I fit forward and backward selection models, which culminated in fitting a decision tree using the variables remaining from the backward selection model. The prediction formula from the decision tree was then applied to the 2016 metrics to predict 2017 home run totals for the 87 remaining players, as two players had retired.
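The qualifying-player filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `qualifying_in_both`, the dictionary layout, and the player names and plate-appearance totals are all hypothetical. Only the 502 plate-appearance threshold comes from the text.

```python
# Minimum plate appearances to qualify in a season, as stated in the text.
QUALIFYING_PA = 502


def qualifying_in_both(season_a, season_b, min_pa=QUALIFYING_PA):
    """Return the sorted names of players with at least min_pa
    plate appearances in both seasons.

    Each season is a dict mapping player name -> plate appearances
    (a hypothetical layout chosen for this sketch).
    """
    qualified_a = {name for name, pa in season_a.items() if pa >= min_pa}
    qualified_b = {name for name, pa in season_b.items() if pa >= min_pa}
    # Only players present in both qualified sets are kept.
    return sorted(qualified_a & qualified_b)


# Hypothetical plate-appearance totals, invented for illustration only.
pa_2015 = {"Player A": 650, "Player B": 501, "Player C": 502, "Player D": 610}
pa_2016 = {"Player A": 600, "Player B": 640, "Player C": 700, "Player D": 300}

both = qualifying_in_both(pa_2015, pa_2016)
print(both)  # ['Player A', 'Player C']
```

Player B falls one plate appearance short in the first season and Player D falls short in the second, so neither is kept. Player C sits exactly at the threshold and is included because the cutoff is "at least 502".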