Stat Chat

It’s that time of the year again. The days are getting longer and warmer, leaves are growing and flowers are soon to bud. The event of paramount significance, however, is the arrival of Major League Baseball players in Arizona and Florida for spring training. Statisticians often decry the amount of emphasis fans place on preseason performance. Spring training rosters swell to more than twice the size of regular-season rosters, giving managers a chance to see if their younger players are ready to receive a big-league nod or if a free-agent flier is worth hanging onto. These younger players and free-agent fringe guys are often playing at a level far below that of the average major leaguer. What’s more, the regulars are not playing much (leading to a small sample size), and when they do, they are notorious for trying new swings (often a swing and miss) and other mechanical adjustments. This raises the question: just how predictive of regular-season performance are spring training stats for hitters?

I attempted to answer this question by gathering information on 109 of the players who had at least 50 spring training plate appearances and at least 350 regular-season plate appearances in 2019 (Baseball-Reference, “2019 Major League Baseball Season Summary,” 02.23.2020). The stats I considered the most important for this analysis were on-base percentage, slugging percentage, on-base plus slugging percentage (OPS), OPS+, isolated power (ISO), walk rate and strikeout rate. Slugging percentage is the total number of bases (one for a single, two for a double, etc.) a batter gets divided by their total at-bats. OPS+ is OPS relative to the league average OPS; an OPS+ of 100 indicates exactly league average. Isolated power is slugging percentage minus batting average, so it measures a batter’s rate of extra-base hits (hits that aren’t singles). I also tracked stolen base success rate and batting average, though I only included batting average for the traditionalists out there: it proved to be the least predictable statistic besides stolen base success rate.
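For readers who want to see the definitions above in action, here is a minimal Python sketch that turns counting stats into the rate stats. The function name and the stat line are hypothetical, made up purely for illustration, and the on-base percentage formula used is the standard one (hits, walks and hit-by-pitches divided by at-bats, walks, hit-by-pitches and sacrifice flies).

```python
def rate_stats(singles, doubles, triples, homers, walks, hbp, sac_flies, at_bats):
    """Turn counting stats into the rate stats discussed in the article."""
    hits = singles + doubles + triples + homers
    # One base for a single, two for a double, three for a triple, four for a homer.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers

    avg = hits / at_bats
    slg = total_bases / at_bats
    obp = (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

    return {
        "AVG": avg,
        "OBP": obp,
        "SLG": slg,
        "OPS": obp + slg,   # on-base plus slugging
        "ISO": slg - avg,   # isolated power: extra bases per at-bat
    }


# Hypothetical stat line, not a real player.
line = rate_stats(singles=80, doubles=30, triples=2, homers=25,
                  walks=60, hbp=5, sac_flies=5, at_bats=500)

for name, value in line.items():
    print(f"{name}: {value:.3f}")
```

Note that ISO only credits the extra bases: a single adds nothing to it, while a home run adds three bases.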

For comparison’s sake, I also compiled another dataset for all 162 hitters who had 350 regular-season plate appearances in both 2018 and 2019 (FanGraphs Baseball, “Major League Leaderboards,” 02.23.2020). For this dataset, I used wRC+ in place of OPS+. It similarly compares performance to league average and is more robust, but it was not available in the spring training dataset. The other statistics measured the same things across datasets. Is previous regular-season performance that much more predictive than spring training performance? Let’s find out.

2018 regular-season stats are better at predicting 2019 regular-season stats pretty much across the board. The only instance where 2019 spring training proves better is in stolen base success rate, but neither of the two predicts 2019 stolen base success rate significantly; both p-values are high, indicating that any apparent relationship could easily be due to chance. Otherwise, 2018 stats explain a larger percentage of the variation in 2019 stats (measured by R²), and those relationships are less likely to be due to chance (measured by the p-values).
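To make "percentage of the variation explained" concrete, here is a rough Python sketch of how R² can be computed for a one-predictor linear fit. The function name and the strikeout rates below are invented for illustration; they are not taken from either dataset.

```python
def r_squared(x, y):
    """R^2 of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Variation left over after the fit vs. total variation in y.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Made-up strikeout rates (in percent) for five hypothetical hitters.
rates_2018 = [15.0, 18.5, 22.0, 25.5, 30.0]
rates_2019 = [16.0, 19.0, 21.0, 27.0, 29.0]

print(f"R^2 = {r_squared(rates_2018, rates_2019):.3f}")
```

An R² of 1 would mean the earlier stat predicts the later one perfectly, while an R² near 0 would mean it explains almost none of the variation.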

Nevertheless, this analysis by no means concludes that spring training stats are useless. Every stat besides batting average and stolen base success rate has a p-value below 0.10 for predicting in-season stats. The p-value here attempts to answer the question, “If there were no relationship between spring training stats and in-season stats, what is the probability we would get results at least as strong as the ones we did by chance?” In this case, if there were no relationship, we would find a phantom relationship this strong less than 10 percent of the time, suggesting that there probably is a relationship. Strikeout and walk rates especially are pretty reliable statistics even with a sample size as small as spring training.
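One way to see what a p-value is asking is a permutation test: shuffle the in-season stats so that any real link to the spring stats is broken, and count how often chance alone produces a correlation as strong as the observed one. This is a swapped-in illustration, not the method behind the p-values reported here (those presumably came from standard regression output), and the data and function names are made up.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of random shuffles whose correlation is at least as strong as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(correlation(x, y))
    shuffled = list(y)
    as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(correlation(x, shuffled)) >= observed - 1e-12:
            as_strong += 1
    return as_strong / n_shuffles


# Made-up strikeout rates (in percent) for eight hypothetical hitters.
spring = [18.0, 25.0, 12.0, 30.0, 22.0, 15.0, 27.0, 20.0]
season = [20.0, 23.0, 15.0, 26.0, 24.0, 17.0, 22.0, 21.0]

p_value = permutation_p_value(spring, season)
print(f"p-value = {p_value:.4f}")
```

A small result means that shuffled, relationship-free data rarely looks as convincing as the real pairing does.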

That being said, the predictive power of the 2018 stats is clearly much stronger. As reliable as strikeout and walk rates are even with a small sample size, these rates from the previous season are significantly more predictive than their spring training counterparts. By R², each 2018 stat is at least twice as good as its spring training counterpart at explaining the variation in 2019 stats, and several are more than five times as good. So, besides those who live under a rock in Port St. Lucie, who would ever use spring training stats if they have last season’s data at their disposal?


I compiled a list of 70 hitters with at least 50 spring training plate appearances and 350 plate appearances in both 2018 and 2019. I made a model predicting 2019 strikeout rate (since it was the most reliable statistic) based on strikeout rates in 2018 and spring training 2019. Both proved significant in predicting the 2019 stat, with p-values of less than 0.001 and 0.008, respectively. The model spat out the following formula to predict 2019 strikeout rate:

2019rate = 3.60610 + 0.68369(2018rate) + 0.16039(SpringRate)

The 2018 rate carries more weight, since it has a larger multiplier (called a “coefficient”), and its lower p-value makes it the more statistically significant of the two. But, if you have a really outlandishly low or high spring strikeout rate, it could end up affecting your expected 2019 rate more than the 2018 rate affects it. In this case, spring training should not be dismissed as merely an experimental time, but rather, it should be considered as additional data to use to predict in-season stats as long as you weigh it against the previous season’s stats accordingly. For those who did not play in the previous season, such as the minor leaguers and oft-injured has-beens being given auditions in spring training, their spring training stats could prove all the more useful in predicting their performance in the upcoming season if they stick on the roster. If they don’t stick on the roster and end up in the minor leagues after spring training, their spring training statistics could prove useful in predicting what their performance would look like upon being called up, possibly even more useful than their minor league statistics.
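As a quick worked example, the formula can be plugged into a few lines of Python. This sketch assumes the rates are expressed in percentage points (so 20 means 20 percent), which the size of the intercept suggests but the formula itself does not state; the function name and the two hitters are hypothetical.

```python
# Coefficients copied from the formula in the article.
INTERCEPT = 3.60610
COEF_2018 = 0.68369
COEF_SPRING = 0.16039


def predict_2019_k_rate(rate_2018, spring_rate):
    """Predicted 2019 strikeout rate; inputs assumed to be in percentage points."""
    return INTERCEPT + COEF_2018 * rate_2018 + COEF_SPRING * spring_rate


# A hypothetical hitter who struck out 20 percent of the time in both samples.
typical = predict_2019_k_rate(20.0, 20.0)
print(f"Typical hitter: {typical:.2f}")

# A hypothetical hitter with a low 2018 rate and an outlandish spring rate.
rate_2018, spring_rate = 10.0, 45.0
from_2018 = COEF_2018 * rate_2018
from_spring = COEF_SPRING * spring_rate
print(f"2018 contribution:   {from_2018:.2f}")
print(f"Spring contribution: {from_spring:.2f}")
print(f"Prediction:          {predict_2019_k_rate(rate_2018, spring_rate):.2f}")
```

Under these assumptions, the spring term only outweighs the 2018 term when the spring rate is more than about four times the 2018 rate (0.68369 divided by 0.16039 is roughly 4.26), which is what "outlandish" looks like in practice.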
