Prediction of player performance is a key component in the construction of baseball team rosters. Traditionally, the problem of predicting seasonal plate appearance outcomes has been approached univariately, that is, by modeling each outcome separately rather than jointly modeling the collection of outcomes. More recently, there has been a greater emphasis on joint modeling, which accounts for the correlations between outcomes. However, most of these state-of-the-art prediction models are proprietary to teams or industrial sports entities, and so little is available in open publications.
This dissertation introduces a joint modeling approach to predict seasonal plate appearance outcome vectors using a mixed-effects multinomial logistic-normal model. This model accounts for positive and negative correlations between outcomes both across and within player seasons. It is also applied to the important, yet unaddressed, problem of predicting performance for players moving between the Japanese and American major leagues.
This work begins by motivating the methodological choices through a comparison of state-of-the-art procedures, followed by a detailed description of the modeling and estimation approach that includes model fit assessments. We then apply the method to longitudinal multinomial count data of baseball player-seasons for players moving between the Japanese and American major leagues and discuss the results. Extensions of this modeling framework to other similar data structures are also discussed.