PREDICTORS OF EARLY POSTSECONDARY STEM PERSISTENCE OF HIGH-ACHIEVING STUDENTS: AN EXPLANATORY STUDY USING MACHINE LEARNING TECHNIQUES
This study investigated the persistence of high-achieving and non-high-achieving students in STEM fields using nationally representative data from the High School Longitudinal Study of 2009 for the years 2009, 2012, 2013, 2013-2014, and 2016. The results indicated that approximately 70% of high-achieving and non-high-achieving students continued their initial STEM degrees within 3 years of college enrollment. The most important predictors of STEM persistence were math proficiency level, school belonging, school engagement, school motivation, school problems, science self-efficacy, credits earned in computer sciences, GPA in STEM courses, credits earned in STEM courses, and credits earned in Advanced Placement/International Baccalaureate (AP/IB) courses. Math proficiency was the most important variable for both high-achieving and non-high-achieving students. Combined AP/IB credits were among the most important variables, but their importance was about twice as high for high-achieving students as for non-high-achieving students (6.86% vs. 3.37%). Among the demographic variables (gender, ethnicity, urbanicity, and socioeconomic status), socioeconomic status was the most important in models predicting STEM persistence, and it had higher importance for non-high-achieving students. Furthermore, the persistence rate of Hispanic students differed from that of other underrepresented populations: non-high-achieving Hispanic students had the highest persistence rate, similar to well-represented populations (i.e., White, Asian). The machine learning methods used in the study, including random forest and artificial neural network, provided good accuracy for both achievement groups. Random forest accuracy exceeded 82% on the dataset balanced with the Synthetic Minority Over-Sampling Technique (SMOTE), while artificial neural network accuracy exceeded 92%.
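For readers unfamiliar with SMOTE, the core idea can be illustrated with a minimal pure-Python sketch. It is not the study's implementation, which the abstract does not describe. The function name `smote`, its parameters, and the toy data are hypothetical. It also assumes that non-persisters are the minority class, which the roughly 70% persistence rate suggests but the abstract does not state.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class rows by interpolating between a
    sampled row and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        # Place the new row at a random point on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: 3 non-persisters (minority) vs. 7 persisters.
non_persisters = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
persisters_count = 7

# Add enough synthetic rows to match the majority-class count.
new_rows = smote(non_persisters, persisters_count - len(non_persisters), k=2)
balanced_minority = non_persisters + new_rows
print(len(balanced_minority))  # 7
```

Because each synthetic row lies on a line segment between two real minority rows, the oversampled class stays inside the region the original minority data already occupies. In practice, oversampling is typically applied to the training split only, so that evaluation data contains no synthetic rows.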