1500 Students and Only a Single Cluster? A Multimethod Clustering Analysis of Assessment Data from a Large, Structured Engineering Course
Clustering, a prevalent class of machine learning (ML) algorithms used in data mining and pattern-finding, has increasingly helped engineering education researchers and educators see and understand assessment patterns at scale. However, a challenge remains to make ML-enabled educational inferences that are useful and reliable for research or instruction, especially if those inferences influence pedagogical decisions or student outcomes. ML offers an opportunity to better personalize learners’ experiences using those inferences, even within large engineering classrooms. Neglecting to verify the trustworthiness of ML-derived inferences, however, can have wide-ranging negative impacts on the lives of learners.
This study investigated what student clusters exist within the standard operational data of a large first-year engineering course (>1500 students). This course focuses on computational thinking skills for engineering design. The clustering data set included approximately 500,000 assessment data points using a consistent five-scale criterion-based grading framework. Two clustering techniques, N-TARP profiling and K-means clustering, were applied to the criterion-based assessment data to identify student cluster sets. N-TARP profiling is an expansion of the N-TARP binary clustering method. N-TARP is well suited to this course’s assessment data because of the large and potentially high-dimensional nature of the data set. K-means clustering is one of the oldest and most widely used clustering methods in educational research, making it a good candidate for comparison. Once clusters were found, their interpretability and trustworthiness were assessed. The following research questions provided the structure for this study: RQ1 – What student clusters do N-TARP profiling and K-means clustering identify when applied to structured assessment data from a large engineering course? RQ2 – What are the characteristics of an average student in each cluster, and how well does the average student in each cluster represent the students of that cluster? RQ3 – What are the strengths and limitations of using N-TARP and K-means clustering techniques with large, highly structured engineering course assessment data?
Although both K-means clustering and N-TARP profiling identified potential student clusters, neither method’s clusters were verifiable or replicable. Such dubious results suggest a better interpretation: all student performance data from this course exist in a single homogeneous cluster. This study further demonstrated the utility and precision of the warning that N-TARP provides through its W value, which indicated that the clustering results within this educational data set were not trustworthy. Providing this warning is rare among the thousands of available clustering methods; most clustering methods (including K-means) will return clusters regardless. When a clustering algorithm identifies false clusters that lack meaningful separation or differences, incorrect or harmful educational inferences can result.
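The point that K-means returns clusters regardless of whether any exist can be illustrated with a minimal sketch. It assumes synthetic data drawn from a single Gaussian (not the course's assessment data) and uses a plain Lloyd's-algorithm implementation; the function name `kmeans` and all parameters are hypothetical choices made for illustration. It does not implement N-TARP or its W value.

```python
import random

def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids

# Assumption: homogeneous synthetic data, one Gaussian blob with no real subgroups.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]

k = 3
labels, centroids = kmeans(points, k)
sizes = [labels.count(j) for j in range(k)]
print(sizes)  # every point is labelled with one of k clusters, though no structure exists
```

The algorithm partitions the data into the requested number of groups and reports a centroid for each, with nothing in its output to signal that the groups are artifacts. Any check on whether the partition is meaningful has to come from a separate validation step.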
- Doctor of Philosophy
- Engineering Education
- West Lafayette