Purdue University Graduate School
2022.04.14 Aparajita_Jaiswal.pdf (1.92 MB)

Characterizing the learning, sociology, and identity effects of participating in The Data Mine

posted on 2022-04-20, 18:03 authored by Aparajita Jaiswal

The discipline of data science has gained substantial attention recently. This is mainly attributed to technological advances that have led to an exponential increase in computing power and have made it possible to generate and record enormous amounts of data every day. It has become crucial for industries to wrangle, curate, and analyze data using data science techniques to make informed decisions. Because making informed decisions is complex, a trained data science workforce is required to analyze data in real time. The increasing demand for data science professionals has led higher education institutions to develop courses that train students in data science concepts and tools, starting at the undergraduate level.

Despite efforts by institutions and national bodies such as the National Academies of Sciences, Engineering, and Medicine, there have been significant challenges in attracting and retaining students in the discipline of data science. Novice learners in data science are expected to possess programming, statistical, and research skills, as well as non-technical skills such as communication and critical thinking. Undergraduate students do not yet possess all of these skills, which in turn creates cognitive load for novice learners (Koby & Orit, 2020). Research suggests that improving teaching and mentoring methodologies can improve retention for students from all demographic groups (Seymour, 2002). Previous studies (e.g., Hoffmann et al., 2002; Flynn, 2015; Lenning & Ebbers, 1999) have revealed that learning communities are effective in improving student retention, especially at the undergraduate level, because they help students develop a sense of belonging, socialize, and form their own identities. Learning communities have been identified as high-impact practices (Kuh, 2008) that help students develop identities and a sense of belonging; however, to the best of our knowledge, few studies focus on the development of psychosocial and cognitive skills among students enrolled in a data science learning community.

To meet the demand for the future workforce and help undergraduate students develop data science skills, The Data Mine (TDM) at Purdue University has launched a data science initiative. The Data Mine is an interdisciplinary living-learning community that allows students from various disciplines to enroll and learn data science skills under the guidance of competent faculty and corporate mentors. The residential nature of the learning community allows undergraduate students to live, learn, and socialize with peers who share similar interests, and to develop a sense of belonging. Constant interaction with knowledgeable faculty and mentors on real-world projects allows novice learners to master data science skills and develop an identity. This study aims to characterize the effects of participation on the identity formation, socialization, and learning of undergraduate students enrolled in The Data Mine, and to answer the following research questions:

Quantitative: RQ 1: What are the perceptions of students regarding their identity formation, socialization opportunities, self-belief, and academic/intellectual development in The Data Mine? 

Qualitative: Guiding RQ 2: How do students’ participation in activities and interactions with peers, faculty, and staff at The Data Mine contribute to their becoming experienced members of the learning community?

  • Sub-RQ 2(a): What are the perceived benefits and challenges of participating in The Data Mine?
  • Sub-RQ 2(b): How do students describe their levels of socialization and their sense of belonging within The Data Mine?
  • Sub-RQ 2(c): How do students’ participation and interaction in The Data Mine help them form their identity?

To address the research questions above, we conducted a sequential explanatory mixed-methods study to understand the growth journey of students in terms of socialization, sense of belonging, and identity formation. The data were collected in two phases: a quantitative survey study followed by qualitative semi-structured interviews. The quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed using thematic analysis followed by narrative analysis. The results of the quantitative and qualitative analyses demonstrated that learning in The Data Mine happened through the interaction and socialization of students with faculty, staff, and peers at The Data Mine. Students found multiple opportunities to learn and develop data science skills, such as working on real-world projects or working in groups. This continuous interaction with peers, faculty, and staff at The Data Mine helped them learn and develop identities. The study revealed that students developed a data science identity, and that the corporate partner TAs developed a leader identity in addition to a data science identity. In summary, all students grew and served as mentors, guides, and role models for new incoming students.


Degree Type

  • Doctor of Philosophy

Department

  • Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Alejandra J. Magana

Additional Committee Member 2

Austin L. Toombs

Additional Committee Member 3

Ida B. Ngambeki

Additional Committee Member 4

Mark D. Ward
