Purdue University Graduate School

Statistical Methods for Small Sample Cognitive Diagnosis

thesis
posted on 2024-04-19, 18:28, authored by David B Arthur

It has been shown that formative assessments can lead to improvements in the learning process. Cognitive Diagnostic Models (CDMs) are a powerful formative assessment tool that can be used to provide individuals with valuable information regarding skill mastery in educational settings. These models provide each student with a "skill mastery profile" that shows the level of mastery they have obtained with regard to a specific set of skills. These profiles can be used to help both students and educators make more informed decisions regarding the educational process, which can in turn accelerate learning for students. However, despite their utility, these models are rarely used with small sample sizes. One reason for this is that these models are often complex, containing many parameters that can be difficult to estimate accurately when working with a small number of observations. This work aims to contribute to and expand upon previous work to make CDMs more accessible for a wider range of educators and students.

There are three main small sample statistical problems that we address in this work: 1) accurate estimation of the population distribution of skill mastery profiles, 2) accurate estimation of additional model parameters for CDMs as well as improved classification of individual skill mastery profiles, and 3) improved selection of an appropriate CDM for each item on the assessment. Each of these problems deals with a different aspect of educational measurement and the solutions provided to these problems can ultimately lead to improvements in the educational process for both students and teachers. By finding solutions to these problems that work well when using small sample sizes, we make it possible to improve learning in everyday classroom settings and not just in large scale assessment settings.

In the first part of this work, we propose novel algorithms for estimating the population distribution of skill mastery profiles for a popular CDM, the Deterministic Inputs Noisy "and" Gate (DINA) model. These algorithms borrow inspiration from the concepts behind popular machine learning algorithms. However, in contrast to these methods, which are often used solely for prediction, we illustrate how the ideas behind these methods can be adapted to obtain estimates of specific model parameters. Through studies involving simulated and real-life data, we illustrate how the proposed algorithms can be used to gain a better picture of the distribution of skill mastery profiles for an entire population of students while using only a small sample of students from that population.
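For readers unfamiliar with the DINA model, the following is a minimal sketch of its standard textbook form, not of the algorithms proposed in the thesis. Under DINA, a student answers an item correctly with probability 1 - slip if they have mastered every skill the item requires, and with probability guess otherwise. The function names, the example Q-matrix, and the slip and guess values are all illustrative assumptions.

```python
from itertools import product


def dina_prob(alpha, q_row, slip, guess):
    """P(correct response) under the standard DINA model for one item."""
    # The "and" gate: eta is True only if every required skill is mastered.
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess


def profile_likelihood(responses, alpha, q_matrix, slips, guesses):
    """Likelihood of one student's 0/1 responses given mastery profile alpha."""
    like = 1.0
    for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
        p = dina_prob(alpha, q_row, s, g)
        like *= p if x == 1 else 1.0 - p
    return like


def posterior_over_profiles(responses, q_matrix, slips, guesses, prior=None):
    """Posterior over all 2^K mastery profiles (uniform prior by default)."""
    n_skills = len(q_matrix[0])
    profiles = list(product([0, 1], repeat=n_skills))
    if prior is None:
        prior = [1.0 / len(profiles)] * len(profiles)
    weights = [
        pr * profile_likelihood(responses, a, q_matrix, slips, guesses)
        for a, pr in zip(profiles, prior)
    ]
    total = sum(weights)
    return {a: w / total for a, w in zip(profiles, weights)}


# Hypothetical 3-item, 2-skill assessment (values chosen for illustration).
q_matrix = [[1, 0], [0, 1], [1, 1]]
slips = [0.1, 0.1, 0.1]
guesses = [0.2, 0.2, 0.2]
posterior = posterior_over_profiles([1, 1, 1], q_matrix, slips, guesses)
best_profile = max(posterior, key=posterior.get)
```

The `prior` argument plays the role of the population distribution of skill mastery profiles, which is the quantity the first part of the thesis aims to estimate from a small sample.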

In the second part of this work, we introduce a new method for regularizing high-dimensional CDMs using a class of Bayesian shrinkage priors known as catalytic priors. We show how a simpler model can first be fit to the observed data and then be used to generate additional pseudo-observations that, when combined with the original observations, make it easier to more accurately estimate the parameters in a complex model of interest. We propose an alternative, simpler model that can be used instead of the DINA model and show how the information from this model can be used to formulate an intuitive shrinkage prior that effectively regularizes model parameters. This makes it possible to improve the accuracy of parameter estimates for the more complex model, which in turn leads to better classification of skill mastery. We demonstrate the utility of this method in studies involving simulated and real-life data and show how the proposed approach is superior to other common approaches for small sample estimation of CDMs.

Finally, we discuss the important problem of selecting the most appropriate model for each item on an assessment. In practice, it is common to use the same CDM for every item on an assessment. However, this can lead to suboptimal results in terms of parameter estimation and overall model fit. Current methods for item-level model selection rely on large sample asymptotic theory and are thus inappropriate when the sample size is small. We propose a Bayesian approach for performing item-level model selection using Reversible Jump Markov chain Monte Carlo. This approach allows for the simultaneous estimation of posterior probabilities and model parameters for each candidate model and does not require a large sample size to be valid. We again demonstrate through studies involving simulated and real-life data that the proposed approach leads to a much higher chance of selecting the best model for each item. This in turn leads to better estimates of item and other model parameters, which ultimately leads to more accurate information regarding skill mastery.

History

Degree Type

  • Doctor of Philosophy

Department

  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Arman Sabbaghi

Advisor/Supervisor/Committee co-chair

Hua Hua Chang

Additional Committee Member 2

Vinayak A.P. Rao

Additional Committee Member 3

Xiao Wang

Additional Committee Member 4

Gongjun Xu
