Purdue University Graduate School
HanxiSun_Thesis.pdf (4.79 MB)

Nonparametric Bayesian Clustering under Structural Restrictions

Posted on 2021-07-23, 15:48, authored by Hanxi Sun
Model-based clustering, with its flexibility and solid statistical foundations, is an important tool for unsupervised learning, with numerous applications in a variety of fields. This dissertation focuses on nonparametric Bayesian approaches to model-based clustering under structural restrictions. These are additional constraints on the model that embody prior knowledge, either to regularize the model structure and so encourage interpretability and parsimony, or to encourage statistical sharing through an underlying tree or network structure.

The first part of the dissertation focuses on mixture models, the most commonly used class of model-based clustering models. Current approaches typically model the parameters of the mixture components as independent variables, which can lead to overfitting in the form of poorly separated clusters and can make the model sensitive to misspecification. To address this problem, we propose a novel Bayesian mixture model whose structural restriction is that the clusters repel each other. The repulsion is induced by the generalized Matérn type-III repulsive point process. We derive an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior inference, and demonstrate its utility on a number of synthetic and real-world problems.
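To illustrate the kind of repulsion involved, the following is a minimal sketch of the classical hard-core Matérn type-III thinning, not the generalized process used in the dissertation. The function name `matern_type3`, the square window, and the parameters `rate` and `radius` are assumptions made for this example. Candidate points from a homogeneous Poisson process receive random birth times and are visited in birth order; a candidate is kept only if no previously kept point lies within `radius` of it.

```python
import numpy as np

def matern_type3(rate, radius, rng, window=1.0):
    """Hard-core Matern type-III thinning on the square [0, window]^2.

    Illustrative only: the names and the square window are assumptions,
    and this is the classical hard-core rule, not the generalized version.
    """
    # Primary process: homogeneous Poisson with intensity `rate`.
    n = rng.poisson(rate * window ** 2)
    points = rng.uniform(0.0, window, size=(n, 2))
    # Independent uniform birth times decide the order of arrival.
    births = rng.uniform(0.0, 1.0, size=n)
    kept = []
    for i in np.argsort(births):
        p = points[i]
        # Type-III rule: only earlier *surviving* points can delete p.
        if all(np.linalg.norm(p - q) >= radius for q in kept):
            kept.append(p)
    return np.array(kept).reshape(-1, 2)

rng = np.random.default_rng(0)
centers = matern_type3(rate=200.0, radius=0.1, rng=rng)
```

By construction, every pair of surviving points is at least `radius` apart, which is the sense in which cluster locations drawn from such a prior are kept well separated.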

The second part of the dissertation focuses on clustering populations with a hierarchical dependency structure that can be described by a tree. A classic example of such problems, which is also the focus of our work, is the phylogenetic tree, with nodes often representing biological species. The structural restriction in this problem is the hierarchical structure of the populations. Clustering the populations is then equivalent to identifying branches in the tree where the populations at the parent and child nodes have significantly different distributions. We construct a nonparametric Bayesian model based on hierarchical Pitman-Yor and Poisson processes to exploit this structure, and develop an efficient particle MCMC algorithm for posterior inference. We illustrate the efficacy of our proposed approach on both synthetic and real-world problems.
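As background for the Pitman-Yor building block, here is a minimal sketch of the standard single-level Pitman-Yor Chinese restaurant process. It is not the hierarchical model or the particle MCMC algorithm of the dissertation, and the function name `pitman_yor_crp` and its parameters are assumptions made for this example. Each new observation joins an existing cluster k with weight (n_k - discount) or opens a new cluster with weight (concentration + discount * K), where K is the current number of clusters.

```python
import random

def pitman_yor_crp(n, discount, concentration, rng):
    """Sample cluster labels for n items from a Pitman-Yor CRP.

    Illustrative only. Assumes 0 <= discount < 1 and concentration > 0.
    """
    counts = []   # counts[k] = number of items in cluster k
    labels = []   # labels[i] = cluster index of item i
    for _ in range(n):
        # Existing clusters are discounted; the last weight opens a new one.
        weights = [c - discount for c in counts]
        weights.append(concentration + discount * len(counts))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

rng = random.Random(0)
labels, counts = pitman_yor_crp(n=100, discount=0.5, concentration=1.0, rng=rng)
```

Setting `discount=0` recovers the Dirichlet process case; larger discounts tend to produce more, smaller clusters.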





Degree Type

  • Doctor of Philosophy


  • Statistics

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Vinayak Rao

Additional Committee Member 2

Heejung Shim

Additional Committee Member 3

Hao Zhang

Additional Committee Member 4

Xiao Wang

Additional Committee Member 5

Qifan Song
