EFFICIENT INFERENCE AND DOMINANT-SET BASED CLUSTERING FOR FUNCTIONAL DATA

Wang, Xiang

doi:10.25394/PGS.25617777.v1

thesis

posted on 2024-06-03, 18:52 authored by Xiang Wang

This dissertation addresses three progressively more fundamental problems in functional data analysis: (1) For efficient inference on the functional mean model that accounts for within-subject correlation, we propose a refined and bias-corrected empirical likelihood method. (2) To identify functional subjects that may come from different populations, we propose a dominant-set based unsupervised clustering method that operates on a similarity matrix. (3) To learn the similarity matrix from multiple candidate similarity metrics for functional data clustering, we propose a modularity-guided, dominant-set based semi-supervised clustering method.

In the first problem, the empirical likelihood method is used for inference on the mean function of functional data by constructing a refined and bias-corrected estimating equation. The proposed estimating equation not only improves efficiency but also makes empirical likelihood inference practically feasible by properly incorporating within-subject correlation, which previous studies have not achieved.

In the second problem, a dominant-set based unsupervised clustering method is proposed to maximize within-cluster similarity, and it is applied to functional data with a flexible choice of similarity measure between curves. The method is a hierarchical bipartition procedure under a penalized optimization framework, which is inspired by the concept of a dominant set in graph theory and solved by replicator dynamics from game theory. The tuning parameter is selected by maximizing a clustering criterion, the modularity of the resulting two clusters. The approach is robust both to imbalanced group sizes and to outliers, which overcomes a limitation of many existing clustering methods.
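To illustrate the two ingredients named above, the following is a minimal sketch of the classical dominant-set extraction by discrete-time replicator dynamics, together with the standard Newman modularity of a partition. It is not the penalized, hierarchical procedure of the dissertation, whose details are not given in this abstract. The function names, the toy similarity matrix `A`, and the support threshold `1e-6` are all illustrative assumptions.

```python
def replicator_dynamics(A, iters=1000, tol=1e-10):
    """Iterate x_i <- x_i * (A x)_i / (x' A x) on the probability simplex."""
    n = len(A)
    x = [1.0 / n] * n  # start from the barycenter of the simplex
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx == 0:
            break  # degenerate case: no similarity among weighted nodes
        new_x = [x[i] * Ax[i] / xAx for i in range(n)]
        delta = sum(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if delta < tol:
            break
    return x


def modularity(A, labels):
    """Newman modularity of a partition of a weighted, undirected graph."""
    n = len(A)
    k = [sum(row) for row in A]  # weighted degree of each node
    two_m = sum(k)               # total weight, counted in both directions
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Hypothetical similarity matrix (zero diagonal): subjects 0-2 form a tight
# group, subjects 3-4 a smaller one, with weak similarity across groups.
A = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.0],
]

x = replicator_dynamics(A)
# Subjects with non-negligible weight form the extracted dominant set.
support = [i for i, xi in enumerate(x) if xi > 1e-6]
labels = [0 if i in support else 1 for i in range(len(A))]
q = modularity(A, labels)
```

In this toy example the weight vector concentrates on the three mutually similar subjects, and the resulting bipartition has higher modularity than a split that cuts through that group. In a hierarchical scheme, each part would then be considered for a further bipartition.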

In the third problem, a metric-based semi-supervised clustering method is proposed, in which the similarity metric is learned by modularity maximization and then passed to the dominant-set based clustering procedure described above. In the semi-supervised setting, where some cluster memberships are known, the goal is to determine the best linear combination of candidate similarity metrics to use as the final metric and thereby improve clustering performance. Besides the global metric-based algorithm, a second algorithm is proposed that learns an individual metric for each cluster, which permits overlapping cluster membership and distinguishes it from many existing methods. The method is well suited to functional data, for which various similarity metrics between curves are available, and it retains the robustness to imbalanced group sizes that is intrinsic to the dominant-set based clustering approach.
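The idea of choosing a linear combination of candidate similarity metrics by modularity maximization can be sketched as follows. This is only an illustration under strong assumptions: two candidate similarity matrices already restricted to the subjects with known membership, convex weights `(w, 1 - w)`, and a plain grid search. The abstract does not state how the dissertation carries out the optimization, and all names and toy matrices here are hypothetical.

```python
def modularity(S, labels):
    """Newman modularity of a labeled partition under similarity matrix S."""
    n = len(S)
    k = [sum(row) for row in S]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += S[i][j] - k[i] * k[j] / two_m
    return q / two_m


def combine(mats, weights):
    """Entrywise linear combination of candidate similarity matrices."""
    n = len(mats[0])
    return [
        [sum(w * M[i][j] for w, M in zip(weights, mats)) for j in range(n)]
        for i in range(n)
    ]


def best_weight(S1, S2, known_labels, grid=11):
    """Grid search over w in [0, 1] for the combination w*S1 + (1-w)*S2."""
    best = None
    for g in range(grid):
        w = g / (grid - 1)
        q = modularity(combine([S1, S2], [w, 1.0 - w]), known_labels)
        if best is None or q > best[1]:
            best = (w, q)
    return best


# Known memberships for four labeled subjects (hypothetical).
known_labels = [0, 0, 1, 1]

# S1 agrees with the known memberships; S2 groups the subjects differently.
S1 = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
S2 = [
    [0.0, 0.1, 0.9, 0.1],
    [0.1, 0.0, 0.1, 0.9],
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1, 0.0],
]

best_w, best_q = best_weight(S1, S2, known_labels)
```

Here the search puts all weight on the metric that agrees with the known memberships. The learned combination would then be applied to all subjects, labeled or not, before running the clustering step.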

In all three problems, the advantages of the proposed methods are demonstrated through extensive empirical investigations using simulations as well as real data applications.

Funding

Research supported in part by NSF awards DMS-2212928 (2024).

Degree Type

Doctor of Philosophy

Department

Mathematics

Campus location

Indianapolis

Advisor/Supervisor/Committee Chair

Honglang Wang

Additional Committee Member 2

Benzion Boukai

Additional Committee Member 3

Fei Tan

Additional Committee Member 4

Hanxiang Peng

Keywords

Clustering; Dominant set; Efficiency; Empirical likelihood; Functional / longitudinal data; Kernel smoothing; Metric learning; Modularity; Replicator dynamics; Semi-supervised clustering; Similarity; Within-subject correlation

Licence

CC BY-ND 4.0
