Optimizing Initialization, Feature Selection, and Tensor Dimension Reduction in Unsupervised Learning: Methods and Applications
Unsupervised machine learning (ML) is essential for analyzing complex data without labels. This dissertation addresses three key challenges in the field: clustering initialization, unsupervised feature selection, and dimension reduction for tensors. It also applies unsupervised ML to airborne LiDAR data.
Chapter 2 introduces an improved initialization strategy for K-Means clustering and Gaussian Mixture Models (GMM). The proposed method improves clustering stability and accuracy.
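To illustrate why initialization matters for K-Means, the following is a minimal sketch of the widely used k-means++ seeding rule followed by Lloyd iterations. It is not the initialization strategy proposed in Chapter 2, which this abstract does not specify; the function names (`kmeanspp_seeds`, `lloyd`) and the toy three-cluster data are assumptions made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Pick k initial centers with k-means++ (distance-squared) sampling."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # Draw the next center with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def inertia_of(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, n_iter=20):
    """Run a fixed number of Lloyd (assign, then update) iterations."""
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, inertia_of(points, centers)


# Hypothetical toy data: three well-separated 2D Gaussian blobs.
rng = random.Random(0)
points = [
    (rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
    for mx, my in [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    for _ in range(50)
]

seeds = kmeanspp_seeds(points, 3, rng)
centers, inertia = lloyd(points, seeds)
```

Because each later seed is drawn far from the centers already chosen, the starting centers tend to spread across the data, which is one common way to reduce the run-to-run variability of K-Means.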
Chapter 3 develops a stepwise unsupervised feature selection framework, called the Forward Partial-Variable Clustering with Full-Variable Loss (FPCFL), to improve clustering performance in high-dimensional data.
Chapter 4 focuses on tensor dimension reduction and feature selection in multiway data. It introduces Low-Rank Sparse Tensor Approximation (LRSTA) for efficient data compression and High-Order Orthogonal Decomposition (HOOD) for improved sparsity and interpretability, particularly for large-scale data such as images and video.
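As background on low-rank tensor compression in general, the sketch below implements the standard truncated higher-order SVD (a Tucker-type decomposition) with NumPy. It is not LRSTA or HOOD, whose definitions are not given in this abstract; the helper names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the synthetic tensor are assumptions for illustration.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))  # new axis lands at position 0
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(T, ranks):
    """Return a core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand the core back to the original tensor shape."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T


# Hypothetical example: a 6 x 7 x 5 tensor with multilinear rank (2, 3, 2).
gen = np.random.default_rng(0)
true_core = gen.standard_normal((2, 3, 2))
true_factors = [gen.standard_normal((n, r)) for n, r in [(6, 2), (7, 3), (5, 2)]]
T = reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(T, ranks=(2, 3, 2))
T_hat = reconstruct(core, factors)
rel_error = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

Here the compressed form stores the 2 x 3 x 2 core plus three thin factor matrices (12 + 12 + 21 + 10 = 55 numbers) instead of the 210 entries of the full tensor. Sparsity-promoting variants add further constraints on the factors or core, which this sketch does not attempt.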
Chapter 5 applies unsupervised ML to airborne LiDAR data, using clustering and dimensionality reduction to improve ground filtering and object detection in 3D point clouds.
This dissertation advances unsupervised ML by improving clustering reliability, optimizing feature selection, and enhancing tensor decomposition, contributing to more effective and scalable data-driven analysis.
Degree Type
- Doctor of Philosophy
Department
- Statistics
Campus location
- West Lafayette