The full paper
Reason: There are two paper contents included in this dissertation. One is about to submit for reviewing and the other is under editing. Therefore, we prefer not to publish in recent.
until file(s) become available
VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS
thesisposted on 2022-06-07, 16:00 authored by Junhan ZhaoJunhan Zhao
Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. To deal with large datasets, probabilistic-based non-linear DR models, like UMAP and t-SNE, lead the performance on reducing the high dimensionality. However, considering the trade-off between global and local structure preservation and the randomness initialized for computation, applying non-linear models in different parameter settings to unknown high-dimensional structure data may return different 2D visual forms. Much critical neighborhood relationship may be falsely imposed, and uncertainty may be introduced into the low-dimensional embedding visualizations, so-called distortion. In this work, a survey has been conducted to illustrate the most state-of-the-art layout enrichment works for interpreting dimensionality reduction methods and results. Responding to the lack of visual interpretation techniques to probabilistic-based DR methods, we propose a visualization technique called ManiGraph, which facilitates users to explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first subsets HD data by a hexagonal grid in visual space from non-linear embedding (e.g., UMAP). Then, it measures the regional adapted trustworthiness/continuity and visualizes the restored missing and highlighted false connections between subsets from high-dimensional space to the low-dimensional in a node-linkage manner. The visualization helps users understand and interpret the distortion from both visualization and model stages. We further demonstrate the user cases tested on intuitive 3D toy datasets, fashion-MNIST, and single-cell RNA sequencing with domain experts in unsupervised scenarios. This work will potentially benefit the data science community, from toolkit users to DR algorithm developers.