VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS

Zhao, Junhan

doi:10.25394/PGS.14843856.v1

phd_dissertation_new_format_DEPOSIT.pdf (21.76 MB)

VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS

thesis

posted on 2022-06-07, 16:00 authored by Junhan ZhaoJunhan Zhao

Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. To deal with large datasets, probabilistic-based non-linear DR models, like UMAP and t-SNE, lead the performance on reducing the high dimensionality. However, considering the trade-off between global and local structure preservation and the randomness initialized for computation, applying non-linear models in different parameter settings to unknown high-dimensional structure data may return different 2D visual forms. Much critical neighborhood relationship may be falsely imposed, and uncertainty may be introduced into the low-dimensional embedding visualizations, so-called distortion. In this work, a survey has been conducted to illustrate the most state-of-the-art layout enrichment works for interpreting dimensionality reduction methods and results. Responding to the lack of visual interpretation techniques to probabilistic-based DR methods, we propose a visualization technique called ManiGraph, which facilitates users to explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first subsets HD data by a hexagonal grid in visual space from non-linear embedding (e.g., UMAP). Then, it measures the regional adapted trustworthiness/continuity and visualizes the restored missing and highlighted false connections between subsets from high-dimensional space to the low-dimensional in a node-linkage manner. The visualization helps users understand and interpret the distortion from both visualization and model stages. We further demonstrate the user cases tested on intuitive 3D toy datasets, fashion-MNIST, and single-cell RNA sequencing with domain experts in unsupervised scenarios. This work will potentially benefit the data science community, from toolkit users to DR algorithm developers.

Funding

Bilsland Fellowship

History

Degree Type

Doctor of Philosophy

Department

Computer Graphics Technology

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Yingjie Chen

Additional Committee Member 2

James Mohler

Additional Committee Member 3

David Ebert

Additional Committee Member 4

Baijian Yang

Usage metrics

Keywords

data visualization methods dimensionality reduction analysis Uncertainty Unsupervised Learning Computer-Human Interaction Computer Graphics

Licence

CC BY 4.0

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS

Funding

Bilsland Fellowship

History

Degree Type

Department

Campus location

Advisor/Supervisor/Committee Chair

Additional Committee Member 2

Additional Committee Member 3

Additional Committee Member 4

Usage metrics

Categories

Keywords

Licence

Exports