Purdue University Graduate School
phd_dissertation_new_format_DEPOSIT.pdf (21.76 MB)

VISUAL INTERPRETATION TO UNCERTAINTIES IN 2D EMBEDDING FROM PROBABILISTIC-BASED NON-LINEAR DIMENSIONALITY REDUCTION METHODS

thesis
posted on 2022-06-07, 16:00, authored by Junhan Zhao
Enabling human understanding of high-dimensional (HD) data is critical for scientific research but highly challenging. For large datasets, probabilistic non-linear dimensionality reduction (DR) models such as UMAP and t-SNE are the leading methods for reducing high dimensionality. However, because these models trade off global against local structure preservation and rely on random initialization, applying them with different parameter settings to data of unknown high-dimensional structure may yield different 2D visual forms. Critical neighborhood relationships may be falsely imposed, and uncertainty, known as distortion, may be introduced into the low-dimensional embedding visualizations. In this work, we survey state-of-the-art layout enrichment techniques for interpreting dimensionality reduction methods and their results. Responding to the lack of visual interpretation techniques for probabilistic DR methods, we propose a visualization technique called ManiGraph, which helps users explore multi-view 2D embeddings via mesoscopic structure graphs. A dynamic mesoscopic structure first partitions the HD data into subsets using a hexagonal grid in the visual space of a non-linear embedding (e.g., UMAP). It then measures regionally adapted trustworthiness and continuity and visualizes, in a node-link manner, the connections between subsets that differ between the high-dimensional space and the low-dimensional embedding: missing connections are restored and false connections are highlighted. The visualization helps users understand and interpret distortion arising at both the visualization and model stages. We further demonstrate use cases on intuitive 3D toy datasets, Fashion-MNIST, and single-cell RNA sequencing data, tested with domain experts in unsupervised scenarios. This work can potentially benefit the data science community, from toolkit users to DR algorithm developers.
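
For readers unfamiliar with the trustworthiness and continuity measures named in the abstract, the following is a minimal sketch of their standard global, rank-based definitions. It is not the regionally adapted variant developed in the thesis, and it is not ManiGraph code. The function names, the brute-force neighbor ranking, and the toy data are assumptions made for illustration.

```python
import math


def neighbor_ranks(points):
    """ranks[i][j] is the rank of point j among the neighbors of point i
    by Euclidean distance (1 = nearest). Ties are broken by index order."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        row = [0] * n
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        ranks.append(row)
    return ranks


def trustworthiness(high, low, k):
    """Penalizes false neighbors: points among the k nearest in the
    embedding (low) that are not among the k nearest in the original
    space (high). A value of 1.0 means no false neighbors."""
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_high = neighbor_ranks(high)
    r_low = neighbor_ranks(low)
    penalty = 0
    for i in range(n):
        for j in range(n):
            if j != i and r_low[i][j] <= k and r_high[i][j] > k:
                penalty += r_high[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(high, low, k):
    """Penalizes missing neighbors: the same measure with the roles of
    the original space and the embedding swapped."""
    return trustworthiness(low, high, k)


# Hypothetical toy data: eight points on a line in 3D.
high = [(float(i), 0.0, 0.0) for i in range(8)]
# An embedding that keeps the ordering, and one that scrambles it.
low_good = [(float(i), 0.0) for i in range(8)]
low_bad = [(float(p), 0.0) for p in (0, 4, 1, 5, 2, 6, 3, 7)]

print(trustworthiness(high, low_good, k=2), continuity(high, low_good, k=2))
print(trustworthiness(high, low_bad, k=2), continuity(high, low_bad, k=2))
```

In this sketch the order-preserving embedding scores 1.0 on both measures, while the scrambled embedding scores below 1.0 on both because it introduces false neighbors and loses true ones. The brute-force ranking is quadratic in the number of points and is intended only for small examples.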

Funding

Bilsland Fellowship

Degree Type

  • Doctor of Philosophy

Department

  • Computer Graphics Technology

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Yingjie Chen

Additional Committee Member 2

James Mohler

Additional Committee Member 3

David Ebert

Additional Committee Member 4

Baijian Yang
