Dimensionality Reduction Considered Harmful (Some of the Time)
Hyeon Jeon
TL;DR
This work investigates how the use of dimensionality reduction (DR) in visual analytics can produce unreliable conclusions and proposes concrete remedies. It identifies three core reliability challenges (misuse of t-SNE/UMAP for inappropriate tasks, hyperparameter cherry-picking, and erroneous interactions), all driven by distortion-prone projections and biased evaluation metrics. The contributions are (i) Label-Trustworthiness and Label-Continuity, two label-based evaluation metrics that mitigate the overemphasis on class separability; (ii) a dataset-adaptive DR optimization workflow that uses structural complexity metrics (Pds and Mnc) to accelerate hyperparameter search and selection; and (iii) distortion-aware brushing, which robustly locates high-dimensional clusters despite projection distortions. Together, these contributions aim to make DR-enabled visual analytics more trustworthy, reproducible, and efficient, with practical impact on how practitioners select DR techniques, tune parameters, and interact with projections.
Abstract
Visual analytics now plays a central role in decision-making across diverse disciplines, but it can be unreliable: the knowledge or insights derived from the analysis may not accurately reflect the underlying data. In this dissertation, I improve the reliability of visual analytics with a focus on dimensionality reduction (DR). DR techniques enable visual analysis of high-dimensional data by reducing it to two or three dimensions, but they inherently introduce errors that can compromise the reliability of visual analytics. I first investigate the reliability challenges that practitioners face when using DR for visual analytics. I then propose technical solutions to address these challenges, including new evaluation metrics, optimization strategies, and interaction techniques. I conclude the thesis by discussing how these contributions lay the foundation for more reliable visual analytics practices.
