Stop Misusing t-SNE and UMAP for Visual Analytics
Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo
TL;DR
This paper investigates the widespread misuse of t-SNE and UMAP in visual analytics, where projections designed to preserve local neighborhoods are read as evidence of global relationships between clusters. Through a literature review of 136 papers, interviews with 12 practitioners, and interviews with 8 dimensionality reduction (DR) experts, the authors identify limited DR literacy and motivational gaps as the primary drivers of misuse, and find that existing mitigation efforts have been largely ineffective. They propose automating the selection of DR projections (VoyagerDR) as a pragmatic step, while emphasizing the need to maintain user agency and explainability. The work argues that improving how DR techniques are chosen and evaluated would make visual analytics more reliable, and it aims to stimulate broader discussion on responsible ML usage.
Abstract
Misuses of t-SNE and UMAP in visual analytics have become increasingly common. For example, although t-SNE and UMAP projections often do not faithfully reflect the original distances between clusters, practitioners frequently use them to investigate inter-cluster relationships. We investigate why this misuse occurs, and discuss methods to prevent it. To that end, we first review 136 papers to verify the prevalence of the misuse. We then interview researchers who have used dimensionality reduction (DR) to understand why such misuse occurs. Finally, we interview DR experts to examine why previous efforts failed to address the misuse. We find that the misuse of t-SNE and UMAP stems primarily from limited DR literacy among practitioners, and that existing attempts to address this issue have been ineffective. Based on these insights, we discuss potential paths forward, including the controversial but pragmatic option of automating the selection of optimal DR projections to prevent misleading analyses.
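To make the inter-cluster concern concrete, the following is a minimal sketch of one way to check whether a 2D projection keeps the ordering of distances between clusters. It is not the authors' method and not part of VoyagerDR. It assumes clusters are summarized by their centroids and that agreement is measured with a Spearman rank correlation; the function names (`centroids`, `intercluster_rank_agreement`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def centroids(points, labels):
    """Group points by cluster label and return {label: centroid}."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("all distances are tied; correlation is undefined")
    return cov / (var_x * var_y) ** 0.5


def intercluster_rank_agreement(high, low, labels):
    """Spearman correlation between centroid distances before and after DR.

    Assumption (not from the paper): centroids stand in for clusters.
    Values near 1 mean the projection keeps the ordering of inter-cluster
    distances; values near 0 or below mean it does not.
    """
    c_high, c_low = centroids(high, labels), centroids(low, labels)
    if len(c_high) < 3:
        raise ValueError("need at least three clusters to compare orderings")
    pairs = list(combinations(sorted(c_high), 2))
    d_high = [dist(c_high[a], c_high[b]) for a, b in pairs]
    d_low = [dist(c_low[a], c_low[b]) for a, b in pairs]
    return pearson(ranks(d_high), ranks(d_low))


# Toy data: clusters A and B are close in the original space, C is far away.
labels = ["A", "A", "B", "B", "C", "C"]
high = [(0, 0, 0), (0, 2, 0), (1, 0, 0), (1, 2, 0), (10, 0, 0), (10, 2, 0)]
faithful = [(0, 0), (0, 2), (1, 0), (1, 2), (10, 0), (10, 2)]
distorted = [(0, 0), (0, 2), (10, 0), (10, 2), (1, 0), (1, 2)]  # B and C swapped

print(intercluster_rank_agreement(high, faithful, labels))   # 1.0
print(intercluster_rank_agreement(high, distorted, labels))  # -1.0
```

In the distorted layout every cluster is still compact and well separated, so the plot would look equally convincing, yet the ordering of inter-cluster distances is fully reversed. A check of this kind only flags the problem; it does not say which projection to use instead.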
