Lens functions for exploring UMAP Projections with Domain Knowledge
Daniel M. Bot, Jan Aerts
TL;DR
The paper addresses the challenge of extracting domain-relevant patterns from UMAP projections by introducing three lens functions that modify graph connectivity to reflect domain signals. It operationalizes these lenses as Global Lens, Global Mask, and Local Mask, preserving the initial layout as a stable starting point while reconfiguring edges to reveal structure aligned with specific questions. Two real-world use cases (Breast Cancer Gene Expression and Air Quality) demonstrate how lens-enabled projections expose relations among genes and temporal/pollutant patterns, complemented by a synthetic benchmark that characterizes computational costs. The authors provide an open-source Python package to make these lensing techniques accessible for interactive exploration, highlighting the practical impact of domain-knowledge-guided visualization in high-dimensional data analysis.
Abstract
Dimensionality reduction algorithms are often used to visualise high-dimensional data. Previous studies have used prior information to enhance or suppress expected patterns in projections. In this paper, we adapt such techniques for domain-knowledge-guided interactive exploration. Inspired by Mapper and STAD, we present three types of lens functions for UMAP, a state-of-the-art dimensionality reduction algorithm. Lens functions enable analysts to adapt projections to their questions, revealing otherwise hidden patterns. They filter the modelled connectivity to explore the interaction between manually selected features and the data's structure, creating configurable perspectives, each potentially revealing new insights. The effectiveness of the lens functions is demonstrated in two use cases, and their computational cost is analysed in a synthetic benchmark. Our implementation is available in an open-source Python package: https://github.com/vda-lab/lensed_umap.
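
To make the idea of "filtering the modelled connectivity" concrete, the following is a minimal sketch of one possible lens-style filter. It is not the lensed_umap API: the function name `filter_edges_by_lens`, the equal-width binning, and the "same or adjacent bin" rule are all assumptions chosen for illustration. The sketch takes a weighted graph (such as the connectivity graph a UMAP model builds) and one lens value per point, and keeps only the edges whose endpoints have similar lens values.

```python
import numpy as np
from scipy.sparse import coo_matrix


def filter_edges_by_lens(graph, lens, n_bins=5):
    """Hypothetical helper: keep edges whose endpoints fall in the same
    or adjacent lens bins. Illustrative only, not the package's method."""
    graph = coo_matrix(graph)
    lens = np.asarray(lens, dtype=float)
    # Interior boundaries of equal-width bins over the lens range (assumption).
    boundaries = np.linspace(lens.min(), lens.max(), n_bins + 1)[1:-1]
    bins = np.digitize(lens, boundaries)
    # An edge survives if its endpoints are at most one bin apart.
    keep = np.abs(bins[graph.row] - bins[graph.col]) <= 1
    return coo_matrix(
        (graph.data[keep], (graph.row[keep], graph.col[keep])),
        shape=graph.shape,
    ).tocsr()


# Toy graph: a chain 0-1-2-3 plus one long-range edge 0-3.
rows = np.array([0, 1, 2, 0])
cols = np.array([1, 2, 3, 3])
weights = np.array([1.0, 0.8, 0.9, 0.5])
graph = coo_matrix((weights, (rows, cols)), shape=(4, 4))

# The lens value increases along the chain, so edge 0-3 spans distant bins.
lens = np.array([0.0, 1.0, 2.0, 3.0])
filtered = filter_edges_by_lens(graph, lens, n_bins=4)
```

In this toy case the three chain edges are kept and the edge between points 0 and 3 is removed, because its endpoints differ strongly in the lens dimension. A layout computed from the filtered graph would then separate points that are close in the data but far apart in the selected feature.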
