Mind the Gaps: Measuring Visual Artifacts in Dimensionality Reduction
Jaume Ros, Alessio Arleo, Fernando Paulovich
TL;DR
This paper tackles visual distortion in 2D dimensionality reduction (DR) projections, which traditional Projection Quality Metrics (PQMs) overlook because they do not account for perceptual artifacts. It introduces the Warping Index (WI), an area-based, per-triangle distortion metric computed from the Delaunay triangulation of the 2D projection and its high-dimensional counterpart, with $Q(\hat{t}_i) = \frac{A(\hat{t}_i) - A(t_i)}{\max(A(\hat{t}_i), A(t_i))}$ and $WI(P) = \frac{1}{\sum_{\hat{t}_i} A(\hat{t}_i)} \sum_{\hat{t}_i} A(\hat{t}_i) |Q(\hat{t}_i)|$. Because areas are non-negative, each $Q(\hat{t}_i)$ lies in $[-1, 1]$, and $WI(P)$, a mean of $|Q(\hat{t}_i)|$ weighted by $A(\hat{t}_i)$, lies in $[0, 1]$. The method runs in $O(n \log n)$ time, dominated by the triangulation, and does not require explicit high-dimensional coordinates, only a distance metric that obeys the triangle inequality. The authors demonstrate that WI captures perceptual distortions, such as holes around points or widespread compression, that stress and trustworthiness do not detect, using the imdb dataset with Force-Scheme variants and a synthetic square dataset projected with PCA and t-SNE. They provide a public Python implementation and discuss future directions such as alternative triangulations, region segmentation, and relaxing the distance constraints to widen applicability.
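The two formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' public implementation: it takes $\hat{t}_i$ to be a triangle of the 2D Delaunay triangulation and $t_i$ to be the triangle on the same three points with side lengths given by the high-dimensional distances, and it computes both areas with Heron's formula (which needs only side lengths, consistent with the requirement of a metric obeying the triangle inequality). The names `heron_area` and `warping_index` are hypothetical, the high-dimensional distances are passed as a full matrix for simplicity, and no rescaling between the two spaces is applied.

```python
import numpy as np
from scipy.spatial import Delaunay


def heron_area(a, b, c):
    """Triangle area from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2.0
    # Clip guards against tiny negative products caused by rounding.
    return np.sqrt(np.clip(s * (s - a) * (s - b) * (s - c), 0.0, None))


def warping_index(points_2d, dist_hd):
    """Sketch of WI(P): mean of |Q| over triangles, weighted by 2D area.

    points_2d : (n, 2) array with the projected points.
    dist_hd   : (n, n) array of pairwise high-dimensional distances
                (assumption: given as a matrix, already on a scale
                comparable to the 2D distances).
    """
    tri = Delaunay(points_2d).simplices          # (m, 3) vertex indices
    i, j, k = tri[:, 0], tri[:, 1], tri[:, 2]

    def edge_2d(u, v):
        # Euclidean length of the 2D edge between vertex sets u and v.
        return np.linalg.norm(points_2d[u] - points_2d[v], axis=1)

    area_2d = heron_area(edge_2d(i, j), edge_2d(j, k), edge_2d(k, i))
    area_hd = heron_area(dist_hd[i, j], dist_hd[j, k], dist_hd[k, i])

    # Q = (A_2d - A_hd) / max(A_2d, A_hd); defined as 0 for degenerate pairs.
    denom = np.maximum(area_2d, area_hd)
    q = np.divide(area_2d - area_hd, denom,
                  out=np.zeros_like(denom), where=denom > 0)

    return float(np.sum(area_2d * np.abs(q)) / np.sum(area_2d))
```

As a sanity check, passing the 2D Euclidean distance matrix itself as `dist_hd` should give a value of (numerically) zero, since every triangle then has the same area in both spaces.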
Abstract
Dimensionality Reduction (DR) techniques are commonly used for the visual exploration and analysis of high-dimensional data because they can project datasets of high-dimensional points onto the 2D plane. However, projecting datasets into lower dimensions often entails some distortion, which is not necessarily easy to recognize but can lead users to misleading conclusions. Several Projection Quality Metrics (PQMs) have been developed to quantify the goodness-of-fit of a DR projection. They mostly focus on how well the projection captures the global or local structure of the data, without taking into account the visual distortion of the resulting plots, and therefore often ignore outliers or artifacts that can mislead a visual analysis of the projection. In this work, we introduce the Warping Index (WI), a new metric for measuring the quality of DR projections onto the 2D plane, based on the assumption that correctly preserving the empty regions between points is crucial for a faithful visual representation of the data.
