ShaRP: Shape-Regularized Multidimensional Projections
Alister Machado, Alexandru Telea, Michael Behrisch
TL;DR
Conventional dimensionality reduction methods produce visual signatures that users cannot directly control. This work introduces ShaRP, a shape-regularized neural projection that gives users explicit control over cluster shapes in 2D projections. ShaRP is a projection method based on a variational autoencoder whose objective combines a reconstruction term, a classification term weighted by $ \rho $, and a KL-divergence regularization term weighted by $ \beta $: $ \mathcal{L}_{ShaRP} = \mathcal{L}_{recon} + \rho \mathcal{L}_{class} + \beta \mathcal{L}_{reg} $. Shape control comes from the distribution used for sampling in the latent space: a diagonal Gaussian $ \mathbf{z} \sim \mathcal{N}(\vec{\mu}, \operatorname{diag}(\vec{\sigma}^2)) $ yields ellipses, a generalized normal density $ p(x|\mu,\alpha,\omega) \propto \exp(-(|x-\mu|/\alpha)^\omega) $ yields rectangles, and Dirichlet-based barycentric sampling yields polygons, all while maintaining competitive quality. Empirical results on five datasets show that ShaRP delivers consistent, shape-regular projections, with quality comparable to state-of-the-art methods and speed comparable to or better than theirs, which facilitates interactive exploration and labeling tasks with improved visual control.
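The three sampling schemes and the combined objective can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the gamma-based sampler for the generalized normal density, and the use of a single symmetric Dirichlet concentration over polygon vertices are all assumptions made for illustration.

```python
import numpy as np


def sample_ellipse(mu, sigma, n, rng):
    """Draw n points from N(mu, diag(sigma^2)) via mu + sigma * eps."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eps = rng.standard_normal((n, len(mu)))
    return mu + sigma * eps


def sample_rectangle(mu, alpha, omega, n, rng):
    """Draw n points with independent coordinates following
    p(x) proportional to exp(-(|x - mu| / alpha) ** omega).

    Assumed sampler: if G ~ Gamma(1/omega, 1), then
    mu + alpha * s * G ** (1/omega), with s a random sign, has this density.
    Larger omega gives a flatter, more box-like cluster.
    """
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    g = rng.gamma(shape=1.0 / omega, scale=1.0, size=(n, len(mu)))
    sign = rng.choice([-1.0, 1.0], size=(n, len(mu)))
    return mu + alpha * sign * g ** (1.0 / omega)


def sample_polygon(vertices, concentration, n, rng):
    """Draw n points inside the convex hull of `vertices` by using
    Dirichlet samples as barycentric weights (assumed parameterization)."""
    vertices = np.asarray(vertices, dtype=float)               # shape (k, 2)
    weights = rng.dirichlet(np.full(len(vertices), concentration), size=n)
    return weights @ vertices                                  # shape (n, 2)


def sharp_loss(l_recon, l_class, l_reg, rho, beta):
    """Combine the three terms as L_recon + rho * L_class + beta * L_reg."""
    return l_recon + rho * l_class + beta * l_reg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    square = [[0, 0], [1, 0], [1, 1], [0, 1]]
    print(sample_ellipse([0, 0], [1.0, 0.5], 3, rng))
    print(sample_rectangle([0, 0], [1.0, 0.5], 8.0, 3, rng))
    print(sample_polygon(square, 1.0, 3, rng))
```

In a trained model, the location and scale parameters would be produced per input by the encoder; here they are fixed constants so that the shape of each distribution can be inspected in isolation.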
Abstract
Projections, or dimensionality reduction methods, are techniques of choice for the visual exploration of high-dimensional data. Many such techniques exist, each with a distinct visual signature, i.e., a recognizable way of arranging points in the resulting scatterplot. Such signatures are implicit consequences of algorithm design, such as whether the method focuses on preserving local or global data patterns, the optimization techniques it uses, and its hyperparameter settings. We present a novel projection technique, ShaRP, that gives users explicit control over the visual signature of the created scatterplot, which can better serve interactive visualization scenarios. ShaRP scales well with dimensionality and dataset size, generically handles any quantitative dataset, and provides this extended functionality of controlling projection shapes at a small, user-controllable cost in terms of quality metrics.
