ShaRP: Shape-Regularized Multidimensional Projections

Alister Machado, Alexandru Telea, Michael Behrisch

TL;DR

This work tackles the problem of opaque visual signatures produced by conventional dimensionality reduction methods by introducing ShaRP, a shape-regularized neural projection that gives users explicit control over cluster shapes in 2D projections. ShaRP is a variational autoencoder–based projection method whose objective combines a reconstruction term, a classification term, and a KL-divergence regularization term: $ \mathcal{L}_{ShaRP} = \mathcal{L}_{recon} + \rho \mathcal{L}_{class} + \beta \mathcal{L}_{reg} $. Shape control is achieved by sampling from a distribution chosen per target shape: $ \mathbf{z} \sim \mathcal{N}(\vec{\mu}, \operatorname{diag}(\vec{\sigma}^2)) $ for ellipses, the Generalized Normal density $ p(x|\mu,\alpha,\omega) \propto \exp(-(|x-\mu|/\alpha)^\omega) $ for rectangles, and Dirichlet-based barycentric sampling for polygons. Empirical results on five datasets show that ShaRP delivers consistent, shape-regular projections with quality comparable to state-of-the-art methods and speed that is comparable or better, facilitating interactive exploration and labeling tasks with improved visual control.
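
The three sampling schemes named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the per-axis treatment of the Generalized Normal, and the fixed polygon vertices are assumptions made for illustration, and how ShaRP's encoder actually parameterizes these distributions is not specified in this summary.

```python
import numpy as np

def sample_ellipse(mu, sigma, n, rng):
    """Draw z ~ N(mu, diag(sigma^2)) via z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + sigma * eps

def sample_rectangle(mu, alpha, omega, n, rng):
    """Per-axis Generalized Normal draw, p(x) proportional to exp(-(|x - mu| / alpha) ** omega).

    If G ~ Gamma(1 / omega, 1), then alpha * G ** (1 / omega) follows the
    distribution of |x - mu|; a random sign restores symmetry around mu.
    Large omega flattens the density toward a uniform box (assumed per-axis).
    """
    g = rng.gamma(shape=1.0 / omega, scale=1.0, size=(n, mu.shape[0]))
    sign = rng.choice([-1.0, 1.0], size=g.shape)
    return mu + sign * alpha * g ** (1.0 / omega)

def sample_polygon(vertices, concentration, n, rng):
    """Dirichlet weights used as barycentric coordinates over polygon vertices.

    Each row of w is non-negative and sums to 1, so w @ vertices is a convex
    combination and lies inside the convex hull of the vertices.
    """
    w = rng.dirichlet(concentration, size=n)   # shape (n, k)
    return w @ vertices                        # shape (n, 2)

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
ellipse_pts = sample_ellipse(mu, np.array([0.5, 0.1]), 1000, rng)
rect_pts = sample_rectangle(mu, np.array([2.0, 1.0]), 10.0, 1000, rng)
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hypothetical vertices
tri_pts = sample_polygon(triangle, np.ones(3), 1000, rng)
```

With $\omega = 10$ the rectangle samples stay close to the box $\mu \pm \alpha$ on each axis, which is why a large exponent yields box-like clusters, while the Dirichlet weights keep every polygon sample inside the triangle.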

Abstract

Projections, or dimensionality reduction methods, are techniques of choice for the visual exploration of high-dimensional data. Many such techniques exist, each one of them having a distinct visual signature - i.e., a recognizable way to arrange points in the resulting scatterplot. Such signatures are implicit consequences of algorithm design, such as whether the method focuses on local vs global data pattern preservation; optimization techniques; and hyperparameter settings. We present a novel projection technique - ShaRP - that provides users explicit control over the visual signature of the created scatterplot, which can cater better to interactive visualization scenarios. ShaRP scales well with dimensionality and dataset size, generically handles any quantitative dataset, and provides this extended functionality of controlling projection shapes at a small, user-controllable cost in terms of quality metrics.

Paper Structure (11 sections, 6 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Comparison of projections of the MNIST dataset learned using (a) Auto-encoders, (b) SSNP [DBLP:conf/ivapp/EspadotoHT21], and (c) ShaRP. SSNP and ShaRP were trained using the ground truth labels as class information, encoded here and in the following figures by colors. Values in brackets are Distance Consistency scores (DSC [SipsDSCMetric]), a quality metric that measures separability of clusters, with 1 being a perfect score.
  • Figure 2: Shaping clusters as rectangles can be convenient for data labeling tasks, as illustrated by the right image where class image representatives are overlaid atop their respective clusters. We achieve this using a Generalized Normal distribution for sampling, here shown on the MNIST dataset for $\omega=10$ (left).
  • Figure 3: The results of our Triangular shaping sampling scheme over 3 different datasets. DSC values (in brackets) are close to the best value possible, indicating that we do not harm class separability.
  • Figure 4: Our ShaRP method produces cluster shapes regularized towards a user-chosen target (here, ellipses) and can handle diverse data distributions. We demonstrate this here for the cases where we use ground truth labels (GT) or K-Means-generated pseudolabels (KM). We compare our results to SSNP (GT, KM) and to t-SNE and UMAP. More comparisons are present in the supplemental material.
  • Figure 5: The $\beta$ coefficient of the ShaRP loss controls the shape regularization strength, shown here on the USPS dataset. $\beta = 0$ approximately reproduces SSNP (a). Increasing it (b, c) progressively forces the learned clusters into circular shapes, up to the point where they are no longer separable and the projection is of low quality (d).
  • ...and 1 more figures
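
The weighting described in the Figure 5 caption can be made concrete with a small sketch of a loss of the form $\mathcal{L}_{recon} + \rho \mathcal{L}_{class} + \beta \mathcal{L}_{reg}$. The specific terms below (squared error, cross-entropy, and a KL divergence to a standard normal prior) are assumptions chosen for illustration, as are the function names and default weights; the summary only states that the loss combines reconstruction, classification, and KL-divergence terms.

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) ) (assumed prior)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def sharp_style_loss(x, x_hat, class_probs, labels, mu, log_var,
                     rho=1.0, beta=0.1):
    """Weighted sum recon + rho * class + beta * reg, averaged over the batch."""
    # Reconstruction: squared error between inputs and decoder outputs (assumed).
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # Classification: cross-entropy of the predicted class probabilities (assumed).
    picked = class_probs[np.arange(len(labels)), labels]
    class_term = -np.mean(np.log(picked + 1e-12))
    # Regularization: KL term whose strength is set by beta.
    reg = np.mean(kl_diag_gaussian(mu, log_var))
    return recon + rho * class_term + beta * reg

# Toy batch: perfect reconstruction, uninformative classifier, latent means at 1.
x = np.ones((4, 3))
probs = np.full((4, 2), 0.5)
labels = np.array([0, 1, 0, 1])
mu = np.ones((4, 2))
log_var = np.zeros((4, 2))

loss_no_reg = sharp_style_loss(x, x, probs, labels, mu, log_var, beta=0.0)
loss_strong_reg = sharp_style_loss(x, x, probs, labels, mu, log_var, beta=2.0)
```

Setting `beta=0.0` removes the shape term entirely, which matches the caption's remark that $\beta = 0$ approximately reproduces SSNP; raising `beta` makes the KL term dominate, consistent with clusters being pushed toward the prior's shape at the expense of separability.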