The Rashomon Effect for Visualizing High-Dimensional Data

Yiyang Sun, Haiyang Huang, Gaurav Rajesh Parikh, Cynthia Rudin

Abstract

Dimension reduction (DR) is inherently non-unique: multiple embeddings can preserve the structure of high-dimensional data equally well while differing in layout or geometry. In this paper, we formally define the Rashomon set for DR -- the collection of `good' embeddings -- and show how embracing this multiplicity leads to more powerful and trustworthy representations. Specifically, we pursue three goals. First, we introduce PCA-informed alignment to steer embeddings toward principal components, making axes interpretable without distorting local neighborhoods. Second, we design concept-alignment regularization that aligns an embedding dimension with external knowledge, such as class labels or user-defined concepts. Third, we propose a method to extract common knowledge across the Rashomon set by identifying trustworthy and persistent nearest-neighbor relationships, which we use to construct refined embeddings with improved local structure while preserving global relationships. By moving beyond a single embedding and leveraging the Rashomon set, we provide a flexible framework for building interpretable, robust, and goal-aligned visualizations.
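The third goal -- extracting nearest-neighbor relationships that persist across the Rashomon set -- can be sketched as follows. This is a minimal illustration, not the paper's algorithm: `knn_pairs` and `persistent_pairs` are hypothetical helper names, and the paper's actual stability criterion may differ from a simple frequency threshold.

```python
def knn_pairs(points, k):
    """Return the set of directed pairs (i, j) where j is among
    point i's k nearest neighbors (brute-force Euclidean)."""
    pairs = set()
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            pairs.add((i, j))
    return pairs

def persistent_pairs(embeddings, k=2, threshold=1.0):
    """Keep neighbor pairs that recur in at least a `threshold`
    fraction of the embeddings drawn from the Rashomon set."""
    counts = {}
    for emb in embeddings:
        for pair in knn_pairs(emb, k):
            counts[pair] = counts.get(pair, 0) + 1
    need = threshold * len(embeddings)
    return {pair for pair, c in counts.items() if c >= need}
```

Pairs surviving this filter are exactly the "stable neighbor pairs" one would feed into a refined common-knowledge embedding.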

Paper Structure

This paper contains 35 sections, 2 theorems, 19 equations, 34 figures, 2 tables, 2 algorithms.

Key Result

Theorem 5.1

Let $(i, j^*, j')$ be a triplet of points, and let $\{y^{(1)}, \dots, y^{(T)}\}$ be $T$ i.i.d. low-dimensional embeddings sampled from $\mathcal{D}$. Define the margin variable for the $t$-th embedding based on the population scoring function $\Psi$. Since $\Psi$ depends on the random sample $y^{(t)}$ only through $d^{y^{(t)}}_{ij}$ while using fixed population constants, and the $y^{(t)}$ are i.i.d., the margin variables are themselves i.i.d.

Figures (34)

  • Figure 1: Three goals for generating and exploring the Rashomon set for dimension reduction
  • Figure 2: $\text{PaCMAP}_{\text{param}}$ embedding with and without PCA-informed alignment. The colored curves overlaid on the embeddings are generated by applying the learned parametric DR mapping to points sampled along the first two principal component directions in the original high-dimensional space, thereby visualizing how the DR mapping transforms the PCA axes.
  • Figure 3: (a) MNIST PaCMAP param embedding, (b) PCA embedding, (c) PCA-informed embedding with $\lambda_{\text{PCA}}$ set to be $0.1$. It is nicely aligned with the first two principal components while capturing the detailed cluster structure. (d) Triplet PCA score has improved after PCA-informed alignment.
  • Figure 4: (a) Concept-informed aligned $\text{PaCMAP}_{\text{param}}$ embedding. Alignment is along the horizontal axis from feet (left) to head (right). Footwear is labeled in shades of red to orange, trousers in yellow, dresses in light yellow, pullovers and coats in green, shirts and t-shirts in blue, handbags in purple. (b) Evaluation metrics and losses for FMNIST before and after concept alignment, which remain generally unchanged.
  • Figure 5: (a) Original $\text{PaCMAP}_{\text{param}}$ embedding of USPS dataset. (b) Common knowledge embedding using only stable neighbor pairs within the Rashomon set. (c) Quantitative comparison of original vs. combined DR embeddings across three evaluation metrics for five methods.
  • ...and 29 more figures
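Figures 2 and 3 illustrate PCA-informed alignment, where a regularizer steers the embedding's axes toward the top two principal components. A minimal sketch of one plausible penalty term is below; this is an illustrative MSE-style regularizer between standardized embedding coordinates and PC scores, with `pca_alignment_penalty` a hypothetical name, and it is not necessarily the paper's exact $\lambda_{\text{PCA}}$ term.

```python
import numpy as np

def pca_alignment_penalty(X, Y):
    """Penalty encouraging the axes of a 2-D embedding Y (n x 2) to align
    with the top-2 principal component scores of the data X (n x d)."""
    Xc = X - X.mean(axis=0)
    # Top-2 PC scores via SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T  # (n, 2) projections onto PC1, PC2

    def z(A):
        # Standardize columns so the penalty is scale-invariant.
        return (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-12)

    return float(np.mean((z(Y) - z(scores)) ** 2))
```

In a parametric DR method, such a term would be added to the base objective as `loss = dr_loss + lam_pca * pca_alignment_penalty(X, Y)`, with a small weight (the paper's Figure 3 uses $\lambda_{\text{PCA}} = 0.1$) so local neighborhoods are not distorted.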

Theorems & Definitions (6)

  • Definition 3.1: Rashomon set of Dimension Reduction from a Loss Perspective
  • Definition 3.2: Rashomon set of Dimension Reduction from Graph Perspective
  • Definition 3.3: Soft Jaccard Distance Between Weighted Matrices
  • Theorem 5.1
  • Theorem A.1
  • Proof
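Definition 3.3's soft Jaccard distance compares weighted matrices, such as the weighted neighbor graphs of two embeddings. A minimal sketch, assuming the standard weighted-Jaccard instantiation $1 - \sum_{ij}\min(a_{ij}, b_{ij}) / \sum_{ij}\max(a_{ij}, b_{ij})$ (the paper's exact definition may differ in details):

```python
def soft_jaccard_distance(A, B):
    """Soft Jaccard distance between two nonnegative weighted matrices
    of the same shape, given as nested lists. Reduces to the ordinary
    Jaccard distance when A and B are 0/1 adjacency matrices."""
    num = den = 0.0
    for row_a, row_b in zip(A, B):
        for a, b in zip(row_a, row_b):
            num += min(a, b)
            den += max(a, b)
    # Identical all-zero matrices have distance 0 by convention.
    return 1.0 - (num / den if den else 1.0)
```

Under this instantiation, two embeddings whose weighted neighbor graphs fully agree have distance 0, making the quantity a natural way to measure how far apart two members of the Rashomon set are.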