GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need

Eduarda de Souza Marques; Arthur Sobrinho Ferreira da Rocha; Joao Paixao; Heudson Mirandola; Daniel Sadoc Menasche

TL;DR

An angle score $\theta(z)$ quantifies whether a sample $z$ is explained relatively more by $A$, more by $B$, or comparably by both. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.

Abstract

Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint $Ax = By = z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, which separates shared versus dataset-specific directions through the diagonal structure of $(C, S)$. From these factors we derive an interpretable *angle score* $\theta(z) \in [0, \pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
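To make the factorization $A = HCU$, $B = HSV$ concrete, the following is a minimal sketch of one way to build factors of this shape with NumPy, which does not ship a GSVD routine. It is not the authors' implementation. The function name `gsvd_like`, the tolerance `tol`, and the convention used here (square diagonal $C$ and $S$ stored as vectors `c` and `s`, with `U` and `V` having orthonormal rows wherever the corresponding entry is nonzero) are assumptions made for illustration; the paper's block structure may differ.

```python
import numpy as np

def gsvd_like(A, B, tol=1e-10):
    """Hypothetical helper: return H, c, s, U, V such that
    A = H @ diag(c) @ U, B = H @ diag(s) @ V and c**2 + s**2 = 1.
    A is n x p and B is n x q (columns live in the shared ambient space)."""
    p = A.shape[1]
    # Stack the transposes so both blocks share the ambient dimension n.
    K = np.vstack([A.T, B.T])                          # (p + q) x n
    Qk, sv, Vt = np.linalg.svd(K, full_matrices=False)
    r = int(np.sum(sv > tol * sv[0]))                  # numerical rank of K
    Q = Qk[:, :r]                                      # orthonormal columns
    R = sv[:r, None] * Vt[:r]                          # r x n, so K = Q @ R
    Q1, Q2 = Q[:p], Q[p:]                              # Q1.T @ Q1 + Q2.T @ Q2 = I
    # Diagonalize Q1.T @ Q1; the same W then diagonalizes Q2.T @ Q2.
    _, W = np.linalg.eigh(Q1.T @ Q1)
    M1, M2 = Q1 @ W, Q2 @ W                            # mutually orthogonal columns
    c = np.linalg.norm(M1, axis=0)                     # cosine-like weights
    s = np.linalg.norm(M2, axis=0)                     # sine-like weights
    # Normalize only where the weight is nonzero; other rows are left at zero.
    U = np.divide(M1, c, out=np.zeros_like(M1), where=c > tol).T
    V = np.divide(M2, s, out=np.zeros_like(M2), where=s > tol).T
    H = R.T @ W                                        # shared n x r frame
    return H, c, s, U, V

# Small random example (synthetic data, not MNIST).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((6, 5))
H, c, s, U, V = gsvd_like(A, B)
A_rebuilt = H @ (c[:, None] * U)
B_rebuilt = H @ (s[:, None] * V)
```

In this convention, an entry of `c` near 1 marks a direction of `H` used almost only by $A$, an entry of `s` near 1 marks a direction used almost only by $B$, and intermediate values mark shared directions.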

Paper Structure (35 sections, 6 theorems, 76 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
A relation primitive.
GSVD yields a joint frame.
A single alignment angle.
Contributions.
Organization.
Related work
Background: Linear relations and GSVD
Linear relations: co-span vs. span
GSVD as a joint geometry
A relative alignment angle $\theta(z)$ and its uses
Defining the Alignment Angle
Interpretation.
Computing $\theta(z)$ in the GSVD frame
Finding Extreme Directions
...and 20 more sections

Key Result

Theorem 1

... where $c(z) = H^\dagger z$ and $(H, C, S, U, V)$ are the GSVD factors in $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$.

Figures (6)

Figure 1: Representative MNIST digit samples for the GSVD pipeline. (a,b) raw digits "1" and "5"; (c,d) corresponding vectors after mean-centering.
Figure 2: Empirical distributions of the angle $\theta(z)$ on the MNIST test set for four digit pairs. Values closer to $0$ indicate stronger alignment with the first digit in the pair, while values closer to $90^\circ$ indicate stronger alignment with the second.
Figure 3: Representative $H$ component directions reconstructed in image space as $28 \times 28$ images (viridis colormap), obtained by the GSVD-based optimization described in the paper's optimization section: (a) a solution of the minimization problem, indicating a more "4-like" direction; (b) a shared direction that captures the structure of both "4" and "9"; (c) a solution of the maximization problem, indicating a more "9-like" direction.
Figure 4: Representative Fashion-MNIST samples for the GSVD pipeline. (a,b) raw "T-shirt/top" and "Sneaker"; (c,d) corresponding vectors after mean-centering.
Figure 5: Empirical distributions of the angle $\theta$ on the test set for two Fashion-MNIST class pairs under the same GSVD pipeline used for MNIST digits (same preprocessing and the same rank choices $(p,q)$). (a) T-shirt/top (class 0) vs. Sneaker (class 7), where $A$ is built from T-shirt/top columns and $B$ from Sneaker columns; the separation indicates limited shared structure. (b) Sneaker (class 7) vs. Ankle boot (class 9), which shows larger overlap, consistent with more shared directions between these visually related footwear classes. As in the main text, angles near $0$ indicate stronger alignment with $A$ and angles near $90^\circ$ indicate stronger alignment with $B$.
...and 1 more figure
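The captions of Figures 2 and 5 read angles near $0$ as alignment with $A$ and angles near $90^\circ$ as alignment with $B$. A binary decision rule consistent with that reading can be sketched as a simple threshold. The midpoint threshold $\pi/4$, the function name `classify_by_angle`, and the sample angles are assumptions for illustration; the rule actually used in the paper is not stated in this summary.

```python
import numpy as np

def classify_by_angle(theta, threshold=np.pi / 4):
    """Hypothetical rule: label 0 (closer to A) if theta < threshold, else 1 (closer to B).
    theta is in radians and is expected to lie in [0, pi/2]."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta < threshold, 0, 1)

# Made-up angles in degrees, converted to radians before thresholding.
angles_deg = np.array([5.0, 40.0, 50.0, 88.0])
labels = classify_by_angle(np.deg2rad(angles_deg))
```

Angles close to the threshold correspond to samples explained comparably by both datasets, so a practical rule might leave a band around $\pi/4$ undecided instead of forcing a label.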

Theorems & Definitions (16)

Definition 1: Alignment angle
Theorem 1: GSVD produces alignment angles
proof
Theorem 2: GSVD produces extreme directions
proof
Remark 1: Deflation / subsequent directions
Remark 2
Lemma 1: GSVD induces an explicit parameterization of the co-span constraint
proof
Lemma 2: Nullspace orthogonality fixes the free GSVD blocks
...and 6 more
