Through the Grapevine: Vineyard Distance as a Measure of Topological Dissimilarity

Alvan Arulandu, Daniel Gottschalk, Thomas Payne, Alexander Richardson, Thomas Weighill

TL;DR

The paper addresses the problem of quantifying differences between datasets by introducing vineyard distance, a metric that bridges $L^p$ functional distance and persistence-based topological distances through a vineyard interpolation of persistence diagrams. It defines vineyard distance via time-parametrized vineyards or straight-line homotopies, and shows equivalence to a weighted Wasserstein formulation, which enables computation by optimal matchings. The authors establish theoretical bounds, including $W_{\infty,1}(P,Q) \leq \mathbb{V}(\mathcal{V})$ and, for straight-line homotopies, $\mathbb{V}(\mathcal{V}) \leq \sum_{\dim(\sigma)\in\{d,d+1\}} |f(\sigma)-g(\sigma)|$, plus a lower bound via the minimum vine cost (MVC) in the weighted setting. In numerical experiments on Gaussian mixtures and image data, and in applications to geospatial data and neural-network training dynamics, vineyard distance reveals distinctions that $W_1$ and $L^1$ miss, acting as a richer intermediate metric between purely topological and purely functional measures. The work highlights the potential for ensemble analysis of vineyards and motivates future development of efficient algorithms, along with extensions to broader datasets and learning dynamics.
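The straight-line upper bound quoted above is simple enough to evaluate directly. The following is a minimal sketch, not the authors' code: it assumes simplices are stored as vertex tuples, that $f$ and $g$ are extended from vertices to simplices by a lower-star rule (maximum over vertices), and the helper names `lower_star` and `straight_line_upper_bound` are hypothetical.

```python
def lower_star(vertex_values, simplices):
    """Assumed extension rule: a simplex takes the max value of its vertices."""
    return {s: max(vertex_values[v] for v in s) for s in simplices}


def straight_line_upper_bound(f, g, d):
    """Sum of |f(s) - g(s)| over simplices s of dimension d or d + 1.

    f and g map each simplex (a tuple of vertex indices) to a real value;
    the dimension of a simplex is its number of vertices minus one.
    """
    return sum(abs(f[s] - g[s]) for s in f if len(s) - 1 in (d, d + 1))


# Hypothetical toy complex: a path on four vertices (0-1-2-3).
simplices = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3)]
f = lower_star([0.0, 2.0, 1.0, 3.0], simplices)
g = lower_star([1.0, 2.0, 0.0, 3.0], simplices)

# Upper bound on the degree-0 vineyard distance for this pair.
bound_d0 = straight_line_upper_bound(f, g, d=0)
print(bound_d0)
```

Only simplices of dimension $d$ and $d+1$ enter the sum, so for $d = 0$ on this toy complex the triangles (if any existed) would be ignored.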

Abstract

We introduce a new measure of distance between datasets, based on vineyards from topological data analysis, which we call the vineyard distance. Vineyard distance measures the extent of topological change along an interpolation from one dataset to another, either along a pre-computed trajectory or via a straight-line homotopy. We demonstrate through theoretical results and experiments that vineyard distance is less sensitive than $L^p$ distance (which considers every single data value), but more sensitive than Wasserstein distance between persistence diagrams (which accounts only for shape and not location). This allows vineyard distance to reveal distinctions that the other two distance measures cannot. In our paper, we establish theoretical results for vineyard distance, including upper and lower bounds. We then demonstrate the usefulness of vineyard distance on real-world data through applications to geospatial data and to neural network training dynamics.
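The contrast between the two extremes can be illustrated with a small sketch. It assumes a function sampled on a path graph and computes degree-0 sublevel-set persistence with a union-find and the elder rule; the function name `sublevel_persistence_0d` and the sample values are hypothetical, and the sketch does not compute vineyard distance itself. A translated copy of a function has the same persistence diagram (so any Wasserstein distance between the diagrams is zero) while its $L^1$ distance from the original is large.

```python
def sublevel_persistence_0d(values):
    """Degree-0 sublevel-set persistence of a function on a path graph.

    Vertices are added in order of increasing value; when two components
    merge, the younger one (larger birth value) dies (elder rule).
    Returns a sorted list of (birth, death) pairs, with the essential
    class recorded as (min(values), inf).
    """
    n = len(values)
    parent = [None] * n  # None marks a vertex not yet added
    birth = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((min(values), float("inf")))
    return sorted(pairs)


# Hypothetical data: g is f translated three steps to the right.
f = [5.0, 1.0, 4.0, 0.0, 5.0, 5.0, 5.0, 5.0]
g = [5.0, 5.0, 5.0, 5.0, 1.0, 4.0, 0.0, 5.0]

dgm_f = sublevel_persistence_0d(f)
dgm_g = sublevel_persistence_0d(g)
l1 = sum(abs(a - b) for a, b in zip(f, g))

print(dgm_f == dgm_g)  # identical diagrams: shape is the same
print(l1)              # pointwise distance sees the shift
```

According to the abstract, vineyard distance is meant to sit between these two behaviours: it should register the change in location that the diagrams alone cannot see, without depending on every individual data value.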

Paper Structure

This paper contains 30 sections, 11 theorems, 39 equations, 10 figures, 1 table.

Key Result

Theorem 1

[Cohen-Steiner et al., 2007] Let $f$ and $g$ be functions on a finite simplicial complex $\mathcal{K}$ and let $\mathsf{dgm}_d(f)$ and $\mathsf{dgm}_d(g)$ be the associated sublevel set filtration persistence diagrams in some dimension $d$. Then,
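
For reference, the bottleneck stability theorem from the cited work is commonly stated as the inequality below. The notation is an assumption: $W_\infty$ is taken to denote the bottleneck distance between diagrams and $\|\cdot\|_\infty$ the maximum over simplices of $\mathcal{K}$, so the paper's own display may use different symbols.

$$
W_\infty\bigl(\mathsf{dgm}_d(f), \mathsf{dgm}_d(g)\bigr) \;\leq\; \|f - g\|_\infty = \max_{\sigma \in \mathcal{K}} |f(\sigma) - g(\sigma)|.
$$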

Figures (10)

  • Figure 1: Comparing a standard Gaussian $f$ with a mean or variance shifted version $g$.
  • Figure 2: Approximate embeddings using MDS and three different distances on a set of $100$ images of $6$s, $9$s and $7$s.
  • Figure 3: Persistence diagram and choropleth pairs for Black (left) and Hispanic (right) populations for Milwaukee, WI and Lexington, KY.
  • Figure 4: Persistence diagram and census tract pairs for Black (left) and Hispanic (right) populations for Phoenix, AZ and Dallas, TX.
  • Figure 5: Elections plotted as ($\mathbb{V}^{(w)}(\mathcal{V}(f,g))$, $W_1(f,g)$) coordinate pairs. Points are colored by weighted $L^1$ distance between $f$ and $g$ (left) and by % statewide Democratic vote (right).
  • ...and 5 more figures

Theorems & Definitions (33)

  • Definition 1
  • Remark 1
  • Definition 2
  • Theorem 1
  • Proposition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • ...and 23 more