Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy

Ya-Wei Eileen Lin, Ronald R. Coifman, Gal Mishne, Ronen Talmon

TL;DR

This work tackles the challenge of measuring meaningful distances for high‑dimensional data when features possess a latent hierarchical structure. It learns a latent feature tree by embedding features in a multi‑scale hyperbolic space using diffusion geometry and then decoding a binary feature tree via a hyperbolic diffusion LCA (HD‑LCA), producing a data‑driven Tree‑Wasserstein Distance $\texttt{TW}(\boldsymbol{x}_i,\boldsymbol{x}_{i'},B)$ that matches the ground‑truth latent metric $d_T$. The authors prove that the decoded tree yields a distance that is bi‑Lipschitz equivalent to the true latent TWD and demonstrate scalable, linear‑time computation using diffusion landmarks; empirical results on synthetic data, word‑document classification, and single‑cell RNA sequencing show clear advantages over existing TWD methods and pre‑trained baselines. The approach enables unsupervised ground‑metric learning from observations and provides a differentiable, geometry‑aware distance that improves downstream tasks while remaining efficient for large feature sets.

Abstract

Finding meaningful distances between high-dimensional data samples is an important scientific task. To this end, we propose a new tree-Wasserstein distance (TWD) for high-dimensional data with two key aspects. First, our TWD is specifically designed for data with a latent feature hierarchy, i.e., the features lie in a hierarchical space, in contrast to the usual focus on embedding samples in hyperbolic space. Second, while the conventional use of TWD is to speed up the computation of the Wasserstein distance, we use its inherent tree as a means to learn the latent feature hierarchy. The key idea of our method is to embed the features into a multi-scale hyperbolic space using diffusion geometry and then present a new tree decoding method by establishing analogies between the hyperbolic embedding and trees. We show that our TWD computed based on data observations provably recovers the TWD defined with the latent feature hierarchy and that its computation is efficient and scalable. We showcase the usefulness of the proposed TWD in applications to word-document and single-cell RNA-sequencing datasets, demonstrating its advantages over existing TWDs and methods based on pre-trained models.
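For intuition on why a tree makes the Wasserstein distance cheap to evaluate: when the ground metric is a tree metric, the distance reduces to a sum over edges of the edge weight times the absolute difference in mass below that edge. The following is a minimal Python sketch of that closed form only. The `parent`/`weight` array encoding, the function name, and the toy tree are illustrative assumptions, not the tree decoded by the proposed method.

```python
def tree_wasserstein(parent, weight, x, y):
    """Closed-form Wasserstein distance between two histograms on a rooted tree.

    Assumed (hypothetical) encoding:
      - node 0 is the root and parent[0] is ignored;
      - parent[v] < v for every other node v (parents come before children);
      - weight[v] is the length of the edge joining v to parent[v];
      - x and y assign mass to nodes and have equal total mass.
    """
    # Net mass difference at each node, accumulated into subtree totals below.
    sub = [float(a) - float(b) for a, b in zip(x, y)]
    total = 0.0
    # Visit children before parents so sub[v] is complete when it is used.
    for v in range(len(parent) - 1, 0, -1):
        total += weight[v] * abs(sub[v])   # mass that must cross edge (v, parent[v])
        sub[parent[v]] += sub[v]           # push the subtree total up to the parent
    return total


# Toy binary tree: root 0, internal nodes 1 and 2, leaves 3, 4 (under 1) and 5, 6 (under 2).
parent = [-1, 0, 0, 1, 1, 2, 2]
weight = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

on_leaf_3 = [0, 0, 0, 1, 0, 0, 0]
on_leaf_4 = [0, 0, 0, 0, 1, 0, 0]
on_leaf_5 = [0, 0, 0, 0, 0, 1, 0]

d_siblings = tree_wasserstein(parent, weight, on_leaf_3, on_leaf_4)  # 2.0 (path 3-1-4)
d_cousins = tree_wasserstein(parent, weight, on_leaf_3, on_leaf_5)   # 4.0 (path 3-1-0-2-5)
```

A single pass over the nodes suffices, so the cost is linear in the number of tree nodes once the tree is given. Features that share a nearby ancestor are cheap to exchange mass between, which is how the tree encodes the feature hierarchy in the resulting sample distance.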

Paper Structure

This paper contains 56 sections, 10 theorems, 32 equations, 10 figures, 7 tables, 3 algorithms.

Key Result

Proposition 4.1

The hyperbolic LCA $\mathbf{z}_{j}^k \vee \mathbf{z}_{j'}^k$ in Def. 4.1 has a closed-form solution, given by $\mathbf{z}_{j}^k \vee \mathbf{z}_{j'}^k = \left[\frac{1}{2}\left(\bm{\psi}_j^k + \bm{\psi}_{j'}^k\right)^\top, \mathtt{proj}(\mathbf{z}_j^k \vee \mathbf{z}_{j'}^k) \right]^\top$ ...
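The midpoint term in this closed form can be illustrated with a standard fact about the upper half-space model of hyperbolic space: the geodesic between two points at the same height is a semicircle centred on the boundary, and its highest point sits above the Euclidean midpoint of the two points. The Python sketch below computes that apex. It assumes both points share the same last coordinate (`height`); the function name and that parameterisation are hypothetical and are not the paper's definition of $\mathtt{proj}(\cdot)$, which is not reproduced here.

```python
import math


def halfspace_apex(p, q, height):
    """Apex of the geodesic semicircle joining (p, height) and (q, height).

    Assumption: both points lie in the upper half-space model and share the
    same last coordinate `height` > 0; p and q hold the remaining coordinates.
    """
    # The semicircle is centred on the boundary, directly below the midpoint.
    mid = [0.5 * (a + b) for a, b in zip(p, q)]
    # Radius = distance from the centre (mid, 0) to either endpoint.
    half_chord_sq = 0.25 * sum((a - b) ** 2 for a, b in zip(p, q))
    radius = math.sqrt(half_chord_sq + height ** 2)
    # The apex is the point of the semicircle farthest from the boundary.
    return mid + [radius]


apex = halfspace_apex([0.0, 0.0], [6.0, 8.0], 12.0)  # [3.0, 4.0, 13.0]
```

The further apart the two points are, the higher the apex rises, which matches the tree intuition that a pair of distant leaves meets at an ancestor closer to the root.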

Figures (10)

  • Figure 1: (a) The LCA relation of $j_1, j_2, j_3$ on tree $B$. (b) The hyperbolic LCA relation $[j_2 \vee j_3 \sim j_1]_{\mathbb{H}^{m+1}}^k$ for $\mathbf{z}_{j_1}^k, \mathbf{z}_{j_2}^k, \mathbf{z}_{j_3}^k \in \mathbb{H}^{m+1}$ along with their geodesic paths (red semi-circles). Blue points indicate the hyperbolic LCAs, and yellow points represent their orthogonal projections. The value $\mathtt{proj}(\cdot)$ signifies the parent-child relation. (c) The HD-LCA $\overline{\mathbf{h}}_{j_1, j_2}$ defined as the Riemannian mean of the orthogonal projections $\{\mathbf{o}_{j_1, j_2}^k\}_{k=0}^{K_c}$, incorporating the multi-scale hyperbolic LCAs.
  • Figure 2: (a) Illustration of the probabilistic hierarchical model for generating synthetic samples consisting of 8 binary elements. The orange nodes represent (sub)categories, and the black nodes represent produce items. The edge weights represent the probabilities. (b) Feature trees constructed by our TWD and competing baselines. Nodes corresponding to fruits are colored in blue, those representing vegetables are in red, and the internal nodes are in green.
  • Figure 3: The normalized Frobenius norm of the difference between the proposed TWD and the ground-truth TWD for different numbers of samples $n$.
  • Figure 4: Run time of the proposed TWD and competing TWD and OT distances on scRNA-seq datasets and word-document datasets.
  • Figure 5: The classification accuracy of the proposed TWD using the distance based on cosine similarity and Euclidean distance for scRNA-seq and word-document datasets.
  • ...and 5 more figures

Theorems & Definitions (32)

  • Definition 4.1: Hyperbolic LCA
  • Proposition 4.1
  • Definition 4.2: Hyperbolic LCA Relation
  • Proposition 4.2
  • Definition 4.3: HD-LCA
  • Proposition 4.3
  • Definition 4.4: HD-LCA Relation
  • Proposition 4.4
  • Theorem 4.1
  • Definition 4.5: TWD for High-Dimensional Data with a Latent Feature Hierarchy.
  • ...and 22 more