Fused Gromov-Wasserstein Variance Decomposition with Linear Optimal Transport

Michael Wilson, Tom Needham, Anuj Srivastava

TL;DR

We present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures. The results illustrate the effectiveness of low-dimensional LOT embeddings in terms of both the percentage of variance explained and the classification accuracy of models built on the embedded data.

Abstract

Wasserstein distances form a family of metrics on spaces of probability measures that have recently seen many applications. However, statistical analysis in these spaces is complex due to the nonlinearity of Wasserstein spaces. One potential solution to this problem is Linear Optimal Transport (LOT). This method allows one to find a Euclidean embedding, called an LOT embedding, of measures in some Wasserstein spaces, but some information is lost in this embedding. So, to understand whether statistical analysis relying on LOT embeddings can make valid inferences about the original data, it is helpful to quantify how well these embeddings describe that data. To answer this question, we present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures. We then extend this decomposition to the Fused Gromov-Wasserstein setting. We also present several experiments that explore the relationship between the dimension of the LOT embedding, the percentage of variance explained by the embedding, and the classification accuracy of machine learning classifiers built on the embedded data. We use the MNIST handwritten digits dataset, the IMDB-50000 dataset, and Diffusion Tensor MRI images for these experiments. Our results illustrate the effectiveness of low-dimensional LOT embeddings in terms of the percentage of variance explained and the classification accuracy of models built on the embedded data.
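
For orientation, the two objects named in the abstract can be written down in their standard textbook form. This is a sketch only: the symbols $N$, $K$, $\sigma$, $c_k$, $z_k$, and $T_i$ and the $1/N$ normalization are assumptions made here for illustration and may differ from the paper's own notation. The paper's decomposition itself is not reproduced.

```latex
% Notation is illustrative; the paper's symbols and normalization may differ.

% Frechet variance of measures \mu_1, ..., \mu_N in the 2-Wasserstein space
% (any minimizer \nu is a Wasserstein barycenter):
\mathrm{Var}(\mu_1,\dots,\mu_N) \;=\; \inf_{\nu} \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_i, \nu)

% LOT embedding with respect to a discrete reference measure
% \sigma = \sum_{k=1}^{K} c_k \delta_{z_k}, where T_i is the barycentric
% projection map of an optimal coupling of \sigma and \mu_i:
\mu_i \;\longmapsto\; \big( \sqrt{c_k}\, T_i(z_k) \big)_{k=1}^{K} \in \mathbb{R}^{Kd}

% Squared Euclidean distance between the embeddings of \mu_i and \mu_j:
\sum_{k=1}^{K} c_k \, \lVert T_i(z_k) - T_j(z_k) \rVert^2
```

In the figure captions the reference measure is a free support barycenter, so the embedding dimension $Kd$ is controlled by the number of support points $K$.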

Paper Structure

This paper contains 22 sections, 3 theorems, 42 equations, 6 figures, 2 tables.

Key Result

Proposition 1

Let $\nu = \sum_{i=1}^n a_i \delta_{x_i}$, $\mu = \sum_{j=1}^m b_j \delta_{y_j}$ be empirical probability measures supported on $\mathbb{R}^d$, let $\gamma \in U(a,b)$ be an optimal coupling of $\nu$ and $\mu$, and let $T$ be the barycentric projection map induced by $\gamma$. Then
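
The barycentric projection map named in Proposition 1 has the standard form $T(x_i) = \frac{1}{a_i} \sum_{j=1}^m \gamma_{ij} y_j$. The following is a minimal sketch of that formula, not the authors' implementation. It assumes a toy one-dimensional example, where for sorted supports the monotone (north-west corner) coupling is optimal for the squared-distance cost; in $\mathbb{R}^d$ an optimal coupling would instead come from a linear-programming solver. The function names `monotone_coupling` and `barycentric_projection` and the example data are hypothetical.

```python
from typing import List, Sequence


def monotone_coupling(a: Sequence[float], b: Sequence[float],
                      tol: float = 1e-12) -> List[List[float]]:
    """North-west corner rule for two weight vectors.

    Assumption: both supports are one-dimensional and sorted ascending,
    in which case this coupling is optimal for the squared-distance cost.
    """
    n, m = len(a), len(b)
    gamma = [[0.0] * m for _ in range(n)]
    rem_a, rem_b = list(a), list(b)  # mass not yet transported
    i = j = 0
    while i < n and j < m:
        mass = min(rem_a[i], rem_b[j])
        gamma[i][j] = mass
        rem_a[i] -= mass
        rem_b[j] -= mass
        if rem_a[i] <= tol:
            i += 1
        if rem_b[j] <= tol:
            j += 1
    return gamma


def barycentric_projection(gamma: Sequence[Sequence[float]],
                           a: Sequence[float],
                           y: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return T(x_i) = (1 / a_i) * sum_j gamma[i][j] * y[j] for every i."""
    d = len(y[0])
    return [
        [sum(gamma[i][j] * y[j][k] for j in range(len(y))) / a[i]
         for k in range(d)]
        for i in range(len(a))
    ]


# Toy data (hypothetical): nu has 2 support points, mu has 3, both on the line.
a = [0.5, 0.5]                # weights of nu at x = 0.0 and x = 1.0
b = [0.25, 0.25, 0.5]         # weights of mu
y = [[0.0], [1.0], [3.0]]     # support points of mu, sorted ascending

gamma = monotone_coupling(a, b)
projected = barycentric_projection(gamma, a, y)
# gamma == [[0.25, 0.25, 0.0], [0.0, 0.0, 0.5]]
# projected == [[0.5], [3.0]]
```

Each projected point is the weighted average of the target points that receive mass from $x_i$, so pushing $\nu$ forward through $T$ yields a measure with the same weights $a_i$ and the same number of support points as $\nu$.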

Figures (6)

  • Figure 1: Top Row: Gaussian Kernel Reconstructions and support sets for free support barycenters of MNIST 5 digit, k = 150, 50, and 20; Bottom Rows: Original Data, Gaussian Kernel Reconstructions of Barycentric projections, and Support point locations for two 5 digits from the MNIST data set.
  • Figure 2: Left: Components of decomposition calculated with respect to a free support barycenter with different numbers of support points, for MNIST data; Middle: Percentage of 2-Wasserstein variance explained by LOT embeddings with respect to a free support barycenter with different numbers of support points, for MNIST data; Right: Multiclass classification accuracy of SVM built on LOT embeddings with respect to a free support barycenter with different numbers of support points.
  • Figure 3: Points represent locations in $\mathbb{R}^3$, $(r,g,b)$ values correspond to (normalized) diagonal elements of covariance matrices. Left: free support barycenter of 'Fornix L' in HCP data (200 support points); Middle: Subject 100307's 'Fornix L' in the HCP data (6980 support points); Right: Barycentric Projection of Subject 100307's 'Fornix L' with respect to calculated barycenter (200 support points).
  • Figure 4: Left: Fréchet variance and components of decomposition as a function of number of support points in free support barycenter (for n=1,10,20,...,200) for Fornix L in HCP-YA; Middle: percentage of variance explained by LOT embedding with respect to a free support barycenter with n support points, for ROI (in left hemisphere) in HCP-YA; Right: SVM classification accuracy on LOT embedded data with respect to number of support points in free support barycenter.
  • Figure 5: Top Row: Gaussian Kernel Reconstructions and graph representations for free support Fused Gromov-Wasserstein barycenters ($\alpha = 0.5$) of MNIST 5 digit, k = 150, 50, and 20; Bottom Rows: Original Data, Gaussian Kernel Reconstructions of Barycentric projections, and graph representations for two 5 digits from the MNIST data set.
  • ...and 1 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • Definition 4
  • Definition 5
  • Definition 6
  • Theorem 1
  • Definition 7
  • Definition 8
  • ...and 2 more