A Wasserstein Graph Distance Based on Distributions of Probabilistic Node Embeddings

Michael Scholkemper, Damin Kühn, Gerion Nabbefeld, Simon Musall, Björn Kampa, Michael T. Schaub

TL;DR

This work addresses the problem of unsupervised graph distance by representing each graph as a Gaussian mixture over random node embeddings and measuring distance with the Wasserstein metric between mixtures. It introduces two probabilistic node embeddings, CCB and CNP, to generate expressive graph representations and demonstrates that the resulting $MW_2^2$ distance provides a scalable and interpretable graph distance. The approach yields a transport plan that aligns nodes across graphs and supports comparisons between graphs of different sizes, with efficient variants (Full, Scaled, Tied) that speed up computation. Empirical results on synthetic networks and Functional Brain Connectivity data show competitive performance and favorable scalability compared to existing embedding and OT-based methods.

Abstract

Distance measures between graphs are important primitives for a variety of learning tasks. In this work, we describe an unsupervised, optimal transport based approach to define a distance between graphs. Our idea is to derive representations of graphs as Gaussian mixture models, fitted to distributions of sampled node embeddings over the same space. The Wasserstein distance between these Gaussian mixture distributions then yields an interpretable and easily computable distance measure, which can further be tailored for the comparison at hand by choosing appropriate embeddings. We propose two embeddings for this framework and show that under certain assumptions about the shape of the resulting Gaussian mixture components, further computational improvements of this Wasserstein distance can be achieved. An empirical validation of our findings on synthetic data and real-world Functional Brain Connectivity networks shows promising performance compared to existing embedding methods.
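
The distance described above can be illustrated with a minimal sketch of a Wasserstein-type distance between two Gaussian mixtures: a discrete optimal transport problem over mixture components whose ground cost is the closed-form squared 2-Wasserstein distance between Gaussians. This is not the authors' implementation. The function names `gaussian_w2_squared` and `mixture_w2_squared` are hypothetical, the mixture parameters are assumed to have been fitted to sampled node embeddings beforehand, and the CCB and CNP embeddings themselves are not reproduced here.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_squared(m1, S1, m2, S2):
    """Closed-form squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    S1_half = np.real(sqrtm(S1))
    cross = np.real(sqrtm(S1_half @ S2 @ S1_half))
    # Covariance term; clipped at zero to absorb floating-point error.
    bures = np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sum((m1 - m2) ** 2) + max(bures, 0.0))


def mixture_w2_squared(w1, means1, covs1, w2, means2, covs2):
    """Discrete OT between mixture components (hypothetical helper).

    w1 and w2 are component weights, each assumed to sum to 1.
    Returns the optimal cost and the transport plan between components.
    """
    k1, k2 = len(w1), len(w2)
    cost = np.array([
        [gaussian_w2_squared(means1[i], covs1[i], means2[j], covs2[j])
         for j in range(k2)]
        for i in range(k1)
    ])
    # Marginal constraints: rows of the plan sum to w1, columns sum to w2.
    A_eq = np.zeros((k1 + k2, k1 * k2))
    for i in range(k1):
        A_eq[i, i * k2:(i + 1) * k2] = 1.0
    for j in range(k2):
        A_eq[k1 + j, j::k2] = 1.0
    b_eq = np.concatenate([w1, w2])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(k1, k2)
```

The returned plan has one row per component of the first mixture and one column per component of the second, so the two mixtures may have different numbers of components. In this sketch the linear program is solved with SciPy's general-purpose solver; a dedicated optimal transport solver could be swapped in without changing the cost matrix.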

Paper Structure

This paper contains 8 sections, 5 theorems, 19 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Proposition 1

For the sample size $s \rightarrow \infty$, CNP defines a pseudometric on the space of all graphs and CCB defines a pseudometric on the space of all adjacency matrices.
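
For reference, the standard definition of a pseudometric is recalled here; the symbols $d$ and $X$ are generic and not taken from the paper. A function $d : X \times X \rightarrow [0, \infty)$ is a pseudometric if, for all $x, y, z \in X$,

$$
d(x, x) = 0, \qquad d(x, y) = d(y, x), \qquad d(x, z) \leq d(x, y) + d(y, z).
$$

Unlike a metric, a pseudometric allows $d(x, y) = 0$ for $x \neq y$, so two distinct elements of the underlying space may be at distance zero.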

Figures (1)

  • Figure 1: CCB-TiedW distances for the synthetic networks (a) and the functional connectivity networks (c). The networks are ordered according to the hierarchical clustering dendrogram where small heights correspond to small cluster distances. These distances can also be projected into a 2D space using UMAP for visualization purposes (b,d). Class memberships are indicated by the same color scheme in both corresponding plots.

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Proposition 2
  • Lemma 1 (Proposition 5 in Delon and Desolneux, 2020)
  • Proposition 2