
Distances for Markov Chains, and Their Differentiation

Tristan Brugère, Zhengchao Wan, Yusu Wang

TL;DR

The paper introduces Optimal Transport Markov (OTM) distances, which unify graph comparison approaches by treating graphs as Markov chains, and shows that the WL and OTC distances arise as extremes within this framework. It then defines the $\delta$-discounted WL distance, a one-parameter, regularized instance that admits a fixed-point computation and becomes differentiable after entropy regularization, while preserving discriminative power. The work makes these connections precise: WL, OTC, and OTM distances lie on a single spectrum, with $d_{\mathrm{dWL}}^{(k)} = d_{\mathrm{dOTM}}^{p^{(k)}}$ and $d_{\mathrm{dWL}}^{(\infty)}$ converging to OTC as the discount vanishes; it also provides convergence rates and a practical Sinkhorn-based gradient. Empirically, the $\delta$-discounted WL distance achieves competitive performance in graph classification and yields meaningful graph barycenters, albeit at a higher computational cost than Fused Gromov-Wasserstein (FGW), and the authors release a Python library to support broad use and further development.
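The TL;DR mentions that the discounted distance is computed as a fixed point and made differentiable through entropic regularization. The sketch below illustrates that idea under a loudly stated assumption: we take the recursion to be $C(x,y) = \delta\, c_\ell(x,y) + (1-\delta)\, W_{\varepsilon, C}(m_X(x,\cdot), m_Y(y,\cdot))$, where $c_\ell$ is the Euclidean distance between node labels and $W_{\varepsilon, C}$ is the entropic OT value under cost $C$, computed with Sinkhorn iterations. This form is our illustrative reading, not the paper's exact definition, and the names `reg_ot_value`, `discounted_cost`, and `discounted_distance` are hypothetical, not the authors' library API. When the inner problem is solved exactly, the entropic OT value is 1-Lipschitz in the cost (sup norm), so the update is a $(1-\delta)$-contraction and plain iteration converges.

```python
import numpy as np

# HYPOTHETICAL sketch: not the paper's definition or the authors' library API.


def reg_ot_value(a, b, C, eps=0.1, n_iter=200):
    """Entropic OT value <P, C> + eps * KL(P || a b^T), computed with Sinkhorn.

    Zero-mass entries of a and b are dropped first, so sparse transition rows
    (non-neighbors have probability 0) are handled."""
    ia, ib = a > 0, b > 0
    a, b, C = a[ia], b[ib], C[np.ix_(ia, ib)]
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                   # alternate row/column scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # approximate optimal plan
    ratio = np.maximum(P / np.outer(a, b), 1e-300)
    return float(np.sum(P * C) + eps * np.sum(P * np.log(ratio)))


def discounted_cost(MX, MY, LX, LY, delta, eps=0.1, tol=1e-6, max_iter=500):
    """Iterate C <- delta * label_cost + (1 - delta) * W_C(m_x, m_y) to a fixed point.

    MX, MY: row-stochastic transition matrices; LX, LY: (n, d) node labels.
    The recursion itself is an ASSUMPTION made for illustration."""
    label_cost = np.linalg.norm(LX[:, None, :] - LY[None, :, :], axis=-1)
    C = label_cost.copy()
    for _ in range(max_iter):
        # Compare the outgoing transition distributions of every node pair.
        W = np.array([[reg_ot_value(MX[i], MY[j], C, eps)
                       for j in range(MY.shape[0])]
                      for i in range(MX.shape[0])])
        C_new = delta * label_cost + (1.0 - delta) * W
        if np.max(np.abs(C_new - C)) < tol:
            return C_new
        C = C_new
    return C


def discounted_distance(MX, muX, LX, MY, muY, LY, delta, eps=0.1):
    """Transport the stationary distributions under the fixed-point cost."""
    C = discounted_cost(MX, MY, LX, LY, delta, eps)
    return reg_ot_value(muX, muY, C, eps)
```

In this sketch, `delta=1` reduces the cost to pure label distances, while smaller `delta` gives more weight to longer random-walk horizons. Like the Sinkhorn distance, the regularized value is generally not zero for identical chains (entropic bias). Every step is smooth in the inputs, which is what makes passing gradients through such a computation possible in an autodiff framework.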

Abstract

(Directed) graphs with node attributes are a common type of data in various applications, and there is a vast literature on developing metrics and efficient algorithms for comparing them. Recently, in the graph learning and optimization communities, a range of new approaches have been developed for comparing graphs with node attributes, leveraging ideas such as Optimal Transport (OT) and the Weisfeiler-Lehman (WL) graph isomorphism test. Two state-of-the-art representatives are the OTC distance proposed in (O'Connor et al., 2022) and the WL distance in (Chen et al., 2022). Interestingly, while these two distances are developed based on different ideas, we observe that they both view graphs as Markov chains and are deeply connected. Indeed, in this paper, we propose a unified framework to generate distances for Markov chains (thus including (directed) graphs with node attributes), which we call the Optimal Transport Markov (OTM) distances and which encompass both the OTC and the WL distances. We further introduce a special one-parameter family of distances within our OTM framework, called the discounted WL distance. We show that the discounted WL distance has nice theoretical properties and can address several limitations of the existing OTC and WL distances. Furthermore, contrary to the OTC and the WL distances, our new discounted WL distance can be differentiated after an entropy regularization similar to the Sinkhorn distance, making it suitable for use in learning frameworks, e.g., as the reconstruction loss in a graph generative model.
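The abstract's key observation is that both the OTC and WL distances view a graph as a Markov chain. One common way to make that concrete is a simple random walk: row-normalize the (possibly weighted, possibly directed) adjacency matrix and pair it with a stationary distribution. This construction is an assumption for illustration; the paper may use a different kernel. The function `graph_to_markov_chain` is hypothetical, and the self-loop added to sink nodes is our own convention so that every row is a probability distribution.

```python
import numpy as np


def graph_to_markov_chain(adj, labels, tol=1e-12, max_iter=10_000):
    """Return (M, mu, labels): a row-stochastic transition matrix M,
    a stationary distribution mu (mu @ M == mu), and labels as a float array.

    ASSUMPTION: simple random walk; nodes without out-edges get a self-loop."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    has_out = out_deg > 0
    M = np.zeros_like(A)
    M[has_out] = A[has_out] / out_deg[has_out][:, None]   # normalize each row
    sinks = np.flatnonzero(~has_out)
    M[sinks, sinks] = 1.0                                  # self-loop on sinks

    # Power iteration on the lazy chain (M + I) / 2: it has the same
    # stationary distributions as M but is aperiodic, so iteration converges.
    lazy = 0.5 * (M + np.eye(n))
    mu = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = mu @ lazy
        done = np.abs(nxt - mu).sum() < tol
        mu = nxt
        if done:
            break
    return M, mu / mu.sum(), np.asarray(labels, dtype=float)
```

For a connected undirected graph, the stationary distribution of this walk is proportional to node degree; for the path graph 0-1-2 it is $(1/4, 1/2, 1/4)$.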

Paper Structure (63 sections, 28 theorems, 134 equations, 5 figures, 1 table)

Key Result

Proposition 1 ($d_{\mathrm{dWL}}^{(k)}$-based lower bound)

For any distribution $p$ on $\mathbb{N}$, one has that

Figures (5)

  • Figure 1: Barycenter computation of 30 noisy circle graphs
  • Figure 2: Barycenter experiment
  • Figure 3: Performance analysis results on an Nvidia RTX A6000 GPU
  • Figure 4: Same barycenter experiment ($n_{\text{targets}}=20$, $p=0.01$), run with different values of $\epsilon$ (horizontal axis) and $\delta$ (vertical axis)
  • Figure 5: Coarsening results on a circle graph of size 30. The original graph is on the left, the subsequent graphs are coarsenings of different sizes.

Theorems & Definitions (63)

  • remark 1: Nuance in definition
  • remark 2: A note on symbols
  • definition 1
  • remark 3: Optimal Markovian couplings exist
  • proposition 1: A $d_{\mathrm{dWL}}^{(k)}$-based lower bound
  • proposition 2: $d_{\mathrm{dOTC}}$ is an upper bound
  • proposition 3: Zero-sets
  • definition 2
  • remark 4: $k=\infty$
  • remark 5: $\delta=0$
  • ...and 53 more