Distances for Markov Chains, and Their Differentiation

Tristan Brugère; Zhengchao Wan; Yusu Wang

TL;DR

The paper introduces Optimal Transport Markov (OTM) distances to unify graph comparison approaches by treating graphs as Markov chains, showing that the WL and OTC distances arise as extremes within this framework. It then defines the delta-discounted WL distance, a one-parameter, regularized instance that admits a fixed-point computation and becomes differentiable after entropy regularization, while preserving discriminative power. The work establishes precise theoretical connections: the WL, OTC, and OTM distances lie along a spectrum, with $d_{\text{WL}}^{(k)} = d_{\text{OTM}}^{p^{(k)}}$ and $d_{\text{WL}}^{(\infty)}$ converging to OTC as the discount vanishes; it also provides convergence rates and a practical Sinkhorn-based gradient. Empirically, the delta-discounted WL distance achieves competitive performance in graph classification and yields meaningful graph barycenters, albeit at a higher computational cost than FGW, and the authors release a Python library to enable broad usage and further development.

Abstract

(Directed) graphs with node attributes are a common type of data in various applications, and there is a vast literature on developing metrics and efficient algorithms for comparing them. Recently, in the graph learning and optimization communities, a range of new approaches have been developed for comparing graphs with node attributes, leveraging ideas such as Optimal Transport (OT) and the Weisfeiler-Lehman (WL) graph isomorphism test. Two state-of-the-art representatives are the OTC distance proposed in (O'Connor et al., 2022) and the WL distance in (Chen et al., 2022). Interestingly, while these two distances are developed based on different ideas, we observe that they both view graphs as Markov chains, and are deeply connected. Indeed, in this paper, we propose a unified framework to generate distances for Markov chains (thus including (directed) graphs with node attributes), which we call the Optimal Transport Markov (OTM) distances, that encompass both the OTC and the WL distances. We further introduce a special one-parameter family of distances within our OTM framework, called the discounted WL distance. We show that the discounted WL distance has nice theoretical properties and can address several limitations of the existing OTC and WL distances. Furthermore, contrary to the OTC and the WL distances, our new discounted WL distance can be differentiated after an entropy regularization similar to the Sinkhorn distance, making it suitable for use in learning frameworks, e.g., as the reconstruction loss in a graph generative model.
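To make the entropy regularization mentioned in the abstract concrete, here is a minimal NumPy sketch of the standard Sinkhorn iterations for entropy-regularized optimal transport between two histograms. It is not taken from the paper or from the authors' library: the function names `sinkhorn_plan` and `sinkhorn_cost`, the toy inputs, and the values of `eps` and `n_iters` are assumptions chosen for illustration only. The point it illustrates is that the regularized plan is built from smooth operations (exponentials, products, divisions), which is what makes gradient-based use plausible.

```python
import numpy as np


def sinkhorn_plan(a, b, cost, eps=0.5, n_iters=500):
    """Approximate entropy-regularized OT plan between histograms a and b.

    Illustrative sketch only; eps is the regularization strength.
    """
    K = np.exp(-cost / eps)           # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows toward marginal a
        v = b / (K.T @ u)             # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def sinkhorn_cost(a, b, cost, eps=0.5, n_iters=500):
    """Transport cost <plan, cost> of the regularized plan."""
    plan = sinkhorn_plan(a, b, cost, eps, n_iters)
    return float(np.sum(plan * cost))


# Toy example (hypothetical data): two histograms on two points.
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

plan = sinkhorn_plan(a, b, cost)
value = sinkhorn_cost(a, b, cost)
```

For this toy problem the unregularized optimal cost is 0.25 (move mass 0.25 from the first point to the second), so the regularized value is slightly larger and approaches 0.25 as `eps` decreases. Very small `eps` values would require a log-domain implementation for numerical stability, which this sketch omits.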

Paper Structure (63 sections, 28 theorems, 134 equations, 5 figures, 1 table)

Introduction
Our contributions.
Relation to the fused-GW (FGW) distance of Titouan et al. (2019).
Preliminaries
Probability Measures and Markov Chains
Couplings.
Markovian couplings.
Optimal Transport and Distances between Markov Chains
The .
The .
The .
Optimal Transport Markov Distances
The Discounted WL Distance
Limitations of the WL Distance and the OTC Distance
Stationary initial distributions.
...and 48 more sections

Key Result

proposition 1

For any distribution $p$ on $\mathbb{N}$, one has that

Figures (5)

Figure 1: Barycenter computation of 30 noisy circle graphs
Figure 2: Barycenter experiment
Figure 3: Performance analysis results on an Nvidia RTX A6000 GPU
Figure 4: Same barycenter experiment ($n\_targets=20$, $p=0.01$), run with different values of $\epsilon$ (abscissa) and $\delta$ (ordinate)
Figure 5: Coarsening results on a circle graph of size 30. The original graph is on the left, the subsequent graphs are coarsenings of different sizes.

Theorems & Definitions (63)

remark 1: Nuance in definition
remark 2: A note on symbols
definition 1
remark 3: Optimal Markovian couplings exist
proposition 1: A $d_{\text{WL}}^{(k)}$-based lower bound
proposition 2: $d_{\text{OTC}}$ is an upper bound
proposition 3: Zero-sets
definition 2
remark 4: $k=\infty$
remark 5: $\delta=0$
...and 53 more
