Distances for Markov Chains, and Their Differentiation
Tristan Brugère, Zhengchao Wan, Yusu Wang
TL;DR
The paper introduces Optimal Transport Markov (OTM) distances, a framework that unifies graph comparison approaches by treating graphs as Markov chains, and shows that the WL and OTC distances arise as extremes within it. It then defines the $\delta$-discounted WL distance, a one-parameter, regularized instance that admits a fixed-point computation and becomes differentiable after entropy regularization, while preserving discriminative power. The work establishes precise theoretical connections: the WL, OTC, and OTM distances lie along a spectrum, with $d_{\text{dWL}}^{(k)} = d_{\text{dOTM}}^{p^{(k)}}$ and $d_{\text{dWL}}^{(\infty)}$ converging to the OTC distance as the discount vanishes; it also provides convergence rates and a practical Sinkhorn-based gradient. Empirically, the $\delta$-discounted WL distance achieves competitive performance in graph classification and yields meaningful graph barycenters, albeit at a higher computational cost than FGW. The authors release a Python library to enable broad usage and further development.
Abstract
(Directed) graphs with node attributes are a common type of data in various applications, and there is a vast literature on developing metrics and efficient algorithms for comparing them. Recently, in the graph learning and optimization communities, a range of new approaches have been developed for comparing graphs with node attributes, leveraging ideas such as Optimal Transport (OT) and the Weisfeiler-Lehman (WL) graph isomorphism test. Two state-of-the-art representatives are the OTC distance proposed in (O'Connor et al., 2022) and the WL distance in (Chen et al., 2022). Interestingly, while these two distances were developed from different ideas, we observe that they both view graphs as Markov chains and are deeply connected. Indeed, in this paper, we propose a unified framework to generate distances for Markov chains (thus including (directed) graphs with node attributes), which we call the Optimal Transport Markov (OTM) distances, that encompasses both the OTC and the WL distances. We further introduce a special one-parameter family of distances within our OTM framework, called the discounted WL distance. We show that the discounted WL distance has nice theoretical properties and can address several limitations of the existing OTC and WL distances. Furthermore, contrary to the OTC and the WL distances, our new discounted WL distance can be differentiated after an entropy regularization similar to the Sinkhorn distance, making it suitable for use in learning frameworks, e.g., as the reconstruction loss in a graph generative model.
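To make the idea of a discounted, entropy-regularized comparison of Markov chains concrete, the following is a minimal NumPy sketch and not the authors' released library. It assumes, for illustration only, that the depth-$k$ cost matrix is updated as $D^{(k+1)} = \delta C + (1-\delta)\,\mathrm{OT}_\varepsilon(m^X_i, m^Y_j; D^{(k)})$, where $C$ is a label cost matrix and $m^X_i$, $m^Y_j$ are rows of the transition matrices; the exact recursion and normalization used in the paper may differ. The names `sinkhorn_cost` and `discounted_wl_sketch` and all parameter defaults are hypothetical.

```python
import numpy as np


def sinkhorn_cost(a, b, C, eps=0.05, n_iter=200):
    """Entropy-regularized OT between histograms a and b with cost C.

    Returns the transport cost <P, C> and the coupling P obtained from
    standard Sinkhorn scaling iterations.
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones(len(a))
    v = np.ones(len(b))
    for _ in range(n_iter):
        v = b / (K.T @ u)         # rescale to match column marginals
        u = a / (K @ v)           # rescale to match row marginals
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * C)), P


def discounted_wl_sketch(MX, MY, muX, muY, C, delta=0.2, depth=20, eps=0.05):
    """Assumed discounted recursion (illustrative, not the paper's exact form).

    MX, MY : row-stochastic transition matrices of the two chains
    muX, muY : distributions over the states of each chain
    C : label cost matrix between states of X and states of Y
    """
    D = C.astype(float).copy()
    for _ in range(depth):
        W = np.empty_like(D)
        for i in range(MX.shape[0]):
            for j in range(MY.shape[0]):
                # Compare the one-step neighbourhoods of states i and j.
                W[i, j], _ = sinkhorn_cost(MX[i], MY[j], D, eps)
        # Discounting: mix the raw label cost with the propagated cost.
        D = delta * C + (1.0 - delta) * W
    return sinkhorn_cost(muX, muY, D, eps)[0]


# Two-state chain with uniform transitions and uniform state distribution.
M = np.full((2, 2), 0.5)
mu = np.array([0.5, 0.5])
labels_x = np.array([0.0, 1.0])

# Same chain, same labels versus same chain with both labels set to 0.
C_same = np.abs(labels_x[:, None] - labels_x[None, :])
C_far = np.abs(labels_x[:, None] - np.zeros(2)[None, :])

d_same = discounted_wl_sketch(M, M, mu, mu, C_same)
d_far = discounted_wl_sketch(M, M, mu, mu, C_far)
print(d_same, d_far)
```

Because of the entropic regularization, the value for a chain compared with itself is small but not exactly zero. Every step is a composition of smooth operations, which is what makes a gradient through such a computation possible in an automatic-differentiation framework.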
