Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently

Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas

TL;DR

This work develops an optimal-transport framework for comparing stochastic processes by showing that probabilistic bisimulation metrics are equivalent to OT distances between Markov chains. It recasts the problem as a finite-dimensional LP over occupancy couplings and introduces entropy-regularized solvers, notably Sinkhorn Value Iteration (SVI) and Sinkhorn Policy Iteration (SPI), with convergence guarantees. The authors prove contraction properties for the Bellman-Sinkhorn operators and provide finite-iteration error bounds, complemented by experiments demonstrating rapid convergence and superior efficiency relative to prior approaches. The framework enables scalable, cross-space distance-based representations for reinforcement learning and opens pathways to data-driven learning via the presented LP formulation.

Abstract

We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so far, since computing the associated DP operators requires fully solving a static optimal transport problem, and these operators need to be applied numerous times during the overall optimization process. In this work, we develop an alternative perspective by considering couplings between a flattened version of the joint distributions that we call discounted occupancy couplings, and show that calculating optimal transport distances in the full space of joint distributions can be equivalently formulated as solving a linear program (LP) in this reduced space. This LP formulation allows us to port several algorithmic ideas from other areas of optimal transport theory. In particular, our formulation makes it possible to introduce an appropriate notion of entropy regularization into the optimization problem, which in turn enables us to directly calculate optimal transport distances via a Sinkhorn-like method we call Sinkhorn Value Iteration (SVI). We show both theoretically and empirically that this method converges quickly to an optimal coupling, essentially at the same computational cost as running vanilla Sinkhorn in each pair of states. Along the way, we point out that our optimal transport distance exactly matches the common notion of bisimulation metrics between Markov chains, and thus our results also apply to computing such metrics. In fact, our algorithm turns out to be significantly more efficient than the best known methods developed so far for this purpose.
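
The abstract measures the cost of SVI against running vanilla Sinkhorn in each pair of states. As a point of reference, the following is a minimal sketch of vanilla Sinkhorn for a single static, entropy-regularized optimal transport problem between two discrete distributions. It is not the authors' SVI algorithm. The function name `sinkhorn`, the parameter `eta` (used here as an inverse regularization strength, which may not match the paper's convention), the iteration count, and the toy inputs are all assumptions made for illustration.

```python
import math

def sinkhorn(p, q, cost, eta=5.0, iters=500):
    """Vanilla Sinkhorn for static entropy-regularized OT (illustrative sketch).

    p, q  : lists of probabilities (source and target marginals)
    cost  : cost[i][j] is the cost of moving mass from i to j
    eta   : assumed inverse regularization strength (larger = less smoothing)
    Returns the approximate coupling and its transport cost.
    """
    n, m = len(p), len(q)
    # Gibbs kernel K[i][j] = exp(-eta * cost[i][j])
    K = [[math.exp(-eta * cost[i][j]) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match p, then columns to match q
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total

# Toy problem: two states on each side, unit cost for moving between states
p = [0.5, 0.5]
q = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan, total = sinkhorn(p, q, cost)
```

In this toy problem the unregularized optimum moves a mass of 0.25 from the first state to the second, so the regularized cost returned by the sketch should lie slightly above 0.25. Each iteration only rescales rows and columns of the kernel, which is what makes the per-pair cost cheap compared with solving a static transport problem exactly.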

Paper Structure

This paper contains 38 sections, 15 theorems, 93 equations, 8 figures, 3 algorithms.

Key Result

Lemma 1

A distribution $\mu\in\Delta_{\mathcal{X}\mathcal{Y}\times\mathcal{X}\mathcal{Y}}$ is a valid occupancy coupling associated with some transition coupling $\pi:\mathcal{X}\mathcal{Y}\rightarrow\Delta_{\mathcal{X}\mathcal{Y}}$ if and only if it satisfies the constraints labeled eq:const_flow through eq:const_Y.

Figures (8)

  • Figure 1: Estimated transport cost as a function of $k$, for various choices of $m$ and $\eta = 1$.
  • Figure 2: Visual representation of the distances computed between the chains $M_\mathcal{X}$ and $M_\mathcal{Y}$.
  • Figure 3: Error of estimated transport cost as a function of $k$, for various choices of $\eta$.
  • Figure 4: Comparison of the computational time required by the different proposed methods to obtain a near-optimal solution, for different values of $\gamma$. For each Markov chain size, the plot shows results over 5 randomly generated instances together with their standard deviation. Data is displayed on a log-log scale.
  • Figure 5: Result of applying MDS to the pairwise distances between the set of 4-room instances studied. The first two coordinates of the MDS embedding are used as the spatial coordinates, and the third coordinate is encoded via the color bar provided on the left-hand side of the axes. Elements in the same cluster share common features that distinguish them from those in other clusters. In the examples shown in the figure, the instances in which reaching the closest reward involves crossing a door are concentrated in one cluster, while the instances in which the reward and the initial state are located in the same room belong to a different cluster. The remaining clusters correspond to having to cross two doors for a reward (set of green points on the top), or having no reward that is accessible from the initial state (set of blue points in the middle, with large negative $z$-coordinates).
  • ...and 3 more figures

Theorems & Definitions (26)

  • Lemma 1
  • Theorem 1
  • Proposition 1
  • Theorem 2
  • Theorem 3: cf. Theorem 1 in moulos2021bicausal
  • proof
  • proof: Proof of Lemma \ref{lem:occupancy_validity}
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 16 more