Adapted optimal transport between Gaussian processes in discrete time

Madhu Gunasingam; Ting-Kam Leonard Wong

TL;DR

This work derives an explicit formula for the adapted 2-Wasserstein distance AW_2 between non-degenerate Gaussian measures in discrete time under time-respecting (bicausal) constraints, and provides a complete characterization of the optimal couplings. The main result expresses AW_2^2 as the sum of the squared distance between the means and a squared adapted Bures-Wasserstein distance between the covariances: AW_2^2(mu,nu) = ||a-b||^2 + d_ABW^2(A,B), with d_ABW^2(A,B) = trace(A) + trace(B) - 2 || diag(L^T M) ||_1, where A = L L^T and B = M M^T are Cholesky decompositions. The authors use a dynamic programming principle to reduce the problem to one-dimensional conditional OT problems, show that the signs of the entries of diag(L^T M) determine the structure of the optimal coupling, and connect the adapted geometry to the Knothe-Rosenblatt (KR) transport, including a detailed discussion of when KR is AW_2-optimal. These results illuminate the geometry of time-respecting transport for Gaussian processes and provide tools for applications in filtering, robust optimization, and stochastic control under adapted metrics; extensions to multivariate processes with entropic regularization and continuous-time settings are highlighted as future directions.

Abstract

We derive explicitly the adapted $2$-Wasserstein distance between non-degenerate Gaussian distributions on $\mathbb{R}^N$ and characterize the optimal bicausal coupling(s). This leads to an adapted version of the Bures-Wasserstein distance on the space of positive definite matrices.

Paper Structure (7 sections, 12 theorems, 49 equations, 2 figures)

Introduction
Adapted optimal transport
The Knothe-Rosenblatt coupling
Dynamic programming principle
Adapted Wasserstein geometry of Gaussian distributions
Examples
Conclusion

Key Result

Theorem 1.1

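Let $\mu = \mathcal{N}(a, A)$ and $\nu = \mathcal{N}(b, B)$ be non-degenerate Gaussian distributions on $\mathbb{R}^N$. Let $A = LL^{\top}$ and $B = MM^{\top}$ be the Cholesky decompositions of $A$ and $B$ respectively. Then the adapted $2$-Wasserstein distance $\mathcal{AW}_2(\mu, \nu)$ is given by

$$\mathcal{AW}_2^2(\mu, \nu) = \|a - b\|^2 + d_{\mathrm{ABW}}^2(A, B),$$

where $d_{\mathrm{ABW}}$ is the adapted Bures-Wasserstein distance on the space $\mathscr{S}_{++}(N)$ of positive definite matrices, given by

$$d_{\mathrm{ABW}}^2(A, B) = \operatorname{tr}(A) + \operatorname{tr}(B) - 2 \, \|\operatorname{diag}(L^{\top} M)\|_1.$$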
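The closed form quoted in the TL;DR can be evaluated directly from two Cholesky factors. The following is a minimal NumPy sketch, not the authors' code: the function names `adapted_bw_sq` and `aw2_sq` and the example matrices are hypothetical, and it assumes `numpy.linalg.cholesky` returns the lower-triangular factor (its default).

```python
import numpy as np


def adapted_bw_sq(A, B):
    """Squared adapted Bures-Wasserstein distance, following the formula
    d_ABW^2(A, B) = tr(A) + tr(B) - 2 * ||diag(L^T M)||_1
    with A = L L^T and B = M M^T (lower-triangular Cholesky factors)."""
    L = np.linalg.cholesky(A)
    M = np.linalg.cholesky(B)
    diag_entries = np.diag(L.T @ M)
    return np.trace(A) + np.trace(B) - 2.0 * np.sum(np.abs(diag_entries))


def aw2_sq(a, A, b, B):
    """Squared adapted 2-Wasserstein distance between N(a, A) and N(b, B)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2) + adapted_bw_sq(A, B))


# Illustrative (hypothetical) covariances, not the ones used in the paper.
A = np.array([[1.0, 2.0], [2.0, 5.0]])    # L = [[1, 0], [2, 1]]
B = np.array([[1.0, -1.0], [-1.0, 2.0]])  # M = [[1, 0], [-1, 1]]

d_abw_sq = adapted_bw_sq(A, B)

# Classical squared Bures-Wasserstein distance for comparison:
# tr(A) + tr(B) - 2 * sum of square roots of the eigenvalues of A B.
eigs = np.linalg.eigvals(A @ B).real
d_bw_sq = np.trace(A) + np.trace(B) - 2.0 * np.sum(np.sqrt(np.clip(eigs, 0.0, None)))
```

For these matrices $\operatorname{diag}(L^{\top} M) = (-1, 1)$, so the first diagonal entry is negative and the absolute value in the formula matters. Because bicausal couplings form a subset of all couplings, the adapted value should never fall below the classical one, which `d_bw_sq` lets you check numerically. In dimension $N = 1$ the formula reduces to $(\sqrt{A} - \sqrt{B})^2$, the classical value.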

Figures (2)

Figure 1: Transports between $\mu = \mathcal{N}(0, A)$ and $\nu = \mathcal{N}(0, B)$ where $A$ and $B$ are given by \eqref{eqn:compare}. Each distribution is visualized by the elliptical contours of its density. Top row: The $\mathcal{W}_2$-transport. Here the arrows and the grid correspond to the eigenvectors of $A$. The image of the grid under $T_{\mathrm{W}}^{\mu, \nu}$ is shown on the right. Second row: Now the arrows and the grid correspond to the standard Euclidean basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ in the source space. The middle plot shows the transport under $T_{\mathrm{KR}}^{\mu, \nu}$ and the right plot shows the transport under $T_{\mathrm{AW}}^{\mu, \nu}$. Since $T_{\mathrm{KR}}^{\mu, \nu}$ is comonotonic at each time, $T_{\mathrm{KR}}^{\mu, \nu} \mathbf{e}_1$ points towards the right and $T_{\mathrm{KR}}^{\mu, \nu}\mathbf{e}_2$ points upward. Under $T_{\mathrm{AW}}^{\mu, \nu}$, only the image of $\mathbf{e}_2$ points upward.
Figure 2: Three interpolations $(A_t)_{0 \leq t \leq 1}$ between $A_0 = A$ and $A_1 = B$ as in \eqref{eqn:compare}. For visualization purposes we also include a uniform translation. The thin (grey) ellipses visualize McCann's displacement interpolation. The thick dashed (red) ones show the Knothe-Rosenblatt interpolation where $T = L_1 L_0^{-1}$. The thick solid (blue) ones show the interpolation with $T = L_1 P L_0^{-1}$; in this case $A_{\frac{1}{2}}$ is degenerate.
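The interpolations in Figure 2 can be explored numerically. The sketch below rests on assumptions that this page does not confirm: that the interpolation along a linear map $T$ is $A_t = T_t A_0 T_t^{\top}$ with $T_t = (1-t) I + t T$, and that $P$ is a diagonal matrix with entries $\pm 1$. Under these assumptions $T_t L_0 = (1-t) L_0 + t L_1 P$, so $A_t$ can be built from the Cholesky factors alone. The function name `interpolate_cov` and the matrices are hypothetical and are not the ones from the paper.

```python
import numpy as np


def interpolate_cov(A0, A1, t, signs=None):
    """Covariance A_t = L_t L_t^T with L_t = (1 - t) L0 + t L1 P.

    Assumption: P = diag(signs) with entries +1 or -1. signs=None gives
    P = I, i.e. the Knothe-Rosenblatt choice T = L1 L0^{-1}.
    """
    L0 = np.linalg.cholesky(A0)
    L1 = np.linalg.cholesky(A1)
    if signs is None:
        signs = np.ones(A0.shape[0])
    P = np.diag(np.asarray(signs, dtype=float))
    Lt = (1.0 - t) * L0 + t * (L1 @ P)
    return Lt @ Lt.T


# Illustrative (hypothetical) endpoints.
A0 = np.array([[1.0, 2.0], [2.0, 5.0]])
A1 = np.array([[1.0, -1.0], [-1.0, 2.0]])

kr_mid = interpolate_cov(A0, A1, 0.5)                       # P = I
flip_mid = interpolate_cov(A0, A1, 0.5, signs=[-1.0, 1.0])  # P = diag(-1, 1)
```

Both choices of $P$ reach $A_1$ at $t = 1$, since $P P^{\top} = I$. With a sign flip in the first coordinate, the first diagonal entry of $L_t$ passes through zero on the way, which makes the intermediate covariance singular. In this example that happens at $t = 1/2$ because the two Cholesky factors share the same first diagonal entry.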

Theorems & Definitions (27)

Theorem 1.1
Definition 2.1: Causal and bicausal couplings
Definition 2.2: Adapted Wasserstein distance
Lemma 3.1
Corollary 3.2
Lemma 3.3
proof
Theorem 4.1: Dynamic programming principle (Proposition 5.2 of BBYZ2017)
Proposition 4.2: Time consistency
Lemma 4.3: Conditional distribution under Cholesky decomposition
...and 17 more
