Flow updates for domain decomposition of entropic optimal transport

Ismael Medina, Bernhard Schmitzer

TL;DR

Addresses freezing in domain decomposition (DD) methods for entropic OT by introducing flow updates, an $L^\infty$-style variant of the AHT flow, that can be combined with domain decomposition to guarantee convergence to the global minimizer. The paper provides a formal interpretation of flow updates, a convergence analysis for the hybrid scheme, and a thorough GPU-enabled numerical evaluation showing that flow updates mitigate nonlocal curl while multiscale DD often outperforms the hybrid approach. It also describes the GPU implementation in detail and compares single-scale, hybrid, and multiscale approaches. The results indicate that flow updates are effective when a good initial coupling exists, and that multiscale DD remains the fastest general solution method, with the GPU-enabled DD offering scalable performance for large-scale entropic OT problems.

Abstract

Domain decomposition has been shown to be a computationally efficient distributed method for solving large scale entropic optimal transport problems. However, a naive implementation of the algorithm can freeze in the limit of very fine partition cells (i.e. it asymptotically becomes stationary and does not find the global minimizer), since information can only travel slowly between cells. In practice this can be avoided by a coarse-to-fine multiscale scheme. In this article we introduce flow updates as an alternative approach. Flow updates can be interpreted as a variant of the celebrated algorithm by Angenent, Haker, and Tannenbaum, and can be combined canonically with domain decomposition. We prove convergence to the global minimizer and provide a formal discussion of its continuity limit. We give a numerical comparison with naive and multiscale domain decomposition, and show that the flow updates prevent freezing in the regime of very many cells. While the multiscale scheme is observed to be faster than the hybrid approach in general, the latter could be a viable alternative in cases where a good initial coupling is available. Our numerical experiments are based on a novel GPU implementation of domain decomposition that we describe in the appendix.

Paper Structure (54 sections, 7 theorems, 44 equations, 15 figures, 1 table)

Key Result

Lemma 2.1

Let $(\rho_n)_n \subset \mathcal{M}_+(Z)$ be a sequence converging weak* to $\rho$. Let $A \subset Z$ be closed, and assume that $\rho(A) = \lim_{n\to\infty} \rho_n(A)$. Then $\rho \llcorner A = \lim_{n\to\infty} \rho_n \llcorner A$.
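
The assumption $\rho(A) = \lim_{n\to\infty} \rho_n(A)$ cannot simply be dropped. As a small illustration (our own example, not taken from the paper, and assuming $Z = [0,1]$), consider Dirac masses moving towards the point $0$:

```latex
\rho_n = \delta_{1/n} \;\xrightarrow{\ \text{weak*}\ }\; \rho = \delta_0,
\qquad A = \{0\} \ \text{closed},
\\
\rho_n(A) = 0 \neq 1 = \rho(A),
\qquad
\rho_n \llcorner A = 0 \;\not\to\; \delta_0 = \rho \llcorner A .
```

Here the mass condition fails and so does the conclusion. With $A = [0, 1/2]$ instead, $\rho_n(A) = 1 = \rho(A)$ for all $n \geq 2$, and the restrictions $\rho_n \llcorner A = \delta_{1/n}$ do converge to $\rho \llcorner A = \delta_0$.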

Figures (15)

  • Figure 1: An example of the freezing behaviour in domain decomposition. For $\mu = \mathcal{L} \llcorner [-1/2,+1/2]^2$, $\nu = \tfrac{1}{2} [\delta_{(-1/2, 0)} + \delta_{(+1/2, 0)}]$, we visualize the couplings $\pi^{n,k}$ by colouring the domain decomposition cells that are mapped to $(-1/2, 0)$ in yellow and those mapped to $(+1/2, 0)$ in dark blue; non-deterministic assignments feature an intermediate colour. We show two domain decomposition trajectories at different resolutions, for an initialization that features a global rotation with respect to the optimal configuration, in this case corresponding to a vertical interface. Both trajectories evolve towards the optimal coupling, but the convergence rate deteriorates drastically as the resolution increases.
  • Figure 2: Hybrid scheme overcomes freezing: The top row shows the same sequence of iterates as Figure 1, now from $t = 0$ until $t = 1$, showcasing the freezing. The center row shows the corresponding trajectory for the hybrid scheme: after the same number of iterations the iterates get much closer to the global optimizer, leaving only small local perturbations of the optimal assignment that can quickly be resolved by a few additional domain decomposition iterations. Finally, the bottom row shows a higher resolution example, converging in approximately the same time as the one with lower resolution, demonstrating the resilience to freezing.
  • Figure 3: Prototypical choice of basic and composite partitions for $X = [0,1]^2$. For a resolution scale $n \in \mathbb{N}$, $X$ is first partitioned into basic cells of size $1/n \times 1/n$. The restriction of $\mu$ to every basic cell is approximated by a Dirac delta in the center of the cell (indicated by the dots). The $A$ and $B$ partitions are constructed by $2\times 2$ groups of basic cells, with an offset between $A$ and $B$ groups.
  • Figure 4: Left: Marginal $\mu=\nu$ for the experiments of Section \ref{sec:single-scale-experiments}. Center: Partition of $X$ into basic cells with $n = 16$ cells along each axis. Some basic cells are highlighted in color. Right: To visualize a transport plan $\pi \in \Pi(\mu,\nu)$, for each basic cell $X_i$ highlighted in color in the center panel, we show the $\nu$-density of the corresponding $Y$-marginal $\nu_i=\mathrm{P}_Y( \pi \llcorner (X_i\times Y))$, cf. \ref{eq:basic-cell-Y-marginal}, in the same color on $Y$. As an example, this is shown here for the initialization $\pi^0=(\mathrm{id},T_0)_\# \mu$ where $T_0$ is a rotation by angle $\pi/2$, \ref{eq:T0}.
  • Figure 5: Domain decomposition and hybrid trajectories for $s = 2$, $\varepsilon_\gamma=(\Delta x / 2)^2$. Domain decomposition alone clearly exhibits freezing. On the other hand, the hybrid scheme achieves a near-optimal configuration at approximately the same speed for $N = 32$ and $N = 128$.
  • ...and 10 more figures
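
The partition layout described in the caption of Figure 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' GPU implementation: the function names `basic_cell_centers` and `composite_partitions` are hypothetical, the offset between the $A$ and $B$ groups is assumed to be one basic cell along each axis, and the smaller leftover groups at the boundary of the $B$ partition are likewise an assumption.

```python
def basic_cell_centers(n):
    """Centers of the n x n basic cells of X = [0,1]^2, keyed by cell index (i, j).

    Each basic cell has size 1/n x 1/n; the restriction of mu to a cell is
    represented by a Dirac mass at the returned center.
    """
    return {(i, j): ((i + 0.5) / n, (j + 0.5) / n)
            for i in range(n) for j in range(n)}


def composite_partitions(n):
    """Return (A, B), two lists of composite cells.

    Every composite cell is a list of basic-cell indices (i, j).
    A groups basic cells into 2 x 2 blocks starting at index 0.
    B uses the same blocks shifted by one basic cell (assumed offset), so the
    cells along the boundary form smaller 1 x 2, 2 x 1 or 1 x 1 groups.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch assumes an even n >= 2")

    def blocks_1d(offset):
        # Cut range(n) at offset, offset + 2, ... and at both ends.
        edges = sorted({0, n} | set(range(offset, n, 2)))
        return [list(range(a, b)) for a, b in zip(edges[:-1], edges[1:])]

    def partition(offset):
        blocks = blocks_1d(offset)
        return [[(i, j) for i in bi for j in bj]
                for bi in blocks for bj in blocks]

    return partition(0), partition(1)


if __name__ == "__main__":
    A, B = composite_partitions(4)
    print(len(A), "A cells;", len(B), "B cells")
    print("first B cells:", B[:3])
```

Because the two partitions are staggered, every boundary between neighbouring $A$ cells lies in the interior of some $B$ cell, so alternating between the two lets mass cross cell boundaries, although only by a few basic cells per iteration.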

Theorems & Definitions (24)

  • Lemma 2.1: Continuity of the restriction operator (Proposition 8.4.4 in Bogachev, Measure Theory)
  • Proposition 2.2: Optimal entropic transport couplings
  • Definition 2.3: Basic and composite partitions
  • Definition 3.1
  • Definition 3.2
  • Definition 3.3: Edge candidates
  • Definition 3.4: Flow update
  • Remark 3.5
  • Remark 3.6
  • Lemma 3.7
  • ...and 14 more