Flow updates for domain decomposition of entropic optimal transport
Ismael Medina, Bernhard Schmitzer
TL;DR
Addresses freezing in domain decomposition (DD) methods for entropic optimal transport (OT) by introducing flow updates, an $L^\infty$-style variant of the Angenent-Haker-Tannenbaum (AHT) flow. Combined with domain decomposition, flow updates yield a hybrid scheme that provably converges to the global minimizer. The paper gives a formal interpretation of flow updates, a convergence analysis of the hybrid scheme, and a GPU-based numerical comparison of single-scale, hybrid, and multiscale approaches, together with a detailed description of the GPU implementation. The experiments indicate that flow updates mitigate nonlocal curl and are effective when a good initial coupling is available, while multiscale DD remains the fastest general-purpose method and the GPU implementation of DD scales to large entropic OT problems.
Abstract
Domain decomposition has been shown to be a computationally efficient distributed method for solving large-scale entropic optimal transport problems. However, a naive implementation of the algorithm can freeze in the limit of very fine partition cells (i.e., it asymptotically becomes stationary and does not find the global minimizer), since information can only travel slowly between cells. In practice this can be avoided by a coarse-to-fine multiscale scheme. In this article we introduce flow updates as an alternative approach. Flow updates can be interpreted as a variant of the celebrated algorithm by Angenent, Haker, and Tannenbaum, and can be combined canonically with domain decomposition. We prove convergence to the global minimizer of the resulting hybrid scheme and provide a formal discussion of its continuity limit. We give a numerical comparison with naive and multiscale domain decomposition, and show that the flow updates prevent freezing in the regime of very many cells. While the multiscale scheme is observed to be faster than the hybrid approach in general, the latter could be a viable alternative in cases where a good initial coupling is available. Our numerical experiments are based on a novel GPU implementation of domain decomposition that we describe in the appendix.
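
For orientation, the naive single-scale domain decomposition that the abstract refers to can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' GPU implementation, and it does not include flow updates. The function names (`sinkhorn`, `entropic_score`, `dd_sweep`), the 1D grid, the squared-distance cost, the two-point cells, and the fixed number of Sinkhorn iterations are all hypothetical choices made for this example. Each sweep re-solves the entropic problem on every cell of one partition while keeping that cell's current Y-marginal fixed, and sweeps alternate between two staggered partitions so that mass can move between neighboring cells.

```python
import numpy as np

def sinkhorn(K, a, b, n_iter=200):
    """Scale kernel K so its row sums match a and its column sums approach b."""
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_score(pi, c, mu, nu, eps):
    """Primal objective <c, pi> + eps * KL(pi | mu x nu)."""
    ref = np.outer(mu, nu)
    mask = pi > 0
    return np.sum(c * pi) + eps * np.sum(pi[mask] * np.log(pi[mask] / ref[mask]))

def dd_sweep(pi, K, mu, cells):
    """Re-solve the entropic problem on each cell, keeping its Y-marginal fixed."""
    pi = pi.copy()
    for idx in cells:
        nu_cell = pi[idx].sum(axis=0)  # Y-marginal of the coupling on this cell
        # With both cell marginals fixed, the choice of reference measure only
        # shifts the objective by a constant, so plain exp(-c/eps) suffices.
        pi[idx] = sinkhorn(K[idx], mu[idx], nu_cell)
    return pi

# Assumed toy problem: uniform marginals on a 1D grid, squared-distance cost.
n, eps = 16, 0.05
x = np.linspace(0.0, 1.0, n)
c = (x[:, None] - x[None, :]) ** 2
K = np.exp(-c / eps)
mu = np.full(n, 1.0 / n)
nu = np.full(n, 1.0 / n)

# Two staggered partitions of X into cells of at most two points.
cells_a = [np.arange(i, i + 2) for i in range(0, n, 2)]
cells_b = ([np.arange(0, 1)]
           + [np.arange(i, i + 2) for i in range(1, n - 1, 2)]
           + [np.arange(n - 1, n)])

pi = np.outer(mu, nu)  # feasible initial coupling
scores = [entropic_score(pi, c, mu, nu, eps)]
for k in range(50):
    pi = dd_sweep(pi, K, mu, cells_a if k % 2 == 0 else cells_b)
    scores.append(entropic_score(pi, c, mu, nu, eps))
```

Because each cell update only rearranges mass within the cell's rows, information crosses at most one cell boundary per sweep. This is the slow propagation that, according to the abstract, leads to freezing as the cells become very fine.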
