Analysis of Kernel Mirror Prox for Measure Optimization
Pavel Dvurechensky, Jia-Jie Zhu
TL;DR
This work addresses optimization over probability measures in high-dimensional ML settings by introducing the Mixed Kernel Nash Equilibrium (MKNE), a specialization of the Mixed Functional Nash Equilibrium (MFNE) in which the dual function space is an RKHS. It models the continuous-time dynamics as interacting Fisher-Rao and RKHS gradient flows and derives a primal-dual Kernel Mirror Prox (KMP) algorithm that solves MKNE problems in infinite dimensions, with convergence rates of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case. The authors apply this framework to Distributionally Robust Optimization (DRO) with kernel MMD constraints, obtaining non-asymptotic suboptimality guarantees for losses that are non-convex in the data space, together with robustness and distribution-shift guarantees. The result is a unified treatment of kernel-based measure optimization with explicit convergence rates and direct implications for DRO.
Abstract
By choosing a suitable function space as the dual to the non-negative measure cone, we study in a unified framework a class of functional saddle-point optimization problems, which we term the Mixed Functional Nash Equilibrium (MFNE), that underlies several existing machine learning algorithms, such as implicit generative models, distributionally robust optimization (DRO), and Wasserstein barycenters. We model the saddle-point optimization dynamics as an interacting Fisher-Rao-RKHS gradient flow when the function space is chosen as a reproducing kernel Hilbert space (RKHS). As a discrete-time counterpart, we propose a primal-dual kernel mirror prox (KMP) algorithm, which combines a dual step in the RKHS with a primal entropic mirror prox step. We then provide a unified convergence analysis of KMP in an infinite-dimensional setting for this class of MFNE problems, which establishes a convergence rate of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case, where $N$ is the iteration counter. As a case study, we apply our analysis to DRO, providing algorithmic guarantees for DRO robustness and convergence.
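To make the two-step structure concrete, the following is a minimal finite-dimensional sketch, not the authors' algorithm or code. It assumes a toy saddle problem chosen here for illustration: a measure $p$ supported on $n$ fixed atoms, a dual function $f = \sum_j \alpha_j k(x_j, \cdot)$ in a Gaussian-kernel RKHS, and the objective $\langle f, \mu_p - \mu_q \rangle_{\mathcal{H}} - \tfrac{\lambda}{2}\|f\|_{\mathcal{H}}^2$, whose saddle point is $p = q$. The function names (`gaussian_gram`, `entropic_step`, `kmp_toy`, `mmd_sq`), the target `q`, and the values of `lam`, `eta`, and `n_iters` are all hypothetical choices.

```python
import numpy as np

def gaussian_gram(x, bandwidth=0.2):
    """Gram matrix of a Gaussian kernel on 1-D points (assumed kernel choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * bandwidth**2))

def entropic_step(p, grad, eta):
    """Entropic mirror step on the simplex: multiplicative update, then renormalize."""
    w = p * np.exp(-eta * grad)
    return w / w.sum()

def mmd_sq(K, p, q):
    """Squared MMD between two discrete measures on the same atoms."""
    d = p - q
    return float(d @ K @ d)

def kmp_toy(K, q, lam=1.0, eta=0.2, n_iters=2000):
    """Mirror-prox (extragradient) sketch for
        min_p max_f  <f, mu_p - mu_q>_H - (lam / 2) * ||f||_H^2,
    with f = sum_j alpha_j k(x_j, .). In coefficient form the RKHS gradient
    in f is (p - q) - lam * alpha, and the first variation in p is K @ alpha.
    """
    n = len(q)
    p = np.full(n, 1.0 / n)      # primal iterate: uniform measure on the atoms
    alpha = np.zeros(n)          # dual iterate: RKHS coefficients of f
    p_avg = np.zeros(n)          # running average of the extrapolated primal iterates
    for _ in range(n_iters):
        # Extrapolation (half) step, using gradients at the current point.
        p_half = entropic_step(p, K @ alpha, eta)
        alpha_half = alpha + eta * ((p - q) - lam * alpha)
        # Update step, using gradients evaluated at the extrapolated point.
        p = entropic_step(p, K @ alpha_half, eta)
        alpha = alpha + eta * ((p_half - q) - lam * alpha_half)
        p_avg += p_half / n_iters
    return p_avg, alpha

# Toy data: 20 atoms on [0, 1] and a non-uniform target measure q.
x = np.linspace(0.0, 1.0, 20)
K = gaussian_gram(x)
q = np.arange(1, 21) / 210.0
p_avg, alpha = kmp_toy(K, q)
print("squared MMD at start:", mmd_sq(K, np.full(20, 1.0 / 20), q))
print("squared MMD of averaged iterate:", mmd_sq(K, p_avg, q))
```

The regularizer $\tfrac{\lambda}{2}\|f\|_{\mathcal{H}}^2$ is used here in place of a norm constraint only to avoid a projection step in the sketch. The averaged iterate is reported because mirror-prox rates are typically stated for averaged iterates.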
