Analysis of Kernel Mirror Prox for Measure Optimization

Pavel Dvurechensky; Jia-Jie Zhu

TL;DR

This work addresses optimization over probability measures in high-dimensional ML settings by introducing the Mixed Kernel Nash Equilibrium (MKNE), a specialization of the Mixed Functional Nash Equilibrium (MFNE) in which the dual function space is an RKHS. It models the continuous-time dynamics as interacting Fisher-Rao and RKHS gradient flows and derives a primal-dual Kernel Mirror Prox (KMP) algorithm that solves MKNE in infinite dimensions, with convergence rates of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case. The authors apply this framework to Distributionally Robust Optimization (DRO) with kernel MMD constraints, obtaining non-asymptotic suboptimality guarantees under non-convex losses in the data space, together with robustness and distribution-shift guarantees.
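To make the kernel MMD constraint mentioned above concrete, the following is a minimal sketch of the standard biased (V-statistic) estimate of the squared MMD between two samples. The Gaussian kernel, the bandwidth value, and the function names `gaussian_kernel` and `mmd2_biased` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    The kernel choice and bandwidth are assumptions made for illustration.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    k_xx = gaussian_kernel(X, X, bandwidth)
    k_yy = gaussian_kernel(Y, Y, bandwidth)
    k_xy = gaussian_kernel(X, Y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))
    Y_same = rng.standard_normal((200, 2))         # same distribution as X
    Y_shift = rng.standard_normal((200, 2)) + 2.0  # mean-shifted distribution
    print("MMD^2 (same distribution):", mmd2_biased(X, Y_same))
    print("MMD^2 (shifted):          ", mmd2_biased(X, Y_shift))
```

In an MMD-constrained DRO problem, a quantity of this kind measures how far a candidate distribution is from the empirical one; the shifted sample should give a visibly larger value than the sample drawn from the same distribution.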

Abstract

By choosing a suitable function space as the dual to the non-negative measure cone, we study in a unified framework a class of functional saddle-point optimization problems, which we term the Mixed Functional Nash Equilibrium (MFNE), that underlies several existing machine learning algorithms, such as implicit generative models, distributionally robust optimization (DRO), and Wasserstein barycenters. We model the saddle-point optimization dynamics as an interacting Fisher-Rao-RKHS gradient flow when the function space is chosen as a reproducing kernel Hilbert space (RKHS). As a discrete-time counterpart, we propose a primal-dual kernel mirror prox (KMP) algorithm, which uses a dual step in the RKHS and a primal entropic mirror prox step. We then provide a unified convergence analysis of KMP in an infinite-dimensional setting for this class of MFNE problems, which establishes a convergence rate of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case, where $N$ is the iteration counter. As a case study, we apply our analysis to DRO, providing algorithmic guarantees for DRO robustness and convergence.
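The primal-dual structure described in the abstract can be illustrated on a finite-dimensional toy problem. The sketch below is not the paper's KMP algorithm: it assumes a bilinear saddle point $\min_{\mu \in \Delta_n} \max_{\|w\|_2 \le 1} \mu^\top A w$, where a probability vector stands in for the measure and a Euclidean unit ball stands in for an RKHS ball. It pairs an entropic mirror prox step on $\mu$ with a projected Hilbert-space step on $w$. The matrix `A`, the step size, and all function names are hypothetical.

```python
import numpy as np

def mirror_prox_bilinear(A, n_iters=2000):
    """Mirror prox for min_{mu in simplex} max_{||w||_2 <= 1} mu^T A w (toy analogue).

    Returns the averaged extrapolation ("half-step") iterates.
    """
    n, m = A.shape
    eta = 0.5 / np.linalg.norm(A, 2)      # conservative step size (assumption)
    log_mu = np.full(n, -np.log(n))       # uniform start, stored in log space
    w = np.zeros(m)
    mu_avg, w_avg = np.zeros(n), np.zeros(m)

    def entropic_step(log_p, grad):
        # Multiplicative-weights update p * exp(-eta * grad), renormalized.
        logits = log_p - eta * grad
        shift = logits.max()
        return logits - (shift + np.log(np.sum(np.exp(logits - shift))))

    def ball_step(v, grad):
        # Gradient ascent step followed by projection onto the unit ball.
        u = v + eta * grad
        return u / max(1.0, np.linalg.norm(u))

    for _ in range(n_iters):
        mu = np.exp(log_mu)
        # Extrapolation step: gradients evaluated at the current point.
        log_mu_half = entropic_step(log_mu, A @ w)
        w_half = ball_step(w, A.T @ mu)
        mu_half = np.exp(log_mu_half)
        # Update step: start from the current point, gradients at the half point.
        log_mu = entropic_step(log_mu, A @ w_half)
        w = ball_step(w, A.T @ mu_half)
        mu_avg += mu_half
        w_avg += w_half
    return mu_avg / n_iters, w_avg / n_iters

def duality_gap(A, mu, w):
    """max_{||w'|| <= 1} mu^T A w'  -  min_{mu' in simplex} mu'^T A w."""
    return np.linalg.norm(A.T @ mu) - np.min(A @ w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    for n_iters in (10, 100, 1000):
        mu_bar, w_bar = mirror_prox_bilinear(A, n_iters)
        print(n_iters, duality_gap(A, mu_bar, w_bar))
```

The duality gap of the averaged iterates is the quantity that mirror prox analyses typically bound, so printing it for increasing iteration counts gives a rough empirical view of the deterministic rate in this simplified setting.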

Paper Structure (25 sections, 18 theorems, 126 equations, 2 algorithms)

INTRODUCTION
PRELIMINARIES
Duality of Metrics on Probability Measures
Gradient Flow in Hilbert and Metric Spaces
RKHS GRADIENT FLOW FOR MODELING MEASURE OPTIMIZATION DYNAMICS
A PRIMAL-DUAL KERNEL MIRROR PROX ALGORITHM
Preliminaries
Kernel Mirror Prox Algorithm and Its Analysis
Analysis of Stochastic Kernel Mirror Prox
DRO Algorithmic Guarantees using Kernel Mirror Prox
DISCUSSION
FURTHER TECHNICAL BACKGROUND
List of Acronyms
Proof of Lemma 2.1
Background of Gradient Flow and Geodesic Convexity
...and 10 more sections

Key Result

Lemma 2.1

Suppose the probability metric is chosen to be the optimal transport metric (e.g., the $p$-Wasserstein distance). Then the DRO problem (eq-dro-ipm) admits a reformulation in which $\Psi_c$ is the set of $c$-concave functions (santambrogio_optimal_2015) and $f^c(y) := \inf_x \{c(x,y) - f(x)\}$ denotes the $c$-transform. Suppose instead that the probability metric $\mathcal{D}$ is the MMD; then the DRO problem (eq-dro-ipm) is equivalent to …
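The $c$-transform in the lemma can be computed directly once $x$ and $y$ are restricted to finite grids. The sketch below assumes a quadratic cost $c(x,y) = (x-y)^2$ on one-dimensional grids and an arbitrary test function $f$; the grids, the cost, and the helper name `c_transform` are illustrative assumptions. It also checks two standard properties that follow from the definition: $f^{cc} \ge f$ and $f^{ccc} = f^c$.

```python
import numpy as np

def c_transform(f_vals, cost):
    """Discrete c-transform: returns g[j] = min_i (cost[i, j] - f_vals[i]).

    cost[i, j] holds c(x_i, y_j); f_vals[i] holds f(x_i).
    """
    return np.min(cost - f_vals[:, None], axis=0)

# Illustrative grids and quadratic cost (assumptions, not from the paper).
xs = np.linspace(-1.0, 1.0, 50)
ys = np.linspace(-1.0, 1.0, 40)
cost = (xs[:, None] - ys[None, :]) ** 2

f = np.sin(3.0 * xs)              # arbitrary test function on the x-grid
f_c = c_transform(f, cost)        # f^c, defined on the y-grid
f_cc = c_transform(f_c, cost.T)   # f^{cc}, back on the x-grid
f_ccc = c_transform(f_cc, cost)   # f^{ccc}, on the y-grid again

print("f^cc >= f everywhere:", bool(np.all(f_cc >= f - 1e-12)))
print("f^ccc == f^c:", bool(np.allclose(f_ccc, f_c)))
```

Functions that satisfy $f = f^{cc}$ are the $c$-concave ones, so on a grid the double transform gives the smallest $c$-concave function lying above $f$.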

Theorems & Definitions (31)

Lemma 2.1: Primal-dual reformulation of Wasserstein and Kernel DRO
Example 3.1
Example 3.2
Lemma 4.3
Lemma 4.4: hsieh2019finding
Theorem 4.5
Remark 4.6: Implementation of mirror descent steps
Theorem 4.8
Remark 5.1
Proposition 5.2: DRO Guarantee for KMP decision sub-optimality
...and 21 more