Analysis of Kernel Mirror Prox for Measure Optimization

Pavel Dvurechensky, Jia-Jie Zhu

TL;DR

This work addresses optimization over probability measures in high-dimensional ML settings by introducing the Mixed Kernel Nash Equilibrium (MKNE), a specialization of the Mixed Functional Nash Equilibrium (MFNE) in which the dual function space is an RKHS. It models the continuous-time dynamics as interacting Fisher-Rao and RKHS gradient flows and derives a primal-dual Kernel Mirror Prox (KMP) algorithm for solving MKNE in infinite dimensions, with convergence rates of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case. The authors apply the framework to Distributionally Robust Optimization (DRO) with kernel MMD constraints, obtaining non-asymptotic suboptimality guarantees for losses that are non-convex in the data space, together with robustness guarantees under distribution shift.

Abstract

By choosing a suitable function space as the dual to the non-negative measure cone, we study in a unified framework a class of functional saddle-point optimization problems, which we term the Mixed Functional Nash Equilibrium (MFNE), that underlies several existing machine learning algorithms, such as implicit generative models, distributionally robust optimization (DRO), and Wasserstein barycenters. We model the saddle-point optimization dynamics as an interacting Fisher-Rao-RKHS gradient flow when the function space is chosen as a reproducing kernel Hilbert space (RKHS). As a discrete time counterpart, we propose a primal-dual kernel mirror prox (KMP) algorithm, which uses a dual step in the RKHS, and a primal entropic mirror prox step. We then provide a unified convergence analysis of KMP in an infinite-dimensional setting for this class of MFNE problems, which establishes a convergence rate of $O(1/N)$ in the deterministic case and $O(1/\sqrt{N})$ in the stochastic case, where $N$ is the iteration counter. As a case study, we apply our analysis to DRO, providing algorithmic guarantees for DRO robustness and convergence.
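To make the two kinds of steps concrete, the following is a minimal finite-support sketch, not the paper's algorithm. It assumes a toy objective $\min_p \max_{\|f\|_{\mathcal{H}} \le 1} \sum_i (p_i - q_i) f(x_i)$ on three fixed points, whose inner maximum is the MMD between $p$ and $q$. The problem, the step size `eta`, the Gaussian kernel, and all function names are illustrative assumptions. The primal update is an entropic (multiplicative-weights) step on the simplex, the dual update is a gradient step in the RKHS followed by projection onto the unit ball, and both are wrapped in the usual mirror prox extrapolation scheme.

```python
import math

def gaussian_gram(xs, bandwidth=1.0):
    # Gram matrix K[i][j] = k(x_i, x_j) for a Gaussian kernel (assumed choice).
    return [[math.exp(-(a - b) ** 2 / (2 * bandwidth ** 2)) for b in xs] for a in xs]

def matvec(K, v):
    return [sum(k * x for k, x in zip(row, v)) for row in K]

def entropic_step(p, grad, eta):
    # Primal mirror step on the simplex: multiplicative update, then renormalize.
    w = [pi * math.exp(-eta * gi) for pi, gi in zip(p, grad)]
    s = sum(w)
    return [wi / s for wi in w]

def rkhs_step(alpha, grad_coef, eta, K):
    # Dual ascent step for f = sum_j alpha_j k(x_j, .), then projection
    # onto the RKHS unit ball (rescale if ||f||_H > 1).
    a = [ai + eta * gi for ai, gi in zip(alpha, grad_coef)]
    norm = math.sqrt(max(sum(ai * ki for ai, ki in zip(a, matvec(K, a))), 0.0))
    return [ai / norm for ai in a] if norm > 1.0 else a

def mmd(p, q, K):
    # MMD between two weight vectors on the same support points.
    d = [pi - qi for pi, qi in zip(p, q)]
    return math.sqrt(max(sum(di * ki for di, ki in zip(d, matvec(K, d))), 0.0))

def toy_kernel_mirror_prox(q, K, eta=0.5, n_iters=1000):
    n = len(q)
    p = [1.0 / n] * n          # uniform initial measure
    alpha = [0.0] * n          # f = 0 initially
    p_avg = [0.0] * n
    for _ in range(n_iters):
        # Extrapolation step: gradients evaluated at the current point.
        p_half = entropic_step(p, matvec(K, alpha), eta)
        a_half = rkhs_step(alpha, [pi - qi for pi, qi in zip(p, q)], eta, K)
        # Update step: start from the current point, use gradients at the half point.
        p_new = entropic_step(p, matvec(K, a_half), eta)
        alpha = rkhs_step(alpha, [pi - qi for pi, qi in zip(p_half, q)], eta, K)
        p = p_new
        # Mirror prox guarantees are stated for the averaged half-step iterates.
        p_avg = [s + ph / n_iters for s, ph in zip(p_avg, p_half)]
    return p_avg

xs = [0.0, 1.0, 2.0]
K = gaussian_gram(xs)
q = [0.2, 0.5, 0.3]
p_bar = toy_kernel_mirror_prox(q, K)
print(p_bar, mmd(p_bar, q, K))
```

In this toy problem the saddle point is $p = q$, so the MMD between the averaged iterate and `q` serves as a simple progress measure. The infinite-dimensional setting analyzed in the paper replaces the weight vector `p` with a measure and is not captured by this discretization.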

Paper Structure (25 sections, 18 theorems, 126 equations, 2 algorithms)

Key Result

Lemma 2.1

Suppose the probability metric is chosen to be the optimal transport metric (e.g., $p$-Wasserstein distance). Then the DRO problem (eq-dro-ipm) admits a reformulation in which $\Psi_c$ is the set of $c$-concave functions [santambrogio_optimal_2015] and $f^c(y):=\inf_x \{c(x,y)-f(x)\}$ denotes the $c$-transform. Suppose instead that the probability metric $\mathcal{D}$ is the MMD; then the DRO problem (eq-dro-ipm) is equivalent to …
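As a small worked illustration of the $c$-transform defined in the lemma, the sketch below evaluates $f^c(y)=\inf_x \{c(x,y)-f(x)\}$ on a finite grid, where the infimum becomes a minimum. The grid points, the squared-distance cost, and the function names are assumptions chosen for illustration only.

```python
def c_transform(f_vals, xs, ys, cost):
    """Discrete c-transform: f^c(y) = min over x in xs of cost(x, y) - f(x)."""
    return [min(cost(x, y) - fx for x, fx in zip(xs, f_vals)) for y in ys]

def sq_cost(x, y):
    # Squared-distance cost, as used for the 2-Wasserstein distance.
    return (x - y) ** 2

xs = [0.0, 1.0, 2.0]       # support points for f
ys = [0.5, 2.0]            # points at which f^c is evaluated
f_vals = [0.0, 0.0, 0.0]   # f is identically zero on xs

fc = c_transform(f_vals, xs, ys, sq_cost)
print(fc)  # [0.25, 0.0]
```

With $f \equiv 0$, the transform reduces to the squared distance from each $y$ to the nearest grid point, which is why the two values are $0.25$ and $0$.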

Theorems & Definitions (31)

  • Lemma 2.1: Primal-dual reformulation of Wasserstein and Kernel DRO
  • Example 3.1
  • Example 3.2
  • Lemma 4.3
  • Lemma 4.4: hsieh2019finding
  • Theorem 4.5
  • Remark 4.6: Implementation of mirror descent steps
  • Theorem 4.8
  • Remark 5.1
  • Proposition 5.2: DRO Guarantee for KMP decision sub-optimality
  • ...and 21 more