Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu; Mateusz B. Majka

TL;DR

The paper investigates entropy-regularized optimization over probability measures in the Wasserstein space, introducing and analyzing the Best Response (BR) flow to handle non-convex objectives and non-convex-non-concave min–max problems. A central result is a contractivity condition relating the degree of non-convexity of the objective, the regularization parameter, and the tail behaviour of the reference measure. Under this condition the Best Response operator is a contraction in the $L^1$-Wasserstein distance, which guarantees a unique global minimizer and exponential convergence of the BR flow. The framework extends from single-agent optimization to non-convex-non-concave min–max problems, where it establishes existence of, and exponential convergence to, a unique global mixed Nash equilibrium under high regularization, allowing different regularizers and learning rates for each player. The theory is connected to reinforcement learning and multi-agent settings, including policy optimization for softmax-parametrized mean-field policies in MDPs and two-player Markov games, offering a principled alternative to Wasserstein gradient flows under weaker regularity assumptions and providing practical numerical schemes for BR implementation.

Abstract

We study the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure, as well as the corresponding problem of solving entropy-regularized non-convex-non-concave min-max problems. We utilize the Best Response flow (also known in the literature as the fictitious play flow) and study how its convergence is influenced by the relation between the degree of non-convexity of the functional under consideration, the regularization parameter and the tail behaviour of the reference measure. In particular, we demonstrate how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to the $L^1$-Wasserstein distance, which ensures the existence of its unique fixed point that is then shown to be the unique global minimizer for our optimization problem. This extends recent results where the Best Response flow was applied to solve convex optimization problems regularized by the relative entropy with respect to arbitrary reference measures, and with arbitrary values of the regularization parameter. Our results explain precisely how the assumption of convexity can be relaxed, at the expense of making a specific choice of the regularizer. Additionally, we demonstrate how these results can be applied in reinforcement learning in the context of policy optimization for Markov Decision Processes and Markov games with softmax parametrized policies in the mean-field regime.
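
As a rough illustration of the mechanism described in the abstract, the sketch below runs an explicit Euler discretisation of a Best Response flow on a finite grid. Everything specific here is an assumption made for illustration and is not taken from the paper: the grid, the quadratic functional $F(\mu) = \tfrac{1}{2}\mu^\top K \mu + v^\top \mu$ with a negative definite kernel $K$ (so that $F$ is non-convex), the Gaussian reference measure `pi`, the value `sigma = 5.0`, and the function names `best_response` and `br_flow`. The Best Response map is taken in the usual Gibbs form, $\Psi(\mu)_i \propto \pi_i \exp(-(K\mu + v)_i / \sigma)$. The example only checks numerically that two different initial measures reach the same fixed point; it does not verify the paper's $L^1$-Wasserstein contractivity condition.

```python
import numpy as np

def best_response(mu, K, v, pi, sigma):
    """Discrete Best Response map: Psi(mu)_i proportional to pi_i * exp(-(K mu + v)_i / sigma)."""
    potential = K @ mu + v                  # flat derivative of the toy F at mu (up to a constant)
    logits = np.log(pi) - potential / sigma
    logits -= logits.max()                  # shift for numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()

def br_flow(mu0, K, v, pi, sigma, alpha=1.0, dt=0.5, steps=200):
    """Explicit Euler steps for d mu_t = alpha * (Psi(mu_t) - mu_t) dt.

    With alpha * dt <= 1 every iterate is a convex combination of
    probability vectors, so it stays a probability vector.
    """
    mu = mu0.copy()
    for _ in range(steps):
        mu = mu + alpha * dt * (best_response(mu, K, v, pi, sigma) - mu)
    return mu

# Hypothetical toy problem (all choices below are illustrative assumptions).
x = np.linspace(-3.0, 3.0, 61)
K = -np.exp(-(x[:, None] - x[None, :]) ** 2)   # negative definite kernel: concave quadratic, non-convex F
v = 0.1 * x
pi = np.exp(-0.5 * x ** 2)
pi /= pi.sum()                                  # discretised Gaussian reference measure
sigma = 5.0                                     # "high regularization" relative to the size of K

mu_a = br_flow(np.full(x.size, 1.0 / x.size), K, v, pi, sigma)  # start from the uniform measure
mu_b = br_flow(pi.copy(), K, v, pi, sigma)                       # start from the reference measure

residual = np.abs(best_response(mu_a, K, v, pi, sigma) - mu_a).sum()
gap = np.abs(mu_a - mu_b).sum()
print(f"fixed-point residual: {residual:.2e}, gap between runs: {gap:.2e}")
```

In this toy setting, lowering `sigma` weakens the regularization relative to the non-convex term, which is the regime where a fixed point of the Best Response map need not be unique; the paper's condition makes this trade-off precise in the continuous setting.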
