Learning to Control Unknown Strongly Monotone Games
Siddharth Chandak, Ilai Bistritz, Nicholas Bambos
TL;DR
This work addresses the problem of steering the unique Nash equilibrium of an unknown, strongly monotone game so that it satisfies linear constraints, without requiring players to disclose their reward functions. It introduces a two-time-scale online algorithm in which the manager adjusts the controlled coefficients using only constraint-violation feedback, while players update their actions through noisy gradient-based learning. The authors prove almost-sure convergence to the set of constraint-satisfying equilibria and establish a finite-time L2 convergence rate of near-$O(t^{-1/4})$ under standard step-size conditions. The method preserves user privacy and applies to settings such as quadratic global objectives and weighted resource allocation, offering a principled, scalable mechanism for improving equilibrium efficiency in large networks.
Abstract
Consider a game where the players' utility functions include a reward function and a linear term for each dimension, with coefficients that are controlled by the manager. We assume that the game is strongly monotone, so gradient play converges to a unique Nash equilibrium (NE). The NE is typically globally inefficient. The global performance at the NE can be improved by imposing linear constraints on the NE. We therefore want the manager to pick the controlled coefficients that impose the desired constraints on the NE. However, this requires knowing the players' reward functions and action sets. Obtaining this game information is infeasible in a large-scale network and violates user privacy. To overcome this, we propose a simple algorithm that learns to shift the NE to meet the linear constraints by adjusting the controlled coefficients online. Our algorithm requires only the violation of the linear constraints as feedback and does not need to know the reward functions or the action sets. We prove that our algorithm converges with probability 1 to the set of NE that satisfy the target linear constraints. We then prove an L2 convergence rate of near-$O(t^{-1/4})$.
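The two-time-scale idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a hypothetical three-player quadratic game, a single constraint on the sum of the actions, one scalar controlled coefficient `lam` shared by all players, the step-size exponents, and the name `run_two_time_scale`. The manager's update is a simple dual-ascent-style rule that sees only the constraint violation; it is not claimed to be the authors' exact update.

```python
import numpy as np

def run_two_time_scale(T=20000, target=6.0, noise_std=0.1, seed=0):
    """Toy simulation: players run noisy gradient play (fast time scale),
    the manager adjusts one controlled coefficient (slow time scale)."""
    rng = np.random.default_rng(seed)
    b = np.array([4.0, 5.0, 6.0])  # hypothetical private reward parameters
    c = 0.1                        # hypothetical coupling between players
    x = np.zeros(3)                # players' actions, kept inside [0, 10]
    lam = 0.0                      # manager's controlled linear coefficient

    for t in range(1, T + 1):
        eta = 0.5 / t**0.6         # players' step size (decays slowly)
        eps = 0.2 / t**0.9         # manager's step size (decays faster)

        # Assumed utility of player n:
        #   u_n = b_n*x_n - 0.5*x_n**2 - c*x_n*sum_{m != n} x_m - lam*x_n
        # Each player only evaluates its own (noisy) partial gradient.
        others = x.sum() - x
        grad = b - x - c * others - lam
        noisy_grad = grad + noise_std * rng.standard_normal(3)
        x = np.clip(x + eta * noisy_grad, 0.0, 10.0)  # projected gradient step

        # The manager observes only the constraint violation sum(x) - target;
        # it never sees b, c, or the action sets.
        lam += eps * (x.sum() - target)

    return x, lam

if __name__ == "__main__":
    x, lam = run_two_time_scale()
    print("actions:", x, "sum:", x.sum(), "coefficient:", lam)
```

For this assumed game, the equilibrium whose actions sum to 6 corresponds to `lam = 2.6`, so the iterates should settle near that value. The ratio `eps / eta` tends to zero, which is what lets the players' actions track the equilibrium induced by the current coefficient while the manager moves slowly.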
