Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games
Leello Dadi, Ioannis Panageas, Stratis Skoulakis, Luca Viano, Volkan Cevher
TL;DR
This work studies how to reach a Nash equilibrium in congestion games under bandit feedback while guaranteeing no regret for each agent. It introduces Bandit Gradient Descent with Carathéodory Exploration (BGD-CE), which combines Online Gradient Descent with Carathéodory decompositions and projection onto a $\mu$-bounded-away polytope to ensure low-variance cost estimators and sufficient exploration. The main results are a sublinear regret bound of $\tilde{O}(m^{5.5}c_{\max}^2T^{4/5})$ for any agent adopting BGD-CE and, if all agents adopt BGD-CE, convergence to an $\epsilon$-approximate Nash equilibrium in a number of rounds polynomial in the number of agents $n$, the number of resources $m$, and $1/\epsilon$. For network congestion games on DAGs, the algorithm can also be implemented in polynomial time. This provides the first affirmative answer to the open question of whether bandit no-regret dynamics converge to a Nash equilibrium and extends semi-bandit results to the pure bandit setting, with practical implications for decentralized routing and resource allocation in networks.
Abstract
We introduce an online learning algorithm in the bandit feedback model that, once adopted by all agents of a congestion game, results in game dynamics that converge to an $\epsilon$-approximate Nash Equilibrium in a polynomial number of rounds with respect to $1/\epsilon$, the number of players and the number of available resources. The proposed algorithm also guarantees sublinear regret for any agent adopting it. As a result, our work answers an open question from arXiv:2206.01880 and extends the recent results of arXiv:2306.15543 to the bandit feedback model. We additionally establish that our online learning algorithm can be implemented in polynomial time for the important special case of Network Congestion Games on Directed Acyclic Graphs (DAGs) by constructing an exact $1$-barycentric spanner for DAGs.
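
To make the setting concrete, the following is a minimal Python sketch of a toy congestion game, the bandit feedback an agent receives, and a check for an $\epsilon$-approximate Nash equilibrium. It is not the algorithm introduced here: the instance (`COST_SLOPES`, `STRATEGIES`), the linear cost functions, and the restriction to pure strategy profiles are all assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical toy instance (an assumption for illustration only):
# 3 resources with linear costs c_e(load) = slope_e * load.
COST_SLOPES = {"e1": 1.0, "e2": 2.0, "e3": 1.5}

# Each agent chooses one strategy, i.e. a subset of resources
# (for example, a path in a network).
STRATEGIES = [
    [("e1",), ("e2", "e3")],  # agent 0
    [("e1",), ("e2",)],       # agent 1
]


def loads(profile):
    """Count how many agents use each resource under a pure strategy profile."""
    load = {e: 0 for e in COST_SLOPES}
    for strategy in profile:
        for e in strategy:
            load[e] += 1
    return load


def bandit_cost(profile, agent):
    """Bandit feedback: the agent observes only the total cost of its own strategy,
    not the per-resource costs (which would be semi-bandit feedback)."""
    load = loads(profile)
    return sum(COST_SLOPES[e] * load[e] for e in profile[agent])


def is_epsilon_nash(profile, eps):
    """True if no agent lowers its cost by more than eps via a unilateral deviation."""
    for agent, options in enumerate(STRATEGIES):
        current = bandit_cost(profile, agent)
        for alt in options:
            deviated = list(profile)
            deviated[agent] = alt
            if bandit_cost(tuple(deviated), agent) < current - eps:
                return False
    return True


def pure_epsilon_nash_profiles(eps):
    """Enumerate all pure profiles of the toy game that are eps-approximate equilibria."""
    return [p for p in product(*STRATEGIES) if is_epsilon_nash(p, eps)]
```

In this toy instance, larger values of `eps` admit more profiles as approximate equilibria. The brute-force enumeration is only feasible because the game is tiny; it stands in for the equilibrium notion, not for the learning dynamics.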
