Polynomial Convergence of Bandit No-Regret Dynamics in Congestion Games

Leello Dadi, Ioannis Panageas, Stratis Skoulakis, Luca Viano, Volkan Cevher

TL;DR

This work addresses the problem of achieving convergence to Nash equilibrium in congestion games under bandit feedback while guaranteeing no-regret for each agent. It introduces Bandit Gradient Descent with Caratheodory Exploration (BGD-CE), which combines Online Gradient Descent with Carathéodory decompositions and projection onto a $\mu$-bounded-away polytope to ensure low-variance cost estimators and sufficient exploration. The main results are a sublinear regret bound of $\tilde{O}(m^{5.5}c_{\mathrm{max}}^2T^{4/5})$ for any agent adopting the algorithm and, if all agents adopt BGD-CE, convergence to an $\epsilon$-approximate Nash Equilibrium in a number of rounds polynomial in $n$, $m$, and $1/\epsilon$. For network congestion games on DAGs, the algorithm can be implemented in polynomial time. This gives the first affirmative answer to the open question of whether bandit no-regret dynamics converge to NE and extends semi-bandit results to the pure bandit setting, with practical implications for decentralized routing and resource allocation in networks.

Abstract

We introduce an online learning algorithm in the bandit feedback model that, once adopted by all agents of a congestion game, results in game-dynamics that converge to an $ε$-approximate Nash Equilibrium in a polynomial number of rounds with respect to $1/ε$, the number of players and the number of available resources. The proposed algorithm also guarantees sublinear regret to any agent adopting it. As a result, our work answers an open question from arXiv:2206.01880 and extends the recent results of arXiv:2306.15543 to the bandit feedback model. We additionally establish that our online learning algorithm can be implemented in polynomial time for the important special case of Network Congestion Games on Directed Acyclic Graphs (DAG) by constructing an exact $1$-barycentric spanner for DAGs.
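To make the target of the dynamics concrete, the following is a minimal sketch of checking whether a pure strategy profile is an $\epsilon$-approximate Nash equilibrium in a small congestion game. It is an illustration under stated assumptions, not the paper's method: the helper names (`player_cost`, `is_eps_nash`), the two-resource game, and the restriction to pure strategies are all hypothetical choices made for this example, and the check enumerates deviations with full knowledge of the game, which the bandit setting does not grant to the agents.

```python
def player_cost(profile, i, resource_cost):
    """Cost paid by player i: sum over its chosen resources of cost(load)."""
    loads = {}
    for strategy in profile:
        for resource in strategy:
            loads[resource] = loads.get(resource, 0) + 1
    return sum(resource_cost[r](loads[r]) for r in profile[i])


def is_eps_nash(profile, strategy_sets, resource_cost, eps):
    """True if no player can lower its cost by more than eps by deviating alone."""
    for i, options in enumerate(strategy_sets):
        current = player_cost(profile, i, resource_cost)
        for alternative in options:
            deviated = list(profile)
            deviated[i] = alternative
            if player_cost(deviated, i, resource_cost) < current - eps:
                return False
    return True


# Hypothetical toy game: two players, two resources, cost equal to the load.
resource_cost = {"a": lambda load: load, "b": lambda load: load}
strategy_sets = [
    [frozenset({"a"}), frozenset({"b"})],
    [frozenset({"a"}), frozenset({"b"})],
]
split = [frozenset({"a"}), frozenset({"b"})]    # players use different resources
crowded = [frozenset({"a"}), frozenset({"a"})]  # both players share resource "a"
```

In this toy game the `split` profile is an exact equilibrium, while `crowded` is only an equilibrium once `eps` is at least 1, since either player can save exactly one unit of cost by switching resources.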

Paper Structure (27 sections, 41 theorems, 139 equations, 1 figure, 1 table, 4 algorithms)

Key Result

Theorem 2

There exists a no-regret algorithm, Bandit Gradient Descent with Caratheodory Exploration (BGD-CE), such that for any cost vector sequence $c_1,\ldots,c_T \in [0,c_{\mathrm{max}}]^m$ and any $\delta > 0$, with probability $1-\delta$, its regret is $\tilde{O}(m^{5.5}c_{\mathrm{max}}^2T^{4/5})$.
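As an illustration of the quantity this theorem bounds, here is a minimal sketch of external regret against the best fixed strategy in hindsight, with strategies encoded as 0/1 vectors over the $m$ resources. This assumes the standard definition of regret with linear costs; the paper's own Definition 4 is not reproduced in this summary, and the names `bandit_feedback` and `regret` and the toy data are hypothetical.

```python
import numpy as np


def bandit_feedback(cost, strategy):
    """The only signal the agent sees in a round: its total incurred cost."""
    return float(cost @ strategy)


def regret(costs, played, strategies):
    """Incurred cost minus the cost of the best fixed strategy in hindsight."""
    incurred = sum(bandit_feedback(c, x) for c, x in zip(costs, played))
    cumulative = np.sum(costs, axis=0)  # total cost of each resource over T rounds
    best_fixed = min(float(cumulative @ s) for s in strategies)
    return incurred - best_fixed


# Hypothetical toy instance: m = 3 resources, two feasible strategies, T = 3 rounds.
s0 = np.array([1.0, 1.0, 0.0])
s1 = np.array([0.0, 1.0, 1.0])
costs = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
played = [s0, s0, s1]
print(regret(costs, played, [s0, s1]))
```

Note that `regret` uses the full cost vectors, which only an outside observer has; the agent itself sees just the scalar returned by `bandit_feedback` each round, which is why a bandit algorithm must build estimates of the cost vector from those scalars.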

Figures (1)

  • Figure 1: Construction of a $1$-spanner for DAGs. We illustrate the basis-construction algorithm on a simple graph. We can select the three red edges as the non-redundant edges. We cover these using 3 paths that will constitute the basis. For edge $s\rightarrow b$, we select $s\rightarrow b\rightarrow d\rightarrow e\rightarrow g\rightarrow t$. For edge $s\rightarrow c$, we first check whether it is reachable from edge $s\rightarrow b$; it is not. We then find a path starting from $s$. In this case, we select $s\rightarrow c\rightarrow d\rightarrow e\rightarrow g\rightarrow t$. For edge $e\rightarrow f$, we check whether it is reachable from the last covered edge (in topological order); it is reachable from edge $s\rightarrow c$, so we select $s\rightarrow c\rightarrow d\rightarrow e\rightarrow f\rightarrow t$. The key idea we use to construct a 1-spanner is to ensure that, when we cover edges, we first try to reach them from the previously covered edges, going in reverse topological order. This prefix property ensures the 1-spanner property.
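The 1-spanner property claimed for the three paths in this caption can be checked numerically. The sketch below does not implement the paper's construction; it only verifies the result on the example. It assumes the graph consists of exactly the nine edges that appear in the three named paths (the actual figure may contain more), and it assumes the usual meaning of a 1-barycentric spanner: every $s$-$t$ path's edge-incidence vector is a linear combination of the basis vectors with coefficients in $[-1, 1]$. All function names are hypothetical.

```python
import numpy as np

# Edge set inferred from the paths named in the caption (an assumption).
EDGES = [("s", "b"), ("s", "c"), ("b", "d"), ("c", "d"), ("d", "e"),
         ("e", "g"), ("e", "f"), ("g", "t"), ("f", "t")]
EDGE_INDEX = {edge: i for i, edge in enumerate(EDGES)}


def all_paths(src, dst):
    """Enumerate every src -> dst path by recursion (fine for a tiny DAG)."""
    if src == dst:
        return [[dst]]
    return [[src] + tail
            for (u, v) in EDGES if u == src
            for tail in all_paths(v, dst)]


def incidence(path):
    """0/1 vector marking which edges the path uses."""
    x = np.zeros(len(EDGES))
    for u, v in zip(path, path[1:]):
        x[EDGE_INDEX[(u, v)]] = 1.0
    return x


# The three basis paths selected in the caption.
BASIS = [list("sbdegt"), list("scdegt"), list("scdeft")]
B = np.stack([incidence(p) for p in BASIS], axis=1)  # one column per basis path


def coefficients(path):
    """Least-squares coefficients of a path in the basis."""
    return np.linalg.lstsq(B, incidence(path), rcond=None)[0]


def is_one_spanner():
    """Check exact representability with all coefficients in [-1, 1]."""
    for path in all_paths("s", "t"):
        coef = coefficients(path)
        if not np.allclose(B @ coef, incidence(path)):
            return False
        if np.max(np.abs(coef)) > 1 + 1e-9:
            return False
    return True
```

Under these assumptions the graph has four $s$-$t$ paths. The only one outside the basis, $s\rightarrow b\rightarrow d\rightarrow e\rightarrow f\rightarrow t$, equals the first basis path minus the second plus the third, so its coefficients are $(1, -1, 1)$.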

Theorems & Definitions (83)

  • Remark 1
  • Definition 1: Nash equilibrium
  • Definition 2: Expected cost
  • Definition 3: Mixed Nash equilibrium
  • Definition 4: Regret
  • Definition 5: No-Regret
  • Theorem 2
  • Theorem 3: Converge to NE
  • Corollary 1
  • Theorem 4
  • ...and 73 more