Offline congestion games: How feedback type affects data coverage requirement

Haozhe Jiang; Qiwen Cui; Zhihan Xiong; Maryam Fazel; Simon S. Du

TL;DR

This work initiates the study of offline learning in congestion games by analyzing how the type of reward feedback affects the data coverage required to recover an approximate Nash equilibrium. It introduces tailored data-coverage notions for facility-level, agent-level, and game-level feedback and provides a unified surrogate-minimization approach with feedback-specific bonuses, achieving polynomial sample complexity in each setting. The paper also proves separations between feedback types via hard instances and shows how linear-bandit reductions enable efficient offline learning under agent-level and game-level feedback, with the game-level setting requiring a stronger covariance domination condition. Overall, the results establish the first formal separations in learnability across feedback types in offline congestion games and show how problem structure can mitigate the curse of exponentially large action spaces.

Abstract

This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.
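The three feedback types described above can be illustrated with a small sketch. It assumes the standard congestion-game structure, in which each player picks a subset of facilities, a facility's reward depends only on how many players picked it, and a player's reward is the sum over the facilities they picked. The toy instance, the reward function `facility_reward`, and all helper names are hypothetical, and noiseless mean rewards are used for simplicity. The last two helpers show one natural linear encoding over (facility, load) pairs of the kind a linear-bandit reduction could use; it is a sketch under these assumptions, not necessarily the paper's exact construction.

```python
from collections import Counter

# Hypothetical mean-reward table: base value of each facility (illustrative only).
BASE = {"f1": 1.0, "f2": 0.8, "f3": 0.6}

def facility_reward(facility, load):
    """Assumed mean reward of `facility` when `load` players select it."""
    return BASE[facility] / load

def loads(joint_action):
    """Number of players selecting each facility under the joint action."""
    return Counter(f for action in joint_action for f in action)

def facility_level_feedback(joint_action):
    """Semi-bandit feedback: each player sees the reward of every facility they chose."""
    n = loads(joint_action)
    return [{f: facility_reward(f, n[f]) for f in action} for action in joint_action]

def agent_level_feedback(joint_action):
    """Bandit feedback: each player sees only the sum of their facility rewards."""
    return [sum(r.values()) for r in facility_level_feedback(joint_action)]

def game_level_feedback(joint_action):
    """Only the total reward summed over all players is revealed."""
    return sum(agent_level_feedback(joint_action))

def agent_feature(joint_action, i):
    """Sparse 0/1 feature of player i, indexed by (facility, load) pairs."""
    n = loads(joint_action)
    return {(f, n[f]): 1.0 for f in joint_action[i]}

def linear_reward(feature):
    """Inner product of a sparse feature with theta[(f, n)] = facility_reward(f, n)."""
    return sum(w * facility_reward(f, n) for (f, n), w in feature.items())

# Three players; each action is the set of facilities that player selects.
joint_action = [("f1", "f2"), ("f1",), ("f2", "f3")]
print(facility_level_feedback(joint_action))
print(agent_level_feedback(joint_action))
print(game_level_feedback(joint_action))
```

Each successive feedback type is an aggregate of the previous one, so it reveals strictly less information about the per-facility rewards, which is consistent with the progressively stronger coverage assumptions discussed in the abstract.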

Paper Structure

This paper contains 21 sections, 21 theorems, 53 equations, 3 figures, 9 tables, 1 algorithm.

Introduction
Main Contributions
Motivating Examples
Related Work
Potential Games and Congestion Games.
Offline Bandits and Reinforcement Learning.
Preliminary
Congestion Game
Offline Matrix Game
Offline Congestion Game with Facility-level Feedback
Offline Congestion Game with Agent-level Feedback
Impossibility Result
Solution via Linear Bandit
Offline Congestion Game with Game-Level Feedback
Conclusion
...and 6 more sections

Key Result

Theorem 1

Let $\Pi$ be the set of all deterministic policies, and let $b$ be a bonus term for $\widehat{r}$. With probability at least $1-\delta$, it holds that [...], where $\pi^{\text{output}}$ is the output of Algorithm 1.

Figures (3)

Figure 1: Illustration of Assumption \ref{assump:facility}. There are five facilities and five players with full action space. The facility configuration in $\pi^*$ is marked in red. The transparent boxes cover the facility configurations required by the assumption.
Figure 2: Facility coverage condition for $\rho$. Each pair $(f,n)$ represents the configuration in which $n$ players select facility $f$. Each box contains the facility coverage condition for one player. There are two classes of covered actions, as described in formula (\ref{eq:agent_counter}). The color of each box indicates the class of actions it belongs to.
Figure 3: Facility coverage condition for $\rho$. Similar to Figure 2.

Theorems & Definitions (46)

Example 1: Facility-level feedback
Example 2: Agent-level feedback
Example 3: Game-level feedback
Definition 1
Definition 2
Theorem 1
Definition 3
Definition 4
Theorem 2
proof
...and 36 more
