Deviate or Not: Learning Coalition Structures with Multiple-bit Observations in Games

Yixuan Even Xu, Zhe Feng, Fei Fang

TL;DR

The paper studies Coalition Structure Learning (CSL): learning a hidden coalition structure among $n$ agents by actively designing a sequence of games and observing multi-bit deviation feedback. It introduces a multi-bit observation oracle that returns $n$ bits per round (one per agent), which enables sublinear-round CSL across normal-form, congestion, graphical, and auction games. It establishes an information-theoretic lower bound of $\log_2 n - O(\log\log n)$ rounds and matches this bound up to a constant factor in several settings with tailored algorithms, including $\log_2 n+2$ rounds for normal-form and congestion games, $2n/d+2\log_2 d+1$ rounds for graphical games, and $(1+\log_2 n)(1+c)+1$ rounds for auctions. These results reduce the number of rounds needed to identify coalition structures, which is relevant to real-world multi-agent systems and mechanism design.
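To get a feel for the round counts quoted above, the following minimal sketch simply evaluates the formulas for concrete values. The function name `upper_bounds` is hypothetical, and $d$ and $c$ are treated as plain numeric parameters of the graphical-game and auction settings, since this summary does not define them further.

```python
import math


def upper_bounds(n: int, d: int, c: int) -> dict:
    """Evaluate the round counts quoted in the TL;DR (illustrative helper).

    n is the number of agents; d and c are the parameters that appear in
    the graphical-game and auction formulas (their precise meaning is an
    assumption left open here).
    """
    log_n = math.log2(n)
    return {
        "normal_form": log_n + 2,
        "congestion": log_n + 2,
        "graphical": 2 * n / d + 2 * math.log2(d) + 1,
        "auction": (1 + log_n) * (1 + c) + 1,
    }


bounds = upper_bounds(n=1024, d=16, c=1)
for setting, rounds in bounds.items():
    print(f"{setting}: {rounds:g} rounds")
```

With these inputs the normal-form and congestion counts stay close to $\log_2 n$, while the graphical-game count is dominated by the $2n/d$ term.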

Abstract

We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation -- for each agent, we observe whether or not they would like to deviate from the specified strategy. We show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to design any normal-form game, matching the information-theoretical lower bound. For practicality, we extend the result to settings where we can only choose games of a specific format, and design algorithms to learn the coalition structure in these settings. For most settings, our complexity matches the theoretical lower bound up to a constant factor.

Paper Structure

This paper contains 15 sections, 17 theorems, 2 equations, 5 figures, 1 table, 5 algorithms.

Key Result

Theorem 2.1

Any algorithm that solves the Multiple-bit CSL problem requires at least $\log_2 n-O(\log\log n)$ rounds of interactions with the agents in the worst case.
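A bound of this shape can be motivated by a standard counting argument, sketched here as an illustration and not necessarily the paper's proof. The $n$ agents admit $B_n$ possible coalition structures (the Bell number), and each round returns $n$ bits, so $k$ rounds can distinguish at most $2^{nk}$ structures. Hence $k \ge \log_2(B_n)/n$, and since $\ln B_n = n(\ln n - \ln\ln n - 1 + o(1))$, this is $\log_2 n - O(\log\log n)$. The helper names below are hypothetical.

```python
def bell_number(n: int) -> int:
    """Number of partitions of n labelled agents, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]          # each row starts with the previous row's last entry
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]


def counting_lower_bound(n: int) -> int:
    """Smallest k such that 2**(n*k) >= bell_number(n)."""
    b = bell_number(n)
    bits = (b - 1).bit_length()  # equals ceil(log2(b)) for b >= 1
    return -(-bits // n)         # integer ceiling of bits / n


for n in (4, 10, 100, 1000):
    print(n, counting_lower_bound(n))
```

The bound uses exact integer arithmetic, so it stays reliable even where $B_n$ is far too large for floating point.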

Figures (5)

  • Figure 1: The payoff table of agents $i$ and $j$ in the directed prisoner's dilemma $\mathcal{P}(i,j)$. Agent $i$ is the row player and agent $j$ is the column player. The others only have one action and are not shown in the table.
  • Figure 2: Example execution of Algorithm \ref{alg:normal_form} when $\mathcal{S}^* = \{\{1,4\}, \{2, 3\}\}$. The vertices represent the agents, the edges represent the directed prisoner's dilemmas that the algorithm queries each time, and the dashed rectangles represent the sets $T_j$. In the first query, the algorithm queries $\prod_{i\in N,j\in N, i<j}\mathcal{P}(i,j)$ (Line 1) as shown in (a). Using the observations, the algorithm sets $T_1 = T_2 = \varnothing$, $T_3 = \{1, 2\}$, and $T_4 = \{1, 2, 3\}$ (Line 2). In the second query, $T_3$ is bisected into $L_3 = \{1\},R_3 = \{2\}$, $T_4$ is bisected into $L_4 = \{1\},R_4 = \{2,3\}$, and the algorithm queries $\prod_{j\in N,i\in L_j}\mathcal{P}(i,j)$ (Lines 3 to 6) as shown in (b). Using the observations, the algorithm sets $T_3 = R_3 = \{2\}$ and $T_4 = L_4 = \{1\}$ (Line 7) as shown in (c), since agent 3's coalition partner is agent 2 and agent 4's is agent 1. Finally, the algorithm merges $\{1\}$ and $\{4\}$, and $\{2\}$ and $\{3\}$ together to recover the coalition structure (Lines 8 to 11).
  • Figure 3: Example execution of the bitwise search (Lines 1 to 12) in Algorithm \ref{alg:auction_sublinear} when $\mathcal{S}^* = \{\{1,4\},\{2,3\},\{5,6\}\}$. The vertices represent the agents, and the dashed lines represent the sets used in the algorithm. Using the first query, the algorithm identifies one agent in each coalition and groups them as $T_x = \{1, 3, 5\}$. The rest are $T_y = \{2, 4, 6\}$ (Lines 1 to 2). Then, as shown above, for each $b \in \{0,1,2\}$, the algorithm takes $X$ to be the set of agents in $T_x$ whose $b$-th lowest binary bit is $1$, and partitions $T_y$ into $T_{\mathrm{True}}$, the agents cooperating with some agent in $X$, and $T_{\mathrm{False}}$, the agents not cooperating with any agent in $X$ (Lines 3 to 12).
  • Figure 4: Illustration of the directed Braess's paradox $\mathcal{B}(i,j)$. Agent $i$ needs to choose a path from $S_i$ to $T$ and agent $j$ needs to choose a path from $S_j$ to $T$. The cost functions are indicated on the edges.
  • Figure 5: Example execution of Lines 1 to 6 in Algorithm \ref{alg:graphical} when $n = 6$ and $size = 2$. The vertices represent the agents and the edges represent the directed prisoner's dilemmas that the algorithm queries for each $\delta \in \{0,1,2\}$. The algorithm first partitions the agents into $cnt = \lceil\frac{n}{size}\rceil = 3$ blocks, each containing at most $size = 2$ agents (Lines 1 to 2). Then, for each $\delta \in \{0,1,2\}$, the algorithm queries the oracle for the directed prisoner's dilemmas that correspond to the edges shown in the figure (Lines 3 to 4). Using the observations, the algorithm then determines for each agent $j$, $\delta_j = belong_j - belong_i$, where $i$ is $j$'s predecessor in $[j]_{\mathcal{S}^*}$ (Lines 5 to 6). Note that in each query, any agent $j$ is involved in at most $2 \cdot size\leq d$ games.
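The bisection procedure described in the Figure 2 caption can be simulated with an abstract oracle. This is a minimal sketch under one stated assumption: querying the directed prisoner's dilemmas $\mathcal{P}(i,j)$ for all $i$ in a candidate set reveals, for each agent $j$, whether at least one of those $i$ shares $j$'s coalition. The actual payoff design and line-by-line algorithm are in the paper; `make_oracle` and `learn_coalitions` are hypothetical names.

```python
def make_oracle(structure):
    """Simulated multi-bit oracle (assumption: one bit per agent j, True iff
    some queried candidate i belongs to j's coalition)."""
    coalition_of = {a: frozenset(s) for s in structure for a in s}

    def oracle(query):
        return {j: any(i in coalition_of[j] for i in cands)
                for j, cands in query.items()}

    return oracle


def learn_coalitions(n, oracle):
    """Recover the coalition structure by parallel bisection over all agents."""
    agents = range(1, n + 1)
    # First query: does agent j have any coalition partner with a smaller index?
    obs = oracle({j: set(range(1, j)) for j in agents})
    rounds = 1
    T = {j: list(range(1, j)) if obs[j] else [] for j in agents}
    # Bisect every candidate set T_j simultaneously, one query per round.
    while any(len(t) > 1 for t in T.values()):
        L = {j: t[:len(t) // 2] for j, t in T.items() if len(t) > 1}
        obs = oracle({j: set(left) for j, left in L.items()})
        rounds += 1
        for j, left in L.items():
            T[j] = left if obs[j] else T[j][len(left):]
    # Merge each agent with the partner it found (union-find).
    parent = {j: j for j in agents}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j, t in T.items():
        if t:
            parent[find(j)] = find(t[0])
    groups = {}
    for j in agents:
        groups.setdefault(find(j), set()).add(j)
    return {frozenset(g) for g in groups.values()}, rounds


learned, rounds = learn_coalitions(4, make_oracle([{1, 4}, {2, 3}]))
print(sorted(sorted(g) for g in learned), rounds)
```

On the Figure 2 instance the sketch recovers $\{\{1,4\},\{2,3\}\}$ in two queries. In general it uses one initial query plus one query per halving of the largest candidate set, which is consistent with the $\log_2 n + 2$ bound quoted for normal-form games.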

Theorems & Definitions (40)

  • Theorem 2.1
  • proof
  • Definition 3.1
  • Definition 3.2
  • Lemma 3.1
  • proof
  • Definition 3.3 [xu2023learning]
  • Lemma 3.2
  • proof
  • Theorem 3.1
  • ...and 30 more