Offline Learning of Nash Stable Coalition Structures with Possibly Overlapping Coalitions

Saar Cohen

TL;DR

This paper presents a new model of coalition formation with possibly overlapping coalitions under partial information, where selfish agents may be part of multiple coalitions simultaneously and their full preferences are initially unknown.

Abstract

Coalition formation concerns strategic collaborations of selfish agents that form coalitions based on their preferences. It is often assumed that coalitions are disjoint and preferences are fully known, which may not hold in practice. In this paper, we thus present a new model of coalition formation with possibly overlapping coalitions under partial information, where selfish agents may be part of multiple coalitions simultaneously and their full preferences are initially unknown. Instead, information about past interactions and associated utility feedback is stored in a fixed offline dataset, and we aim to efficiently infer the agents' preferences from this dataset. We analyze the impact of diverse dataset information constraints by studying two types of utility feedback that can be stored in the dataset: agent- and coalition-level utility feedback. For both feedback models, we identify assumptions under which the dataset covers sufficient information for an offline learning algorithm to infer preferences and use them to recover a partition that is (approximately) Nash stable, in which no agent can improve her utility by unilaterally deviating. Our additional goal is devising algorithms with low sample complexity, requiring only a small dataset to obtain a desired approximation to Nash stability. Under agent-level feedback, we provide a sample-efficient algorithm proven to obtain an approximately Nash stable partition under a sufficient and necessary assumption on the information covered by the dataset. However, under coalition-level feedback, we show that only a stricter assumption is sufficient for sample-efficient learning. Still, in multiple cases, our algorithms' sample complexity bounds have optimality guarantees up to logarithmic factors. Finally, extensive experiments show that our algorithm converges to a low approximation level to Nash stability across diverse settings.

Paper Structure (44 sections, 34 theorems, 71 equations, 7 figures, 2 algorithms)

Key Result

Lemma 1

A symmetric POCF game with unknown, symmetric preferences is a potential game. Therefore, any symmetric POCF game always admits at least one pure (and thus mixed) NS strategy.
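
To illustrate why a potential function guarantees a pure Nash stable outcome, the following is a minimal Python sketch on a toy game. It is not the paper's POCF model: it assumes, purely for illustration, that each agent may join any subset of a fixed list of candidate coalitions and that utilities are additively separable with symmetric pairwise weights `w[i][j] = w[j][i]`. All names (`make_symmetric_weights`, `utility`, `potential`, `deviation_gain`, `best_response_dynamics`) are hypothetical. Under these assumptions, the sum of pairwise weights inside each coalition acts as an exact potential, so every improving unilateral move raises it and the dynamics must stop at a state where no agent can gain by deviating.

```python
import itertools
import random


def make_symmetric_weights(n, seed=0):
    """Random symmetric pairwise weights (an assumed toy preference model)."""
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.uniform(-1.0, 1.0)
    return w


def utility(i, members, w):
    """Agent i's utility: sum of weights to co-members over all her coalitions."""
    return sum(w[i][j] for c in members if i in c for j in c if j != i)


def potential(members, w):
    """Candidate exact potential: total pairwise weight inside every coalition."""
    return sum(
        w[i][j]
        for c in members
        for i, j in itertools.combinations(sorted(c), 2)
    )


def deviation_gain(i, members, w):
    """Largest gain agent i can obtain by unilaterally changing memberships."""
    gain = 0.0
    for c in members:
        marginal = sum(w[i][j] for j in c if j != i)
        # Leaving a harmful coalition or joining a beneficial one both help.
        gain += max(0.0, -marginal) if i in c else max(0.0, marginal)
    return gain


def best_response_dynamics(start, n, w, max_rounds=1000):
    """Let agents toggle memberships while a strict improvement exists."""
    members = [set(c) for c in start]  # copy so the input is left untouched
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            for c in members:
                marginal = sum(w[i][j] for j in c if j != i)
                if i in c and marginal < 0:
                    c.remove(i)  # strictly improving: leave
                    changed = True
                elif i not in c and marginal > 0:
                    c.add(i)  # strictly improving: join
                    changed = True
        if not changed:
            return members, True
    return members, False


n_agents, n_coalitions = 5, 3
w = make_symmetric_weights(n_agents, seed=1)
rng = random.Random(2)
start = [{i for i in range(n_agents) if rng.random() < 0.5}
         for _ in range(n_coalitions)]
final, converged = best_response_dynamics(start, n_agents, w)
max_gain = max(deviation_gain(i, final, w) for i in range(n_agents))
```

In this sketch, `max_gain` measures how far the final state is from Nash stability; a value of zero means no agent benefits from a unilateral deviation. With asymmetric weights the potential argument no longer applies, and the loop may hit `max_rounds` without converging.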

Figures (7)

  • Figure 1: Mean approximate duality gap versus the size of datasets generated by $\rho^{\text{rand}}$ (top two rows) and $\rho^{\text{1Rand}}$ (last row) over $5$ runs with different seeds, for varying numbers of agents (left column) and candidate coalitions (right column). Shaded regions indicate standard deviations.
  • Figure 2: Mean approximate duality gap versus the size of datasets generated by $\rho^{\text{rand}}$ over $5$ runs with different seeds, for varying numbers of agents (top two rows) and candidate coalitions (bottom two rows). Shaded regions indicate standard deviations.
  • Figure 3: Mean approximate duality gap versus the size of datasets generated by $\rho^{\text{1Rand}}$ over $5$ runs with different seeds, for varying numbers of agents (top two rows) and candidate coalitions (bottom two rows). Shaded regions indicate standard deviations.
  • Figure 4: Mean approximate duality gap versus the size of datasets generated by $\rho^{\text{rand}}$ over $5$ runs with different seeds for the utility generation model with mixed-coalition-size effects under semi-bandit feedback. Here, the number of agents is varied over $n \in \{3,4,5,6\}$. Shaded regions indicate standard deviations.
  • Figure 5: Mean approximate duality gap versus the size of datasets generated by $\rho^{\text{coalitionSize}}$ over $5$ runs with different seeds for the utility generation model with mixed-coalition-size effects under semi-bandit feedback. Here, the number of agents is varied over $n \in \{3,4,5,6\}$. Shaded regions indicate standard deviations.
  • ...and 2 more figures

Theorems & Definitions (69)

  • Lemma 1
  • proof
  • Remark 1
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Remark 2: Asymmetric Preferences
  • Remark 3: Bandit Algorithms
  • Remark 4
  • ...and 59 more