Reducing Optimism Bias in Incomplete Cooperative Games

Filip Úradník, David Sychrovský, Jakub Černý, Martin Černý

TL;DR

This work studies optimism bias in incomplete cooperative games by introducing the utopian gap $\mathcal{G}_{(N,v)}(\mathcal{K})$, which bounds the discrepancy between potential Shapley-based payoffs across feasible $\mathbb{S}^n$-extensions and the grand coalition value $v(N)$. A principal is assumed to sequentially reveal coalition values under a budget, with offline and online formulations under a known prior $\mathcal{F}$ to minimize the expected gap. The authors develop Offline Optimal and Offline Greedy algorithms and employ PPO-based online learning to discover revealing policies, demonstrating the benefit of revealing larger coalitions, especially in supermodular settings where nearly $\mathcal{O}(n)$ revelations can capture most information. Empirical results on factory- and supermodular-type games show substantial gap reductions and highlight practical revelation patterns, providing a geometric interpretation of uncertainty as a bounded extension space and informing SHAP-like explanatory contexts in cooperative AI.
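To make the notion of a gap between optimistic Shapley-based claims and $v(N)$ concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper: the 3-player game, its two candidate completions `ext_a` and `ext_b`, and the helper names are all hypothetical, and the finite candidate list merely stands in for the full set of feasible extensions. Within any single complete game the Shapley values sum exactly to $v(N)$; the excess appears only when each player evaluates their payoff in the completion that favors them most.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, v):
    """Exact Shapley values: average marginal contribution over all orderings.

    `v` maps frozensets of players to values and must contain every coalition,
    including the empty one.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            bigger = coalition | {p}
            totals[p] += Fraction(v[bigger]) - Fraction(v[coalition])
            coalition = bigger
    return {p: total / len(orderings) for p, total in totals.items()}


def complete(known, pair_values):
    """Fill in the unknown two-player coalition values of a 3-player game."""
    game = dict(known)
    game.update({frozenset(pair): val for pair, val in pair_values.items()})
    return game


players = (1, 2, 3)
# Known part (hypothetical): empty set, singletons, and the grand coalition.
known = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 0,
         frozenset({3}): 0, frozenset(players): 10}

# Two hypothetical completions of the unknown pair values.
ext_a = complete(known, {(1, 2): 4, (1, 3): 4, (2, 3): 2})
ext_b = complete(known, {(1, 2): 2, (1, 3): 2, (2, 3): 8})

phi_a = shapley_values(players, ext_a)
phi_b = shapley_values(players, ext_b)

# Each player optimistically claims their best payoff across the candidates.
claims = {p: max(phi_a[p], phi_b[p]) for p in players}
gap = sum(claims.values()) - known[frozenset(players)]
print(claims, gap)
```

In this toy instance player 1 prefers `ext_a` while players 2 and 3 prefer `ext_b`, so the summed claims exceed the value of the grand coalition even though each completion on its own is split efficiently.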

Abstract

Cooperative game theory has diverse applications in contemporary artificial intelligence, including domains like interpretable machine learning, resource allocation, and collaborative decision-making. However, specifying a cooperative game entails assigning values to exponentially many coalitions, and obtaining even a single value can be resource-intensive in practice. Yet simply leaving certain coalition values undisclosed introduces ambiguity regarding individual contributions to the collective grand coalition. This ambiguity often leads to players holding overly optimistic expectations, stemming from either inherent biases or strategic considerations, frequently resulting in collective claims exceeding the actual grand coalition value. In this paper, we present a framework aimed at optimizing the sequence for revealing coalition values, with the overarching goal of efficiently closing the gap between players' expectations and achievable outcomes in cooperative games. Our contributions are threefold: (i) we study the individual players' optimistic completions of games with missing coalition values along with the arising gap, and investigate its analytical characteristics that facilitate more efficient optimization; (ii) we develop methods to minimize this gap over classes of games with a known prior by disclosing values of additional coalitions in both offline and online fashion; and (iii) we empirically demonstrate the algorithms' performance in practical scenarios, together with an investigation into the typical order of revealing coalition values.
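The idea of revealing coalition values one at a time under a budget can be sketched as a generic greedy loop. This is only an illustration under stated assumptions, not the authors' Offline Greedy algorithm: `greedy_reveal` and `toy_expected_gap` are hypothetical names, and the toy objective is a stand-in that simply rewards revealing larger coalitions. A real objective would be the expected gap under the prior $\mathcal{F}$.

```python
from itertools import combinations


def greedy_reveal(candidates, budget, expected_gap):
    """Choose up to `budget` coalitions, one per step.

    At each step, add the candidate whose inclusion gives the smallest value
    of `expected_gap(revealed_so_far + [candidate])`.
    """
    revealed = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        best = min(remaining, key=lambda s: expected_gap(revealed + [s]))
        revealed.append(best)
        remaining.remove(best)
    return revealed


players = (1, 2, 3, 4)
# Candidate coalitions: all coalitions with 2 to n-1 players (an assumption
# made for this example; singletons and the grand coalition are left out).
candidates = [frozenset(c)
              for k in range(2, len(players))
              for c in combinations(players, k)]


def toy_expected_gap(revealed):
    # Stand-in objective, NOT the paper's utopian gap: every revealed
    # coalition lowers the score in proportion to its size.
    return 100.0 - sum(len(s) for s in revealed)


chosen = greedy_reveal(candidates, budget=4, expected_gap=toy_expected_gap)
print([sorted(s) for s in chosen])
```

A greedy rule of this kind evaluates the objective once per remaining candidate at every step, which is far cheaper than searching over all subsets of a given size but carries no guarantee of reaching the global optimum.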

Paper Structure

This paper contains 36 sections, 11 theorems, 42 equations, 11 figures, 1 table, and 4 algorithms.

Key Result

theorem 1

Let $(N,\mathcal{K},v)$ be an $\mathbb{S}^n$-extendable incomplete game with non-negative values. Then for every $\mathbb{S}^n$-extension $(N,w)$ of $(N,\mathcal{K},v)$ it holds that […]. Further, for every $S \notin \mathcal{K}$, there are $\mathbb{S}^n$-extensions $(N,w_1)$ and $(N,w_2)$ such that […].

Figures (11)

  • Figure 1: The utopian gap as a function of the number of revealed coalitions (i.e. steps of the Principal's problem) for different algorithms. We show factory(5) (left) and supermodular(5) (right) games. All algorithms outperform the Random benchmark considerably. The greedy versions of each algorithm exhibit similar performance to the optimal variants. The PPO algorithm is initially close to the offline algorithms, and uses the online information to approach the oracle algorithms.
  • Figure 2: Percentage of coalitions of the same size selected up to step twelve for factory(5) and each algorithm. Results show a clear preference for larger coalitions, i.e. they contribute more information about the cooperative game on average. The oracle algorithms favor smaller coalitions earlier, suggesting that the representation of a specific game can make efficient use of even smaller coalitions. PPO initially behaves similarly to the offline algorithms. At later steps, it uses the previously obtained values and its selections resemble the oracle methods. See the factory(5) bar plots in the appendix on additional experiments for a plot showing individual coalitions.
  • Figure 3: The utopian gap as a function of the number of players. The figure compares the expected gap of supermodular($n$) when choosing coalitions randomly and when all coalitions of size $n-1$ are selected. Not only is the utopian gap in the latter case small, it also decreases with the number of players.
  • Figure 4: The cumulative utopian gap as a function of the number of revealed coalitions on a factory(4) game where the owner is fixed. The greedy algorithms fail to find the global optimum at step four; see the appendix example showing that a local optimum need not be global. The performance of the oracle and offline algorithms is the same in this case, because $\mathcal{F}$ includes just a single game. Finally, PPO finds a strategy that is close to optimal rather than greedy.
  • Figure 5: The utopian gap as a function of the number of revealed coalitions (i.e. steps of the Principal's problem) for different algorithms. We show factory(4) (left) and supermodular(4) (right) games. All algorithms outperform the Random benchmark considerably. The greedy versions of each algorithm exhibit similar performance to the optimal variants. The PPO algorithm is initially close to the offline algorithms, and uses the online information to approach the oracle algorithms.
  • ...and 6 more figures

Theorems & Definitions (19)

  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • theorem 1
  • definition 5
  • proposition 1
  • definition 6
  • proposition 2
  • proposition 3
  • ...and 9 more