Safe Exploitative Play with Untrusted Type Beliefs

Tongxin Li, Tinashe Handina, Shaolei Ren, Adam Wierman

TL;DR

We formally define a tradeoff between risk and opportunity by comparing the payoff obtained against the optimal payoff; the resulting gap is caused by trusting or distrusting the learned beliefs.

Abstract

Combining Bayesian games with learning has a rich history. The setting is the control of a single agent in a system of multiple agents with unknown behaviors, given a set of types, each specifying a possible behavior for the other agents. The agent plans its own actions with respect to the types it believes most likely, so as to maximize its payoff. However, type beliefs are often learned from past actions and are likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of other components, and investigate the impact of incorrect beliefs on the agent's payoff. In particular, we formally define a tradeoff between risk and opportunity by comparing the payoff obtained against the optimal payoff; the resulting gap is caused by trusting or distrusting the learned beliefs. Our main results characterize the tradeoff by establishing upper and lower bounds on the Pareto front for both normal-form and stochastic Bayesian games, and we provide numerical results.
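The payoff gap described in the abstract can be illustrated with a minimal sketch. It assumes a normal-form game in which the opponent's type is summarized by a mixed strategy over columns and the agent fully trusts its belief by best-responding to it. The payoff matrix, the belief vectors, and the function names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def expected_payoffs(A, y):
    """Expected payoff of each row action against the column mix y."""
    return [sum(a * p for a, p in zip(row, y)) for row in A]


def best_response(A, y):
    """Index of the row action with the highest expected payoff against y."""
    payoffs = expected_payoffs(A, y)
    return max(range(len(payoffs)), key=payoffs.__getitem__)


def payoff_gap(A, y_belief, y_true):
    """Optimal payoff against y_true minus the payoff actually obtained
    by best-responding to the (possibly incorrect) belief y_belief."""
    true_payoffs = expected_payoffs(A, y_true)
    return max(true_payoffs) - true_payoffs[best_response(A, y_belief)]


# Hypothetical 2x2 payoff matrix for the row player (an assumption).
A = [[3.0, 0.0],
     [1.0, 2.0]]
y_true = [0.25, 0.75]    # opponent's actual column mix
y_belief = [0.9, 0.1]    # learned belief, incorrect in this example

gap_if_correct = payoff_gap(A, y_true, y_true)
gap_if_wrong = payoff_gap(A, y_belief, y_true)
print(gap_if_correct, gap_if_wrong)
```

When the belief equals the true mix the gap is zero; with the incorrect belief above the agent plays the first row and loses one unit of expected payoff relative to the optimum.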

Paper Structure

This paper contains 28 sections, 7 theorems, 69 equations, 12 figures.

Key Result

Theorem 3.1

Fix any $\Theta$ and consider a general-sum normal-form game where Player 1 has a payoff matrix $A\in\mathbb{R}^{a\times b}$ with $\mu_{\Theta}(A)\leq \mu$ and $\nu_{\Theta}(A)\geq \nu$. For any $0\leq \lambda\leq 1$, there exists a mixed strategy $\pi:\mathsf{P}_{\Theta}\rightarrow\mathsf{P}_{a}$ f

Figures (12)

  • Figure 1: Left: A stochastic Bayesian game where an agent interacts with an environment and opponents, with a belief of their types $\theta\in\Theta$. Right: The tradeoff between trusting and distrusting type beliefs, with trust leading to higher risk and opportunity and distrust resulting in lower risk and opportunity, implying an opportunity-risk tradeoff with varying strategy $\pi$.
  • Figure 2: Left: Matching Pennies payoff matrix for Player 1 (row player) with type belief $y$ and Player 2 (column player) whose strategy is defined by $y^{\star}$. Right: Opportunity-risk tradeoff that satisfies $\Delta_{\mathsf{MP}}(0;\pi)+\max_{\varepsilon}\Delta_{\mathsf{MP}}(\varepsilon;\pi)=2$.
  • Figure 3: Comparison of lower and upper bounds with a varying discount factor $\gamma$.
  • Figure 4: Comparison of a player's average payoff for varying values of $\lambda$, with 6 potential discrete types, in an instantiation of a $2 \times 2$ game.
  • Figure 5: Left: $2\times 2$ games considered in our case study. Right: Opportunity-risk tradeoff in the evaluation of a $2 \times 2$ game using an algorithm with varying trust in the type beliefs, over 1,000 random runs. Fully trusting ($\lambda=1$) and fully distrusting ($\lambda=0$) the type beliefs yield a best-response strategy and a minimax strategy, respectively.
  • ...and 7 more figures
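
The captions of Figures 2 and 5 describe a trust parameter $\lambda$ whose endpoints give a minimax strategy ($\lambda=0$) and a best response ($\lambda=1$). The sketch below illustrates that tradeoff on Matching Pennies. The linear blend between the two endpoint strategies is an assumption made here for illustration and is not claimed to be the paper's algorithm; the function names and the choice of a pure-strategy opponent are likewise hypothetical.

```python
# Matching Pennies payoffs for Player 1 (row player).
A = [[1.0, -1.0],
     [-1.0, 1.0]]
MINIMAX = [0.5, 0.5]  # the uniform mix is the minimax strategy in Matching Pennies


def expected_payoffs(y):
    """Expected payoff of each row action against the column mix y."""
    return [sum(a * p for a, p in zip(row, y)) for row in A]


def blended_strategy(y_belief, lam):
    """Assumed rule: weight lam on the best response to the belief,
    weight (1 - lam) on the minimax strategy."""
    payoffs = expected_payoffs(y_belief)
    best = max(range(len(payoffs)), key=payoffs.__getitem__)
    return [lam * (1.0 if i == best else 0.0) + (1.0 - lam) * MINIMAX[i]
            for i in range(len(A))]


def gap(y_belief, y_true, lam):
    """Optimal payoff against y_true minus the payoff of the blended strategy."""
    true_payoffs = expected_payoffs(y_true)
    pi = blended_strategy(y_belief, lam)
    achieved = sum(p * u for p, u in zip(pi, true_payoffs))
    return max(true_payoffs) - achieved


Y_TRUE = [1.0, 0.0]   # opponent always plays the first column (assumption)
Y_WRONG = [0.0, 1.0]  # a maximally incorrect belief

for lam in (0.0, 0.5, 1.0):
    gap_correct = gap(Y_TRUE, Y_TRUE, lam)   # gap when the belief is right
    gap_wrong = gap(Y_WRONG, Y_TRUE, lam)    # gap when the belief is wrong
    print(lam, gap_correct, gap_wrong)
```

In this sketch the gap under a correct belief is $1-\lambda$ and the gap under the incorrect belief is $1+\lambda$, so more trust lowers the first and raises the second. Their sum is 2 for every $\lambda$, which has the same form as the identity in the Figure 2 caption, although the paper's definitions of $\Delta_{\mathsf{MP}}$ and $\varepsilon$ may differ from the quantities computed here.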

Theorems & Definitions (13)

  • Definition 1
  • Theorem 3.1: NFG Existence
  • Theorem 3.2: NFG Impossibility
  • Corollary 3.1: NFG Pareto Optimality
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 4.1: SBG Existence
  • Theorem 4.2: SBG Impossibility
  • Lemma 1
  • ...and 3 more