Deceptive Planning Exploiting Inattention Blindness

Mustafa O. Karabag, Jesse Milzman, Ufuk Topcu

TL;DR

This work studies deception under rational inattention in two-player zero-sum stochastic games where one player faces perception constraints modeled as online sensor selection. It introduces a value-weighted entropy objective that prioritizes sensors by their potential impact on value, and shows that a greedy sensor-selection algorithm, paired with a Q_MDP-based action rule, yields a bound on value loss relative to perfect information when the opponent adheres to the assumed policy. The paper then demonstrates that an adversary can exploit inattention blindness by adopting a myopic deviation that improves its payoff beyond the security value, and provides numerical experiments in grid defense and random games to illustrate the mechanism. These results highlight how a persistent misalignment between beliefs and reality can be exploited through perception decisions, with implications for designing robust perception and deception-aware strategies in stochastic games.

Abstract

We study decision-making with rational inattention in settings where agents have perception constraints. In such settings, inaccurate prior beliefs or models of others may lead to inattention blindness, where an agent is unaware of its incorrect beliefs. We model this phenomenon in two-player zero-sum stochastic games, where Player 1 has perception constraints and Player 2 deceptively deviates from its security policy presumed by Player 1 to gain an advantage. We formulate the perception constraints as an online sensor selection problem, develop a value-weighted objective function for sensor selection capturing rational inattention, and propose the greedy algorithm for selection under this monotone objective function. When Player 2 does not deviate from the presumed policy, this objective function provides an upper bound on the expected value loss compared to the security value where Player 1 has perfect information of the state. We then propose a myopic decision-making algorithm for Player 2 to exploit Player 1's beliefs by deviating from the presumed policy and, thereby, improve upon the security value. Numerical examples illustrate how Player 1 persistently chooses sensors that are consistent with its priors, allowing Player 2 to systematically exploit its inattention.
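The two ingredients named above, greedy sensor selection and a Q_MDP-style action rule, can be illustrated with a minimal sketch. This is not the paper's algorithm: the value-weighted objective is not specified in this summary, so `toy_objective` and its `weights` are hypothetical stand-ins for any monotone set function, and `qmdp_action`, `greedy_select`, the two-state Q-table, and the budget of two sensors are all assumptions made for illustration.

```python
import numpy as np


def qmdp_action(belief, q_values):
    """Q_MDP-style rule: pick the action with the highest belief-averaged Q-value.

    belief:   shape (num_states,), a probability distribution over states.
    q_values: shape (num_states, num_actions), Q-values of the fully observed problem.
    """
    expected_q = belief @ q_values  # shape (num_actions,)
    return int(np.argmax(expected_q))


def greedy_select(candidates, objective, budget):
    """Greedily add the sensor with the largest marginal gain until the budget is used."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = objective(chosen)
        gains = [objective(chosen + [c]) - base for c in remaining]
        best = int(np.argmax(gains))
        chosen.append(remaining.pop(best))
    return chosen


# Toy problem (assumed): two states, two actions, reward 1 when the action matches the state.
q_values = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
belief = np.array([0.7, 0.3])
action = qmdp_action(belief, q_values)

# Hypothetical monotone objective: total weight of the distinct sensors selected.
weights = {"x": 3.0, "y": 1.0, "z": 2.0}


def toy_objective(sensors):
    return sum(weights[s] for s in set(sensors))


selected = greedy_select(["x", "y", "z"], toy_objective, budget=2)
```

With the belief concentrated on the first state, the rule returns the first action, and the greedy loop picks the two heaviest sensors. In the paper's setting the objective would instead score sensors by how much they reduce value-relevant uncertainty, which is what ties sensor choice to the agent's prior beliefs.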

Paper Structure

This paper contains 14 sections, 5 theorems, 23 equations, 6 figures, 1 algorithm.

Key Result

Proposition 1

Let $v$ be the expected discounted return under the action selection rule eq:qmdpactionselection and the sensor sets $I_{t}$ chosen in Algorithm algo:rationalinattention satisfy for all $t\geq0$. Then,

Figures (6)

  • Figure 1: An MDP with two possible initial states. A label $a,p$ shows a transition that happens with probability $p$ under action $a$. The actions that match the state give a reward of $1$ while the others give a reward of $0$, i.e., $r(\mathsf{Left}, \mathsf{l}) = r(\mathsf{Right},\mathsf{r}) = 1$ and $r(\mathsf{Left}, \mathsf{r}) = r(\mathsf{Right},\mathsf{l}) = 0$.
  • Figure 2: Sensor and action selection timeline for the players.
  • Figure 3: A two-player stochastic game. A label $a^{1}, a^{2},p$ shows a transition that happens with probability $p$ under actions $a^{1}$ and $a^{2}$. For some $\epsilon \in (0,1)$, the rewards are $r(\mathsf{LU}, \mathsf{l}, \mathsf{a}) = r(\mathsf{LD}, \mathsf{r}, \mathsf{a}) = 1$, $r(\mathsf{RU}, \mathsf{r}, \mathsf{a}) = r(\mathsf{RD}, \mathsf{l}, \mathsf{a}) = 1 - \epsilon$, $0$ for others.
  • Figure 4: (Left) Player 2 uses $\pi^{2,*}$, (Right) Player 2 uses \eqref{eq:pl2distselection} for action selection. 100 sample game runs for Player 2. Red dot indicates Player 1's start location, blue dot indicates Player 2's start location, and green crosses indicate the end.
  • Figure 5: (Top) Player 2 uses $\pi^{2,*}$, (Bottom) Player 2 uses \eqref{eq:pl2distselection} for action selection. Conditional beliefs for the $x$ position (confusion matrices) and sensor choices (bar plots) of Player 1 at different time steps. Each (non-white) column of the heatmap is the average belief of Player 1 about Player 2's $x$ position conditioned on an actual $x$ position of Player 2. The intensity of the diagonal line shows the accuracy of the belief. The bar plots show the distribution of the chosen sensors, where red is the sensor for the $x$ position and blue is for the $y$ position. The demonstrated values are estimated using $10^3$ game runs.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Lemma 2
  • Lemma 3