Deceptive Planning Exploiting Inattention Blindness
Mustafa O. Karabag, Jesse Milzman, Ufuk Topcu
TL;DR
This work studies deception under rational inattention in two-player zero-sum stochastic games in which one player faces perception constraints, modeled as online sensor selection. It introduces a value-weighted entropy objective that prioritizes sensors by their potential impact on value, and shows that a greedy sensor-selection algorithm, paired with a Q_MDP-based action rule, yields a bound on the value loss relative to perfect information when the opponent adheres to the presumed policy. The paper then demonstrates that an adversary can exploit inattention blindness through a myopic deviation that improves its payoff beyond the security value, and it illustrates the mechanism with numerical experiments in grid defense and random games. These results show how a persistent mismatch between beliefs and reality can be exploited through perception decisions, with implications for designing robust perception and deception-aware strategies in stochastic games.
Abstract
We study decision-making with rational inattention in settings where agents have perception constraints. In such settings, inaccurate prior beliefs or inaccurate models of others may lead to inattention blindness, where an agent is unaware that its beliefs are incorrect. We model this phenomenon in two-player zero-sum stochastic games, where Player 1 has perception constraints and Player 2 deceptively deviates from the security policy that Player 1 presumes it follows in order to gain an advantage. We formulate the perception constraints as an online sensor selection problem, develop a value-weighted objective function for sensor selection that captures rational inattention, and propose a greedy algorithm for selection under this monotone objective function. When Player 2 does not deviate from the presumed policy, this objective function provides an upper bound on the expected value loss relative to the security value that Player 1 would attain with perfect information about the state. We then propose a myopic decision-making algorithm with which Player 2 exploits Player 1's beliefs by deviating from the presumed policy and thereby improves upon the security value. Numerical examples illustrate how Player 1 persistently chooses sensors that are consistent with its priors, allowing Player 2 to systematically exploit its inattention.
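To make the idea of greedy selection under a value-weighted objective concrete, the following is a minimal sketch and not the paper's formulation. It assumes a toy model in which each sensor deterministically maps the state to a reading, and it uses the expected posterior value-weighted entropy as a stand-in objective. The names `expected_weighted_entropy`, `greedy_select`, `weights`, and `sensors` are hypothetical and chosen only for illustration.

```python
import numpy as np

def expected_weighted_entropy(belief, weights, sensors, chosen):
    """Expected posterior value-weighted entropy after reading the chosen sensors.

    Assumption (toy model): sensors[k][s] is the deterministic reading of
    sensor k in state s, and weights[s] >= 0 says how much value is at stake
    in state s.
    """
    # States that produce the same joint reading cannot be told apart.
    cells = {}
    for s, p in enumerate(belief):
        if p > 0:
            key = tuple(sensors[k][s] for k in chosen)
            cells.setdefault(key, []).append(s)
    total = 0.0
    for states in cells.values():
        mass = sum(belief[s] for s in states)
        for s in states:
            # belief[s] / mass is the posterior of s given the joint reading.
            total -= weights[s] * belief[s] * np.log(belief[s] / mass)
    return total

def greedy_select(belief, weights, sensors, budget):
    """Add, one at a time, the sensor that most reduces the objective."""
    chosen = []
    for _ in range(budget):
        remaining = [k for k in range(len(sensors)) if k not in chosen]
        if not remaining:
            break
        best = min(
            remaining,
            key=lambda k: expected_weighted_entropy(
                belief, weights, sensors, chosen + [k]
            ),
        )
        chosen.append(best)
    return chosen

# Four states with a uniform belief; states 0 and 1 carry high value.
belief = np.array([0.25, 0.25, 0.25, 0.25])
weights = np.array([10.0, 10.0, 1.0, 1.0])
sensors = [
    [0, 1, 2, 2],  # sensor 0 separates the high-value states 0 and 1
    [0, 0, 1, 2],  # sensor 1 separates the low-value states 2 and 3
]
print(greedy_select(belief, weights, sensors, budget=1))
```

In this toy model, adding a sensor only refines the partition of states, so the objective never increases as sensors are added. With a budget of one sensor, the sketch picks the sensor that resolves uncertainty among the high-value states, even though both sensors remove the same amount of unweighted entropy. Because the belief is computed under the presumed model, the selection can keep favoring sensors consistent with the prior, which is the kind of blind spot the abstract describes.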
