Deceptive Sequential Decision-Making via Regularized Policy Optimization
Yerin Kim, Alexander Benvenuti, Bo Chen, Mustafa Karabag, Abhishek Kulkarni, Nathaniel D. Bastian, Ufuk Topcu, Matthew Hale
TL;DR
This work addresses the risk that adversaries may infer a system's objectives by observing its actions in an MMDP and applying inverse reinforcement learning (IRL). It introduces three regularized policy-synthesis schemes (diversionary, targeted, and equivocal deception) that actively mislead IRL about the reward structure, while bounding the resulting loss in total reward, $R^* - R_{\pi}$, as a function of a deception parameter $\beta$. Each deception type is formulated as a tractable occupancy-measure optimization with a corresponding analytic bound on the loss $L_{\pi}$, and the authors validate the approach through numerical experiments on a network-defense scenario based on moving-target defense, using Apprenticeship Learning, MaxEnt IRL, and Deep IRL as the adversary's inference methods. The results show that deception can steer observer beliefs toward false conclusions while retaining at least 98% of $R^*$, enabling robust, deception-aware operation in critical autonomous systems.
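The occupancy-measure formulation mentioned above rests on a standard identity: a policy's discounted total reward equals the inner product of the reward with that policy's state-action occupancy measure, so the loss $R^* - R_{\pi}$ can be read off directly once the occupancies are known. The sketch below is a minimal illustration of that bookkeeping only, assuming a two-state discounted MDP with hypothetical transition and reward tables (`P`, `R_TRUE`, `R_DECOY`, `GAMMA` are all invented for the example). The `blend` function is a crude, hypothetical stand-in that mixes the true-optimal policy with a decoy-optimal one; it is not the authors' regularizer, and its weight `w` is not their $\beta$.

```python
"""Toy occupancy-measure bookkeeping for a discounted MDP (hypothetical numbers)."""

GAMMA = 0.9                      # assumed discount factor
STATES = (0, 1)
ACTIONS = (0, 1)
# P[s][a] lists (next_state, probability); deterministic toy dynamics.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},   # state 0: action 0 stays, action 1 moves to 1
    1: {0: [(1, 1.0)], 1: [(0, 1.0)]},   # state 1: action 0 stays, action 1 moves to 0
}
MU0 = {0: 1.0, 1: 0.0}           # initial state distribution: start in state 0


def occupancy(policy, iters=500):
    """x(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a), computed from the fixed point
    d = mu0 + gamma * P_pi^T d of the discounted state occupancy d."""
    d = dict(MU0)
    for _ in range(iters):
        new = dict(MU0)
        for s in STATES:
            for a in ACTIONS:
                for s2, p in P[s][a]:
                    new[s2] += GAMMA * d[s] * policy[s][a] * p
        d = new
    return {(s, a): d[s] * policy[s][a] for s in STATES for a in ACTIONS}


def total_reward(reward, policy):
    """Expected discounted reward, written as the inner product <r, x>."""
    x = occupancy(policy)
    return sum(reward[s][a] * x[(s, a)] for s in STATES for a in ACTIONS)


def optimal_policy(reward, iters=500):
    """Value iteration followed by a deterministic greedy policy."""
    V = {s: 0.0 for s in STATES}

    def q(s, a):
        return reward[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])

    for _ in range(iters):
        V = {s: max(q(s, a) for a in ACTIONS) for s in STATES}
    best = {s: max(ACTIONS, key=lambda a: q(s, a)) for s in STATES}
    return {s: {a: float(a == best[s]) for a in ACTIONS} for s in STATES}


def blend(pi_a, pi_b, w):
    """Hypothetical stand-in for deception: mix two policies state by state."""
    return {s: {a: (1 - w) * pi_a[s][a] + w * pi_b[s][a] for a in ACTIONS}
            for s in STATES}


R_TRUE = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}    # true reward: reach and stay in state 1
R_DECOY = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}   # decoy reward: stay in state 0

pi_star = optimal_policy(R_TRUE)
pi_decoy = optimal_policy(R_DECOY)
R_star = total_reward(R_TRUE, pi_star)

for w in (0.0, 0.5, 1.0):
    R_pi = total_reward(R_TRUE, blend(pi_star, pi_decoy, w))
    print(f"w={w}: R_pi={R_pi:.3f}, loss R*-R_pi={R_star - R_pi:.3f}, "
          f"ratio={R_pi / R_star:.1%}")
```

In this toy, moving the behavior toward the decoy-optimal policy raises the loss from zero at `w = 0` to the full $R^*$ at `w = 1`. The paper's regularized formulations are designed to control exactly this kind of reward cost, but through their own objectives and bounds rather than naive mixing.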
Abstract
Autonomous systems are increasingly expected to operate in the presence of adversaries, and adversaries may infer sensitive information simply by observing a system. We therefore present a deceptive sequential decision-making framework that not only conceals sensitive information, but actively misleads adversaries about it. We model autonomous systems as Markov decision processes, with adversaries using inverse reinforcement learning to recover reward functions. To counter them, we present three regularization strategies for policy synthesis problems that actively deceive an adversary about a system's reward. "Diversionary deception" leads an adversary to draw any false conclusion about the system's reward function. "Targeted deception" leads an adversary to draw a specific false conclusion about the system's reward function. "Equivocal deception" leads an adversary to infer that the real reward and a false reward both explain the system's behavior. We show how each form of deception can be implemented in policy optimization problems and analytically bound the loss in total accumulated reward induced by deception. Next, we evaluate these developments in a multi-agent setting. We show that diversionary, targeted, and equivocal deception all steer the adversary to false beliefs while still attaining a total accumulated reward that is at least 98% of its optimal, non-deceptive value.
