Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers
Xinyi Wei, Shuo Han, Ahmed H. Hemida, Charles A. Kamhoua, Jie Fu
TL;DR
The paper addresses stealthy backdoor attacks in Markov Decision Processes with partial observability by jointly designing a backdoor policy $\pi^\dagger$ and an observation-based trigger $\kappa$, such that $\pi^\dagger$ remains near-optimal under the nominal dynamics but maximizes the attacker's return once the trigger is activated. It formulates the problem as an augmented, constrained two-player Markov game over the state-trigger space, restricting perturbations to a $d$-close set of transitions $\{P_k\}$ to preserve stealth, and proves that the constrained optimization is equivalent to solving for a cooperative pair of policies $(\pi_0,\pi_1)$ with $\pi^\dagger=\pi_0$ and $\kappa=\pi_1$. A gradient-based method operating on a black-box simulator computes a constrained Nash equilibrium for the backdoor design, using policies parameterized by $\theta_0,\theta_1$ and REINFORCE-style updates. Experiments on a stochastic gridworld demonstrate that the learned backdoor policy retains high performance in the benign regime while the activated trigger significantly degrades the victim’s performance, illustrating the feasibility and risk of such stealthy attacks and informing security-aware reinforcement learning and verification efforts.
Abstract
This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). In a backdoor attack, the adversary provides a control policy that behaves well in the original MDP so that it passes the testing phase. However, when this policy is deployed together with a trigger policy, which perturbs the system dynamics at runtime, it optimizes the attacker's objective instead. To jointly design the control policy and its trigger, we formulate attack planning as a constrained optimal planning problem in an MDP with an augmented state space: the objective is to maximize the attacker's total reward in the system with an activated trigger, subject to the constraint that the control policy is near-optimal in the original MDP. We then introduce a gradient-based optimization method that solves for the optimal backdoor attack policy as a pair of coordinated control and trigger policies. Experimental results from a case study validate the effectiveness of our approach in achieving stealthy backdoor attacks.
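As a rough sketch, the constrained planning problem described in the abstract can be written as follows. The notation is assumed for illustration and is not taken from the paper: $V_1(\pi^\dagger,\kappa)$ denotes the attacker's expected total reward when $\pi^\dagger$ runs under dynamics perturbed by the trigger $\kappa$, $V_0(\pi)$ denotes the expected total reward of a policy $\pi$ in the original MDP, and $\epsilon \ge 0$ is a hypothetical tolerance that quantifies "near optimal".

$$
\max_{\pi^\dagger,\,\kappa} \; V_1(\pi^\dagger,\kappa)
\quad \text{subject to} \quad
V_0(\pi^\dagger) \;\ge\; \max_{\pi} V_0(\pi) - \epsilon .
$$

Under this reading, the constraint captures stealth during testing, where only the original dynamics are seen, while the objective captures the attacker's gain at runtime. The restriction of $\kappa$ to the $d$-close transition set $\{P_k\}$ would enter as an additional limit on which perturbed dynamics the trigger may select.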
