Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers

Xinyi Wei, Shuo Han, Ahmed H. Hemida, Charles A. Kamhoua, Jie Fu

TL;DR

The paper addresses stealthy backdoor attacks in Markov Decision Processes with partial observability by jointly designing a backdoor policy $\pi^\dagger$ and an observation-based trigger $\kappa$, such that the policy remains near-optimal under nominal dynamics but maximizes the attacker's return once the trigger is activated. It formulates the problem as an augmented, constrained two-player Markov game over the state-trigger space, restricting perturbations to a $d$-close set of transitions $\{P_k\}$ to preserve stealth, and proves equivalence between the constrained optimization and solving for a cooperative pair of policies $(\pi_0,\pi_1)$ with $\pi^\dagger=\pi_0$ and $\kappa=\pi_1$. A gradient-based method operating on a black-box simulator computes a constrained Nash equilibrium for the backdoor design, using policies parameterized by $\theta_0,\theta_1$ and REINFORCE-style updates. Experiments on a stochastic gridworld demonstrate that the learned backdoor policy retains high performance in the benign regime while the activated trigger significantly degrades the victim’s performance, illustrating the feasibility and risk of such stealthy attacks and informing security-aware reinforcement learning and verification efforts.
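The constrained problem summarized above can be sketched as follows. This is a hedged reading rather than the paper's own equation: $V_0(M^{\pi_0})$ and $V_0^\ast(M)$ follow the notation of Theorem 1, while the attacker value $V_1$, the use of the augmented model $\mathcal{M}$ as its argument, and the tolerance $\varepsilon$ as the near-optimality slack are assumptions made here for illustration.

```latex
\max_{\pi_0,\,\pi_1} \; V_1\!\left(\mathcal{M}^{\pi_0,\pi_1}\right)
\quad \text{subject to} \quad
V_0\!\left(M^{\pi_0}\right) \;\ge\; V_0^\ast(M) - \varepsilon
```

Under this reading, the objective rewards the attacker when the trigger is active, and the constraint keeps the control policy close to optimal in the original MDP so that it passes testing.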

Abstract

This paper investigates backdoor attack planning in stochastic control systems modeled as Markov Decision Processes (MDPs). In a backdoor attack, the adversary provides a control policy that behaves well in the original MDP to pass the testing phase. However, when such a policy is deployed with a trigger policy, which perturbs the system dynamics at runtime, it optimizes the attacker's objective instead. To jointly solve for the control policy and its trigger, we formulate the attack planning problem as a constrained optimal planning problem in an MDP with an augmented state space, with the objective of maximizing the attacker's total reward in the system with an activated trigger, subject to the constraint that the control policy is near-optimal in the original MDP. We then introduce a gradient-based optimization method to solve for the optimal backdoor attack policy as a pair of coordinated control and trigger policies. Experimental results from a case study validate the effectiveness of our approach in achieving stealthy backdoor attacks.
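To make the idea of coordinated, gradient-based updates concrete, here is a minimal sketch on a hypothetical one-step toy problem. It is not the paper's algorithm: the reward tables `R_ORIGINAL` and `R_ATTACK`, the tolerance `EPSILON`, and all function names are assumptions, and the near-optimality constraint is handled with a simple quadratic penalty rather than the paper's constrained-game formulation. Both policies are tabular softmax policies updated with score-function (REINFORCE-style) gradient estimates from sampled rollouts.

```python
import math
import random

# Hypothetical one-step toy problem (assumed for illustration, not from the paper).
R_ORIGINAL = [1.0, 0.0]        # original reward r(a) under nominal dynamics
R_ATTACK = [[0.0, 0.0],        # attacker reward r1(a, b): pays off only when
            [0.0, 1.0]]        # control action 1 meets trigger action 1
EPSILON = 0.3                  # assumed tolerance on the nominal value
V0_STAR = max(R_ORIGINAL)      # optimal nominal value in this toy problem


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def draw(probs, rng):
    """Sample an index from a categorical distribution."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def score(probs, action):
    """Gradient of log softmax(prefs)[action] with respect to prefs."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def train(iters=600, batch=200, lr=0.1, penalty=10.0, seed=0):
    rng = random.Random(seed)
    theta0, theta1 = [0.0, 0.0], [0.0, 0.0]   # control / trigger preferences
    for _step in range(iters):
        pi0, pi1 = softmax(theta0), softmax(theta1)
        v0_hat, g_v0 = 0.0, [0.0, 0.0]
        g0_v1, g1_v1 = [0.0, 0.0], [0.0, 0.0]
        for _ in range(batch):
            # Nominal rollout: only the control policy acts.
            a = draw(pi0, rng)
            r = R_ORIGINAL[a]
            v0_hat += r / batch
            for k, s in enumerate(score(pi0, a)):
                g_v0[k] += r * s / batch
            # Triggered rollout: control and trigger policies act together.
            a, b = draw(pi0, rng), draw(pi1, rng)
            r1 = R_ATTACK[a][b]
            for k, s in enumerate(score(pi0, a)):
                g0_v1[k] += r1 * s / batch
            for k, s in enumerate(score(pi1, b)):
                g1_v1[k] += r1 * s / batch
        # Quadratic penalty on the estimated constraint violation.
        violation = max(0.0, (V0_STAR - EPSILON) - v0_hat)
        for k in range(2):
            theta0[k] += lr * (g0_v1[k] + 2.0 * penalty * violation * g_v0[k])
            theta1[k] += lr * g1_v1[k]
    return softmax(theta0), softmax(theta1)


pi0, pi1 = train()
# Exact values of the learned policies in the toy problem.
v0 = sum(p * r for p, r in zip(pi0, R_ORIGINAL))
v1 = sum(pi0[a] * pi1[b] * R_ATTACK[a][b] for a in range(2) for b in range(2))
print(f"nominal value {v0:.2f} (target >= {V0_STAR - EPSILON:.2f}), "
      f"attacker value {v1:.2f}")
```

Because a soft penalty enforces the constraint only approximately, the nominal value can settle slightly below the target; a larger `penalty` or a Lagrangian dual update would tighten it at the cost of noisier updates.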

Paper Structure

This paper contains 11 sections, 1 theorem, 17 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1

The solution of a stealthy backdoor strategy in eq:opt_backdoor_strategy is equivalent to solving the following constrained optimization problem: where $V_0(M^{\pi_0})$ is the value of policy $\pi_0$ in the original MDP $M$ for the original reward $r$ and $V_0^\ast(M) = \max_{\pi_0}V_0 (M^{\pi_0})$ is the optimal value. Let $(\pi_0 , \pi_1 )$ be the solution to eq:opt. A stealthy backdoor strateg

Figures (4)

  • Figure 1: A $6 \times 6$ Stochastic Gridworld
  • Figure 2: The evaluations of policies over iterations: $V_0(\theta_0^t, M)$ and $V_0(\theta_0^t, \theta_1^t, \mathcal{M})$ in the zero-sum case.
  • Figure 3: Comparison of attack performances under different parameters $\varepsilon$ and $\delta$.
  • Figure 4: The evaluations of policies over iterations: $V_0(\theta_0^t, M)$ and $V_0(\theta_0^t, \theta_1^t, \mathcal{M})$ in the non-zero-sum case.

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3: Augmented Constrained Markov Game
  • Remark 1
  • Theorem 1
  • Proof