
Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Zhixiong Zhuang, Maria-Irina Nicolae, Mario Fritz

TL;DR

Stealthy Imitation (SI) is an environment-free, data-free policy stealing method for deep RL in control systems. SI iterates three steps: it estimates the victim's state distribution with a diagonal Gaussian, trains a reward model to discriminate victim from attacker behavior, and refines the distribution to maximize imitation difficulty. This lets the attacker reproduce the victim policy's behavior without environment access. Experiments on MuJoCo and Panda robot tasks report significant reductions in the distribution divergence $D_{\mathrm{KL}}(S_v \Vert S_a)$ and high return ratios for the attacker, and a practical defense is shown to degrade attack success. The work identifies a new security risk for control-system policies and a concrete countermeasure against it, with implications for IP protection and RL deployment in real-world systems.
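The iterative loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the victim is a hypothetical toy policy, the attacker is a linear least-squares model rather than a neural network, and the per-sample imitation error stands in for the learned reward model that the paper trains. All function names and parameters are assumptions made for illustration.

```python
import numpy as np


def victim_policy(states, weights):
    """Hypothetical black-box victim: maps queried states to actions."""
    return np.tanh(states @ weights)


def fit_attacker(states, actions):
    """Behavioral cloning with linear least squares (stand-in for a neural policy)."""
    solution, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return solution


def refine_distribution(states, scores):
    """Refit the diagonal Gaussian, weighting each sample by its difficulty score."""
    w = scores / scores.sum()
    mu = w @ states
    sigma = np.sqrt(w @ (states - mu) ** 2)
    return mu, sigma


def stealthy_imitation_sketch(state_dim=3, action_dim=2, rounds=5,
                              budget=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: a random toy victim; the real victim is an unknown RL policy.
    victim_weights = rng.normal(size=(state_dim, action_dim))
    # Start from a standard normal guess for the state distribution S_a.
    mu, sigma = np.zeros(state_dim), np.ones(state_dim)
    attacker = None
    for _ in range(rounds):
        # 1. Sample query states from the current diagonal Gaussian.
        states = mu + sigma * rng.normal(size=(budget, state_dim))
        # 2. Query the victim (black-box access only).
        actions = victim_policy(states, victim_weights)
        # 3. Train the attacker to imitate the victim on these queries.
        attacker = fit_attacker(states, actions)
        # 4. Score each query by imitation error (proxy for the reward model).
        errors = np.sum((states @ attacker - actions) ** 2, axis=1)
        # 5. Shift the distribution toward hard-to-imitate states.
        mu, sigma = refine_distribution(states, errors + 1e-8)
    return mu, sigma, attacker
```

Calling `stealthy_imitation_sketch()` returns the final mean, the per-dimension standard deviation, and the attacker's weight matrix. The error-weighted refit is only one simple way to realize "refine toward harder states"; the paper's reward-guided update may differ.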

Abstract

Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim's input state distribution, Stealthy Imitation fits a reward model that allows this distribution to be approximated. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.

Paper Structure

This paper contains 37 sections, 7 equations, 18 figures, 7 tables, 7 algorithms.

Figures (18)

  • Figure 1: Traditional data-free model extraction fails in control systems due to the unknown environment with varying sensors. SI effectively extracts policies by stealing the environment first.
  • Figure 2: Overview of Stealthy Imitation that iteratively refines the estimated state distribution $S_a$.
  • Figure 3: Distribution estimation capacity measured by $D_{\mathrm{KL}}(S_v \Vert S_a)$ (top) and return ratio (bottom) as a function of the attacker budget.
  • Figure 4: We validate the necessity of (i) fixing the dataset size to train the evaluator model, (ii) dynamic budget, (iii) reward model, and (iv) pruning the transfer dataset.
  • Figure 5: Panda: $D_{\mathrm{KL}}(S_v \Vert S_a)$ (top) and return ratio (bottom) as a function of the attacker budget.
  • ...and 13 more figures