Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning

Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein

TL;DR

This paper tackles poisoning attacks in offline reinforcement learning by introducing MuCD, a multi-level certified defense that leverages differential privacy (DP) to provide robust guarantees for both per-state actions and the overall expected cumulative reward. The framework supports both trajectory-level and transition-level poisoning and delivers action-level and policy-level certifications through two DP-based mechanisms: a randomized training process using DP principles and post-processing–consistent certification bounds via ADP and Rényi-DP. Empirically, MuCD outperforms prior approaches (notably COPA) by achieving larger certified radii and tolerating higher poisoning fractions (up to around $7\%$) while maintaining substantial portions of the original performance, across discrete and continuous action spaces and stochastic/deterministic environments. The results highlight the practical potential of DP-driven certified defenses to bolster safety and reliability in offline RL deployments.

Abstract

Similar to other machine learning frameworks, Offline Reinforcement Learning (RL) has been shown to be vulnerable to poisoning attacks due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments, significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach ensures the performance drops to no more than $50\%$ with up to $7\%$ of the training data poisoned, significantly improving over the $0.008\%$ in prior work~\citep{wu_copa_2022}, while producing certified radii that are $5$ times larger as well. This highlights the potential of our framework to enhance safety and reliability in offline RL.

Paper Structure

This paper contains 34 sections, 7 theorems, 33 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Lemma 4.1

If a mechanism $\mathcal{M}$ that produces bounded outputs in $[0, b]$, $b \in \mathbb{R}^{+}$, satisfies a $(\mathcal{K}, r)$-outcomes guarantee, then for any $\tilde{D} \in \mathcal{B}(D, r)$ the expected value of the outputs of $\mathcal{M}$ must satisfy: if $\mathcal{K}$ denotes the function family of ADP, … Similarly, if $\mathcal{K}$ denotes the function family of RDP $\mathcal{K}_{\epsilon, \alpha}$, …
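
To illustrate the ADP case, the following is a sketch under stated assumptions, not the paper's exact statement. Assume the ADP family takes the standard form $\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(\tilde{D}) \in S] + \delta$, holding in both directions for every measurable set $S$, with $(\epsilon, \delta)$ already calibrated to the radius $r$ (the symbol $\delta$ and this calibration are assumptions introduced here). Integrating tail probabilities over $[0, b]$ then yields the familiar expected-output bounds:

$$
\begin{aligned}
\mathbb{E}[\mathcal{M}(\tilde{D})] &= \int_0^b \Pr[\mathcal{M}(\tilde{D}) > t]\,dt \le \int_0^b \left(e^{\epsilon} \Pr[\mathcal{M}(D) > t] + \delta\right) dt = e^{\epsilon}\,\mathbb{E}[\mathcal{M}(D)] + b\delta, \\
\mathbb{E}[\mathcal{M}(\tilde{D})] &\ge e^{-\epsilon}\left(\mathbb{E}[\mathcal{M}(D)] - b\delta\right).
\end{aligned}
$$

The second line follows by swapping the roles of $D$ and $\tilde{D}$ in the first and rearranging. For instance, with the hypothetical values $b = 1$, $\epsilon = 0.1$, $\delta = 0.01$ and $\mathbb{E}[\mathcal{M}(D)] = 0.8$, the expected output under any $\tilde{D}$ covered by the guarantee would lie between approximately $0.715$ and $0.894$.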

Figures (4)

  • Figure 1: Stability ratio against the tolerable poisoning threshold $\bar{r}$ for action-level robustness using DQN and C51 in the Freeway and Breakout environments under transition- or trajectory-level poisoning attacks. Blue, green and red lines represent different noise levels $\sigma$ used during the randomized training process, with $\sigma \in \{1, 2, 3\}$ for Freeway and $\sigma \in \{1, 1.5, 2\}$ for Breakout, while the yellow dashed line denotes COPA, which can only be calculated for trajectory-level poisoning.
  • Figure 2: Policy-level robustness certifications, capturing the lower bound of the expected cumulative reward $\underline{J}_r$ against poisoning size $r$ for Atari games. Solid and dashed lines represent RDP- and ADP-derived guarantees respectively, with colors indicating noise levels as per Figure 1.
  • Figure 3: Policy-level robustness certification for the continuous-action game MuJoCo Half Cheetah, using the RL algorithm IQL. The plot is formulated in the same way as Figure 2.
  • Figure 4: Stability ratio versus the tolerable poisoning threshold $\Bar{r}$ for action-level robustness with ADP. Results are presented for two Atari games, Freeway and Breakout with RL algorithms DQN and C51 under transition- and trajectory-level poisoning. The blue, green, and red lines represent our proposed certified defense.

Theorems & Definitions (20)

  • Definition 3.1: Trajectory-level poisoning
  • Definition 3.2: Transition-level poisoning
  • Definition 3.3: Policy-level robustness certification
  • Definition 3.4: Action-level robustness certification
  • Definition 3.5: Outcomes guarantee for ADP and RDP
  • Lemma 4.1: Expected Outcomes Guarantee for ADP and RDP
  • proof
  • Theorem 4.2: Policy-level robustness by outcomes guarantee
  • proof
  • Lemma 4.3: Inferred scores outcomes guarantee
  • ...and 10 more