Poisoning Deep Reinforcement Learning Agents with In-Distribution Triggers

Chace Ashcraft, Kiran Karra

TL;DR

The paper addresses the vulnerability of deep reinforcement learning agents to data poisoning backdoors delivered via triggers embedded in training data. It introduces in-distribution triggers and a multi-task training paradigm to embed trojaned behavior, demonstrated in three RL environments. The work formalizes triggers within MDP/POMDP observations and provides concrete examples in Atari Boxing, Parameterized LavaWorld, and Pursuit, including quantitative results under trigger conditions. The findings suggest in-distribution triggers are harder to detect and pose significant security challenges, underscoring the need for defense research in DRL pipelines.

Abstract

In this paper, we propose a new data poisoning attack and apply it to deep reinforcement learning agents. Our attack centers on what we call in-distribution triggers, which are triggers native to the data distributions the model will be trained on and deployed in. We outline a simple procedure for embedding these, and other, triggers in deep reinforcement learning agents following a multi-task learning paradigm, and demonstrate it in three common reinforcement learning environments. We believe that this work has important implications for the security of deep learning models.

Paper Structure

This paper contains 10 sections, 1 figure.

Figures (1)

  • Figure 1: The goal of Parameterized Lavaworld is to get to the green goal square without stepping in an orange lava square. The trigger is a cross or "T" pattern (left) of lava squares, such that when the agent sees the pattern, it enters one of the pattern's squares instead of going to the goal.
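
To make the notion of an in-distribution trigger in Figure 1 concrete, the following is a minimal sketch of how one might check whether a cross or "T" arrangement of lava squares is present in a grid. Everything here is an assumption made for illustration: the character encoding of the grid, the 3x3 shapes of `CROSS` and `T_SHAPE`, and the helper `find_pattern` are hypothetical and are not taken from the paper. The sketch also scans the whole grid, whereas the agent in the paper acts on what it observes.

```python
# Hypothetical grid encoding (not from the paper): "L" = lava, "." = empty, "G" = goal.
LAVA = "L"

# Hypothetical 3x3 trigger shapes; the paper's exact pattern geometry is not given here.
CROSS = [".L.",
         "LLL",
         ".L."]
T_SHAPE = ["LLL",
           ".L.",
           ".L."]


def find_pattern(grid, pattern):
    """Return top-left (row, col) positions where the lava layout matches `pattern` exactly.

    A cell matches when it is lava in both the grid window and the pattern,
    or non-lava in both.
    """
    rows, cols = len(grid), len(grid[0])
    p_rows, p_cols = len(pattern), len(pattern[0])
    hits = []
    for r in range(rows - p_rows + 1):
        for c in range(cols - p_cols + 1):
            match = all(
                (grid[r + i][c + j] == LAVA) == (pattern[i][j] == LAVA)
                for i in range(p_rows)
                for j in range(p_cols)
            )
            if match:
                hits.append((r, c))
    return hits


# Example layout containing a cross of lava squares and a goal square.
example_grid = [".....",
                "..L..",
                ".LLL.",
                "..L..",
                "....G"]

print(find_pattern(example_grid, CROSS))
print(find_pattern(example_grid, T_SHAPE))
```

Because the pattern is built only from lava squares that the environment already contains, a check like this operates on ordinary states rather than on an injected artifact, which is the property the summary above describes as making such triggers harder to detect.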