Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey

Lucas Schott; Josephine Delas; Hatem Hajri; Elies Gherbi; Reda Yaich; Nora Boulahia-Cuppens; Frederic Cuppens; Sylvain Lamprier

TL;DR

The survey tackles robustness challenges in DRL under observation and environment perturbations, formulating a distributionally robust optimization framework to address worst-case changes in observations and dynamics. It provides a comprehensive taxonomy of adversarial attacks (observation vs dynamics; short-term divergence vs long-term adversarial rewards) and surveys adversarial training strategies (fixed, continuous, alternating, ensemble, and fictitious self-play) to improve resilience. By detailing attack modalities, knowledge assumptions (white/black box), and practical tooling, the work clarifies trade-offs and guides the design of robust DRL systems for real-world deployment. The findings emphasize the need for stability-aware training and balancing robustness with performance, while highlighting future directions including explainability, human-in-the-loop approaches, and leveraging large language models for robustness-enabled RL workflows.
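
The worst-case view of changing dynamics described above can be illustrated in the simplest possible setting. The sketch below is not the survey's distributionally robust formulation. It uses the closely related robust-MDP idea on a hypothetical two-state tabular MDP with a finite set of candidate transition models. It runs max-min value iteration: an adversary picks, for each state-action pair, the model that minimizes the backed-up value, and the agent maximizes over actions. The function name `robust_value_iteration`, the toy MDP, and all numbers are illustrative assumptions.

```python
import numpy as np


def robust_value_iteration(rewards, models, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Max-min value iteration over a finite set of candidate transition models.

    rewards: shape (S, A), reward R(s, a).
    models:  shape (K, S, A, S); models[k, s, a] is a distribution over next states.
    Returns the worst-case value V (shape S) and a greedy robust policy (shape S).
    """
    values = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # Expected next-state value under every candidate model: shape (K, S, A).
        next_values = models @ values
        # Adversary: for each (s, a), pick the model giving the lowest backup.
        q_worst = rewards + gamma * next_values.min(axis=0)
        # Agent: maximize the worst-case backup over actions.
        new_values = q_worst.max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    return values, q_worst.argmax(axis=1)


# Hypothetical toy MDP: in state 0, action 0 is "safe" (reward 1) and action 1 is
# "risky" (reward 2); state 1 is absorbing with zero reward.
rewards = np.array([[1.0, 2.0], [0.0, 0.0]])
nominal = np.zeros((2, 2, 2))
nominal[0, 0, 0] = 1.0  # safe action keeps the agent in state 0
nominal[0, 1, 0] = 1.0  # risky action also stays in state 0 under nominal dynamics
nominal[1, :, 1] = 1.0  # state 1 is absorbing
perturbed = nominal.copy()
perturbed[0, 1] = [0.0, 1.0]  # under perturbed dynamics the risky action leads to state 1

v_nominal, pi_nominal = robust_value_iteration(rewards, nominal[None])
v_robust, pi_robust = robust_value_iteration(rewards, np.stack([nominal, perturbed]))
print(pi_nominal[0], pi_robust[0])  # 1 0
```

With only the nominal model, the agent prefers the risky action in state 0 (value about 20). Once the perturbed model is admitted, the robust policy switches to the safe action (value about 10). Taking the minimum independently for each state-action pair corresponds to a rectangular uncertainty set. This is a common simplifying assumption that keeps the max-min backup tractable.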

Abstract

Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions in complex environments. Despite strong performance in well-known environments, DRL agents remain susceptible to minor variations in conditions, which raises concerns about their reliability in real-world applications. To be usable in practice, DRL must demonstrate trustworthiness and robustness. One way to improve the robustness of DRL to unknown changes in environmental conditions and to possible perturbations is Adversarial Training: training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.
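
The following is a minimal sketch of an adversarial attack on observations, the kind of perturbation an agent could be trained against. It applies a one-step perturbation in the style of FGSM to a hypothetical linear softmax policy. FGSM is a well-known gradient-sign attack used here as a stand-in; it is not a method attributed to this survey. The policy parameterization, the name `fgsm_observation_attack`, and the numbers are assumptions for illustration. The perturbation stays within an L-infinity ball of radius `epsilon`. It pushes the observation away from the action the policy prefers on the clean input.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def fgsm_observation_attack(weights, bias, obs, epsilon):
    """One-step L-infinity (FGSM-style) attack on a hypothetical linear softmax policy.

    The policy is pi(a | o) = softmax(W o + b). The attack increases the loss
    -log pi(a* | o), where a* is the action preferred on the clean observation.
    """
    probs = softmax(weights @ obs + bias)
    preferred = int(np.argmax(probs))
    # Closed-form gradient of -log softmax(W o + b)[a*] with respect to o:
    # W^T p - W[a*]  (a* is held fixed, as in standard FGSM).
    grad = weights.T @ probs - weights[preferred]
    return obs + epsilon * np.sign(grad)


# Toy example: two actions, two-dimensional observation (illustrative values).
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
obs = np.array([0.2, 0.0])
adv_obs = fgsm_observation_attack(W, b, obs, epsilon=0.15)

clean_action = int(np.argmax(softmax(W @ obs + b)))
adv_action = int(np.argmax(softmax(W @ adv_obs + b)))
print(clean_action, adv_action)  # 0 1
```

Here, a perturbation of at most 0.15 per coordinate flips the chosen action from 0 to 1. In adversarial training, perturbed observations such as `adv_obs` would replace or augment clean ones while the policy is updated. A deep policy would compute the same gradient with automatic differentiation instead of the closed form used here.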

Paper Structure (71 sections, 21 equations, 17 figures, 1 table, 6 algorithms)

Introduction
Background
Reinforcement Learning (RL)
Partially Observable Markov Decision Process
Fundamentals of Reinforcement Learning
Neural Networks and Deep Reinforcement Learning
Deep Neural Networks (DNNs)
Deep Reinforcement Learning (DRL)
Robustness Issues in DRL
Uncertainties in the Environment
Adversarial Attacks of DNNs
Enhancing Robustness of DRL
Safe RL
Resilient RL
Adversarial RL
...and 56 more sections

Figures (17)

Figure 1: Categorization of the adversarial attacks in the literature, as described in the survey's section on adversarial attacks, following the taxonomy introduced in the survey's taxonomy section.
Figure 2: Flowchart of an agent with a policy function $\pi$ interacting with a POMDP environment
Figure 3: Flowchart of the perturbation of the observation
Figure 4: Flowchart of the perturbation of the transition function
Figure 5: Flowchart of the perturbation of the current state
...and 12 more figures
