
Robust Deep Reinforcement Learning Through Adversarial Attacks and Training: A Survey

Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier

TL;DR

The survey tackles robustness challenges in DRL under observation and environment perturbations, formulating a distributionally robust optimization framework to address worst-case changes in observations and dynamics. It provides a comprehensive taxonomy of adversarial attacks (observation vs dynamics; short-term divergence vs long-term adversarial rewards) and surveys adversarial training strategies (fixed, continuous, alternating, ensemble, and fictitious self-play) to improve resilience. By detailing attack modalities, knowledge assumptions (white/black box), and practical tooling, the work clarifies trade-offs and guides the design of robust DRL systems for real-world deployment. The findings emphasize the need for stability-aware training and balancing robustness with performance, while highlighting future directions including explainability, human-in-the-loop approaches, and leveraging large language models for robustness-enabled RL workflows.

Abstract

Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments. Despite its significant performance in well-known environments, it remains susceptible to minor condition variations, raising concerns about its reliability in real-world applications. To improve usability, DRL must demonstrate trustworthiness and robustness. A way to improve the robustness of DRL to unknown changes in the environmental conditions and possible perturbations is through Adversarial Training, by training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.
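To make the idea of an adversarial attack on observations concrete, the following is a minimal sketch, not the survey's method. It assumes a hypothetical linear policy that scores each action as a dot product with the observation, and it applies a single sign-of-gradient step bounded by `eps` in the L-infinity norm. All function names and the toy weights are illustrative assumptions.

```python
# Minimal sketch of a bounded observation perturbation (assumed setup:
# a linear policy whose action scores are weights[a] . obs).

def scores(weights, obs):
    """Return one score per action for a linear policy."""
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def greedy_action(weights, obs):
    """Pick the action with the highest score."""
    s = scores(weights, obs)
    return max(range(len(s)), key=s.__getitem__)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def perturb_observation(weights, obs, eps):
    """Shift obs by at most eps per coordinate to shrink the margin
    between the preferred action and the runner-up.

    For a linear policy, the gradient of the margin with respect to
    the observation is weights[best] - weights[runner_up], so one
    sign step against that gradient is the worst case in the
    L-infinity ball of radius eps.
    """
    s = scores(weights, obs)
    ranked = sorted(range(len(s)), key=s.__getitem__, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    grad = [wb - wr for wb, wr in zip(weights[best], weights[runner_up])]
    return [o - eps * sign(g) for o, g in zip(obs, grad)]


if __name__ == "__main__":
    toy_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-action policy
    clean_obs = [0.6, 0.5]
    adv_obs = perturb_observation(toy_weights, clean_obs, eps=0.2)
    print("clean action:", greedy_action(toy_weights, clean_obs))
    print("perturbed action:", greedy_action(toy_weights, adv_obs))
```

Adversarial training in this setting would feed such perturbed observations to the agent during learning, so that the policy is optimized against them rather than only against clean inputs.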

Paper Structure

This paper contains 71 sections, 21 equations, 17 figures, 1 table, 6 algorithms.

Figures (17)

  • Figure 1: Categorization of the adversarial attacks in the literature, as described in Section \ref{sec:adv_attacks}, using the taxonomy introduced in Section \ref{sec:taxonomy} of this survey.
  • Figure 2: Flowchart of an agent with a policy function $\pi$ interacting with a POMDP environment
  • Figure 3: Flowchart of the perturbation of the observation
  • Figure 4: Flowchart of the perturbation of the transition function
  • Figure 5: Flowchart of the perturbation of the current state
  • ...and 12 more figures