Multi-Agent Reinforcement Learning in Cybersecurity: From Fundamentals to Applications
Christoph R. Landolt, Christoph Würsch, Roland Meier, Alain Mermoud, Julian Jang-Jaccard
TL;DR
This paper surveys the use of Multi-Agent Reinforcement Learning (MARL) for automated cyber defense, emphasizing decentralized coordination, adversarial training, and dynamic environments. It reviews foundational models (e.g., Dec-POMDPs, POSGs), training paradigms (cooperative, competitive, mixed), and key MARL algorithms (MADDPG, MAPPO, IPPO) in the context of cybersecurity. It also highlights Cyber Gyms and AICA as essential ecosystems for training, validating, and deploying MARL-driven defenses, while outlining challenges such as scalability, non-stationarity, and the simulation-to-reality gap. The work argues that MARL can significantly enhance intrusion detection, red-blue team interactions, and lateral-movement containment, provided advances in realistic environments and robust, scalable training regimes are achieved.
Abstract
Multi-Agent Reinforcement Learning (MARL) has shown great potential as an adaptive solution to modern cybersecurity challenges. MARL enables decentralized, adaptive, and collaborative defense strategies and provides an automated mechanism to combat dynamic, coordinated, and sophisticated threats. This survey investigates the current state of research in MARL applications for automated cyber defense (ACD), focusing on intrusion detection and lateral movement containment. Additionally, it examines the role of Autonomous Intelligent Cyber-defense Agents (AICA) and Cyber Gyms in training and validating MARL agents. The survey also discusses how MARL integrates into AICA to provide adaptive, scalable, and dynamic solutions that counter the increasingly sophisticated landscape of cyber threats. It highlights the transformative potential of MARL in areas such as intrusion detection and lateral movement containment, and underscores the value of Cyber Gyms for the training and validation of AICA. Finally, the paper outlines existing challenges, such as scalability and adversarial robustness, and proposes future research directions.
