L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)

Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Jun Wang, Yan Li, Chang Liu

TL;DR

Addressing the need for context-aware industrial control system (ICS) security, this paper proposes L2M-AID, a hierarchical system that fuses LLM-based semantic reasoning with cooperative multi-agent reinforcement learning (MARL) based on Multi-Agent Proximal Policy Optimization (MAPPO). It formalizes defense as a decentralized partially observable Markov decision process (Dec-POMDP) whose state representations are semantically enriched by an Orchestrator LLM. The agents are optimized with MAPPO under centralized training and decentralized execution, using a global reward $\mathcal{R}(s,\mathbf{a})$ that balances security against process safety. Validation on the SWaT ICS benchmark and on a synthetic dataset derived from MITRE ATT&CK for ICS shows higher detection rates, fewer false positives, faster responses, and better process stability than the baselines, with ablations confirming the critical role of the semantic embedding and of multi-agent coordination. The work demonstrates a robust, autonomous defense paradigm for protecting critical infrastructure and lays a foundation for future work on sim-to-real transfer, adversarial resilience, and explainability.
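The summary names a global reward $\mathcal{R}(s,\mathbf{a})$ that trades security off against process safety, but does not give its form. As a minimal sketch, assume a weighted sum of a threat-neutralization term, a false-alarm penalty, and a process-disruption penalty. The term names, the weights `w_sec`, `w_fp` and `w_proc`, and the linear form are all hypothetical illustrations, not the authors' definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    """Hypothetical per-step summary of what the team's joint action achieved."""
    threats_neutralized: int    # attacks contained during this step
    false_alarms: int           # benign events that were acted upon as attacks
    process_deviation: float    # assumed normalized drift of the physical process, in [0, 1]


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper's actual coefficients are not given in this summary."""
    w_sec: float = 1.0
    w_fp: float = 0.5
    w_proc: float = 2.0


def global_reward(outcome: StepOutcome, w: RewardWeights = RewardWeights()) -> float:
    """Shared team reward (assumed linear form): credit security gains,
    penalize false alarms and disruption of the physical process."""
    security_term = w.w_sec * outcome.threats_neutralized
    false_alarm_penalty = w.w_fp * outcome.false_alarms
    stability_penalty = w.w_proc * outcome.process_deviation
    return security_term - false_alarm_penalty - stability_penalty


# Same threat contained, but the aggressive response perturbs the process more,
# so it earns a lower shared reward.
gentle = global_reward(StepOutcome(threats_neutralized=1, false_alarms=0, process_deviation=0.1))
harsh = global_reward(StepOutcome(threats_neutralized=1, false_alarms=0, process_deviation=0.8))
```

Because the reward is global, every agent in the cooperative team is scored by the same scalar, which is what pushes the Mitigation Agent toward containment actions that are less disruptive to the process.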

Abstract

The increasing integration of the Industrial IoT (IIoT) exposes critical cyber-physical systems to sophisticated, multi-stage attacks that elude traditional defenses lacking contextual awareness. This paper introduces L2M-AID, a novel framework for Autonomous Industrial Defense using LLM-empowered Multi-agent reinforcement learning. L2M-AID orchestrates a team of collaborative agents, each driven by a Large Language Model (LLM), to achieve adaptive and resilient security. The core innovation is the deep fusion of two AI paradigms: we use an LLM as a semantic bridge that translates vast, unstructured telemetry into a rich, contextual state representation, enabling agents to reason about adversary intent rather than merely match patterns. This semantically aware state allows a Multi-Agent Reinforcement Learning (MARL) algorithm, MAPPO, to learn complex cooperative strategies. The MARL reward function is engineered to balance security objectives (threat neutralization) against operational imperatives, explicitly penalizing actions that disrupt physical process stability. To validate our approach, we conduct extensive experiments on the SWaT benchmark dataset and on a novel synthetic dataset generated from the MITRE ATT&CK for ICS framework. Results show that L2M-AID significantly outperforms traditional intrusion detection systems (IDS), deep-learning anomaly detectors, and single-agent RL baselines across key metrics, achieving a 97.2% detection rate while reducing false positives by over 80% and improving response times by a factor of four. Crucially, it is also superior at maintaining physical process stability, presenting a robust new paradigm for securing critical national infrastructure.
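MAPPO applies PPO's clipped surrogate objective to each agent's decentralized policy, while a centralized critic that sees the global state during training supplies the advantage estimates. The NumPy sketch below shows that policy loss for a batch of per-agent transitions. The clip value `eps=0.2`, the array shapes, the helper names, and the use of simple one-step TD advantages (MAPPO implementations commonly use GAE instead) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def td_advantages(rewards: np.ndarray, values: np.ndarray,
                  next_values: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """One-step TD advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t) from a
    centralized critic V(s). Shape (num_steps,). With a shared global reward,
    every agent receives the same advantage at a given step."""
    return rewards + gamma * next_values - values


def mappo_policy_loss(logp_new: np.ndarray, logp_old: np.ndarray,
                      advantages: np.ndarray, eps: float = 0.2) -> float:
    """Clipped PPO surrogate loss averaged over agents and timesteps.

    Arrays have shape (num_agents, num_steps). `logp_*` are log-probabilities of
    each agent's chosen action under its decentralized policy pi_i(a_i | o_i).
    """
    ratio = np.exp(logp_new - logp_old)                      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) surrogate;
    # negate it so the result can be minimized as a loss.
    return float(-np.mean(np.minimum(unclipped, clipped)))


# Usage: broadcast the shared per-step advantages to a team of three agents.
adv = td_advantages(np.array([1.0, 0.0]), np.array([0.5, 0.2]),
                    np.array([0.2, 0.0]), gamma=1.0)
team_adv = np.broadcast_to(adv, (3, 2))
```

The clipping keeps any single update from moving an agent's policy far from the one that collected the data, while the centralized critic is needed only during training; at execution time each agent acts from its own observation.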

Paper Structure

This paper contains 23 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The hierarchical architecture of L2M-AID, illustrating the strategic Orchestrator Agent and the tactical Monitoring, Analysis, and Mitigation Agents. The solid lines represent the primary data and command flow, while the dashed lines indicate the broadcast of the LLM-generated contextual state embedding ($L_t$).
  • Figure 2: The operational data and decision pipeline of L2M-AID. Tactical agents convert raw data into alerts, which the Orchestrator correlates and reasons upon to formulate a strategy, finally commanding the Mitigation Agent to act.
  • Figure 3: Comprehensive performance dashboard for L2M-AID and the baseline models. The radar chart provides a holistic view of normalized performance across five key axes. Bar and box plots detail performance on the SWaT and synthetic datasets for Detection Rate, False Positive Rate (FPR), Mean Time to Respond (MTTR), and Process Stability Index (PSI). The line chart compares training convergence speeds, and the heatmap breaks down detection performance by attack category.
  • Figure 4: Component analysis of L2M-AID. (a) Performance improvement percentage gained from including the LLM, showing its dominant impact on reducing false positives and maintaining process stability. (b) Normalized performance comparison between the Multi-Agent (L2M-AID) and Single-Agent architectures. (c) Evolution of the constituent parts of the global reward function during training, demonstrating the successful co-optimization of security and process stability.
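Figure 3 reports Detection Rate, FPR, and MTTR. A minimal sketch of their conventional definitions follows: detection rate as TP/(TP+FN), FPR as FP/(FP+TN), and MTTR as the mean delay between attack onset and defensive response. These standard formulas are assumed for illustration; the paper's exact computation is not given here, and its Process Stability Index (PSI) is omitted because its definition is not stated. The two comparison helpers show how relative claims such as "over 80% fewer false positives" or "four times faster responses" would presumably be computed from a baseline and a new value.

```python
from statistics import mean


def detection_rate(tp: int, fn: int) -> float:
    """Fraction of attacks that were detected (recall / true-positive rate)."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign events incorrectly flagged as attacks."""
    return fp / (fp + tn)


def mean_time_to_respond(onsets: list[float], responses: list[float]) -> float:
    """Mean delay between attack onset and response, in the inputs' time unit.
    The i-th response is assumed to belong to the i-th attack."""
    if not onsets or len(onsets) != len(responses):
        raise ValueError("need matching, non-empty onset and response lists")
    return mean(r - o for o, r in zip(onsets, responses))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction versus a baseline; 0.8 means an 80% reduction."""
    return (baseline - new) / baseline


def speedup(baseline_mttr: float, new_mttr: float) -> float:
    """How many times faster the new system responds; 4.0 means four times faster."""
    return baseline_mttr / new_mttr
```

Detection rate and FPR use disjoint denominators (attack events versus benign events), so a detector can raise one without lowering the other; that is why the figure reports them side by side.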