Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks

Xinye Cao, Yihan Lin, Guoshun Nan, Qinchuan Zhou, Yuhang Luo, Yurui Gao, Zeliang Zhang, Haolang Lu, Qimei Cui, Yanzhao Hou, Xiaofeng Tao, Tony Q. S. Quek

TL;DR

SecLoop introduces an end-to-end, LLM-driven security automation framework for 6G Zero-Touch Networks, integrating strategy generation, orchestration, execution, and feedback within a parallelized BATTLE-FIELD testbed. SA-GRPO, a security-aware group relative policy optimization algorithm, leverages parallel environment feedback to refine strategy groups without requiring large labeled datasets, using a multi-component reward design. The approach is validated through extensive benchmarks, real-world edge-device tests, and case studies, demonstrating superior accuracy and adaptability across heterogeneous environments. The work advances practical security automation for next-generation networks and provides datasets and infrastructure for community use.

Abstract

Zero-Touch Networks (ZTNs) represent a transformative paradigm toward fully automated and intelligent network management, providing the scalability and adaptability required for the complexity of sixth-generation (6G) networks. However, the distributed architecture, high openness, and deep heterogeneity of 6G networks expand the attack surface and pose unprecedented security challenges. To address this, security automation aims to enable intelligent security management across dynamic and complex environments, serving as a key capability for securing 6G ZTNs. Despite its promise, implementing security automation in 6G ZTNs presents two primary challenges: 1) automating the lifecycle from security strategy generation to validation and update under real-world, parallel, and adversarial conditions, and 2) adapting security strategies to evolving threats and dynamic environments. This motivates us to propose SecLoop and SA-GRPO. SecLoop constitutes the first fully automated framework that integrates large language models (LLMs) across the entire lifecycle of security strategy generation, orchestration, response, and feedback, enabling intelligent and adaptive defenses in dynamic network environments, thus tackling the first challenge. Furthermore, we propose SA-GRPO, a novel security-aware group relative policy optimization algorithm that iteratively refines security strategies by contrasting group feedback collected from parallel SecLoop executions, thereby addressing the second challenge. Extensive real-world experiments on five benchmarks, including 11 MITRE ATT&CK processes and over 20 types of attacks, demonstrate the superiority of the proposed SecLoop and SA-GRPO. We will release our platform to the community, facilitating the advancement of security automation toward next-generation communications.

Paper Structure

This paper contains 34 sections, 25 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of various attacks in zero-touch networks. Zero-touch networks introduce the software-defined networking (SDN) framework and automated management. The openness of 6G ZTNs exposes them to various attacks, such as DDoS, SQL injection, and man-in-the-middle (MITM) attacks.
  • Figure 2: Illustration of our framework mapped to ETSI ZSM. Data from the Parallel BATTLE-FIELD is fed into the policy execution validator. The analysis results from the validator are then delivered to the LLM-Based Agent, which generates security strategies. These strategies are executed by the SOC, and the final results are continuously monitored and collected again by the Parallel BATTLE-FIELD for subsequent cycles, constituting a closed-loop framework.
  • Figure 3: Illustration of our proposed SecLoop. The system inputs are attack alerts from the AutoAttack dataset, which are fed into the LLM-based agent. The LLM agent generates a group of strategies, which are then passed to the SOC for execution. The red team controller and blue team controller automatically generate parallel BATTLE-FIELD environments to carry out the corresponding tool invocations. The responses of the BATTLE-FIELD environments are processed by the policy execution validator to generate feedback values, which are passed to the SA-GRPO reward function. SA-GRPO iteratively optimizes and updates the model parameters.
  • Figure 4: Illustration of our proposed SA-GRPO algorithm. A group of outputs is sampled from the policy model, each assigned a reward computed with four customized reward functions. The group relative advantage is estimated for each output, and the policy model is updated by maximizing the objective function.
  • Figure 5: Performance comparisons and ablation studies of our proposed SecLoop and SA-GRPO. (a), (b), and (c) compare the accuracy of SecLoop and SA-GRPO against multiple baselines on five benchmarks. (d), (e), and (f) evaluate the accuracy of SecLoop and SA-GRPO in the real-world testbed.
  • ...and 2 more figures
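
The group relative advantage step described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the mean/standard-deviation normalization commonly used in GRPO, and the four reward component names and their weights (`format`, `tool_call`, `execution`, `defense`) are hypothetical placeholders, since the text here only states that four customized reward functions exist.

```python
from statistics import mean, pstdev


def combine_rewards(components, weights):
    """Weighted sum of reward components for one sampled strategy.

    The component names and weights are hypothetical; the source only
    says that four customized reward functions are used.
    """
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group: (r - mean) / (std + eps).

    This is the usual GRPO-style normalization (an assumption here); eps
    avoids division by zero when every output in the group scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical weights for the four reward components.
weights = {"format": 0.1, "tool_call": 0.2, "execution": 0.3, "defense": 0.4}

# One group of three sampled strategies, each scored by the four components.
group = [
    {"format": 1.0, "tool_call": 1.0, "execution": 1.0, "defense": 1.0},
    {"format": 1.0, "tool_call": 1.0, "execution": 0.0, "defense": 0.0},
    {"format": 0.0, "tool_call": 0.0, "execution": 0.0, "defense": 0.0},
]

rewards = [combine_rewards(c, weights) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because each advantage is measured relative to the other outputs in the same group, a strategy is reinforced only when it outperforms its siblings under identical conditions, which is why the parallel environments in Figure 3 can supply the training signal without a separately learned value model.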