Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks

Xinye Cao; Yihan Lin; Guoshun Nan; Qinchuan Zhou; Yuhang Luo; Yurui Gao; Zeliang Zhang; Haolang Lu; Qimei Cui; Yanzhao Hou; Xiaofeng Tao; Tony Q. S. Quek

TL;DR

SecLoop introduces an end-to-end, LLM-driven security automation framework for 6G Zero-Touch Networks, integrating strategy generation, orchestration, execution, and feedback within a parallelized BATTLE-FIELD testbed. SA-GRPO, a security-aware group relative policy optimization algorithm, leverages parallel environment feedback to refine strategy groups without requiring large labeled datasets, using a multi-component reward design. The approach is validated through extensive benchmarks, real-world edge-device tests, and case studies, demonstrating superior accuracy and adaptability across heterogeneous environments. The work advances practical security automation for next-generation networks and provides datasets and infrastructure for community use.

Abstract

Zero-Touch Networks (ZTNs) represent a transformative paradigm toward fully automated and intelligent network management, providing the scalability and adaptability required by the complexity of sixth-generation (6G) networks. However, the distributed architecture, high openness, and deep heterogeneity of 6G networks expand the attack surface and pose unprecedented security challenges. Security automation addresses this by enabling intelligent security management across dynamic and complex environments, making it a key capability for securing 6G ZTNs. Despite its promise, implementing security automation in 6G ZTNs presents two primary challenges: 1) automating the lifecycle from security strategy generation to validation and update under real-world, parallel, and adversarial conditions, and 2) adapting security strategies to evolving threats and dynamic environments. These challenges motivate us to propose SecLoop and SA-GRPO. To tackle the first challenge, SecLoop is the first fully automated framework that integrates large language models (LLMs) across the entire lifecycle of security strategy generation, orchestration, response, and feedback, enabling intelligent and adaptive defenses in dynamic network environments. To address the second challenge, SA-GRPO is a novel security-aware group relative policy optimization algorithm that iteratively refines security strategies by contrasting group feedback collected from parallel SecLoop executions. Extensive real-world experiments on five benchmarks, covering 11 MITRE ATT&CK processes and over 20 types of attacks, demonstrate the superiority of the proposed SecLoop and SA-GRPO. We will release our platform to the community to facilitate the advancement of security automation toward next-generation communications.
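
To make "contrasting group feedback" concrete, the following is a minimal sketch of the group-relative advantage used in standard GRPO: each strategy sampled for the same scenario is scored, and its advantage is its reward relative to the group mean, scaled by the group standard deviation. This is an illustration under stated assumptions, not the SA-GRPO algorithm itself, whose security-aware details are not given in this abstract. The function names, the reward component names (defense_success, service_availability), and the weights are hypothetical.

```python
from statistics import mean, pstdev


def combined_reward(components, weights):
    """Weighted sum of reward components for one executed strategy.

    Assumption: the component names and weights are hypothetical; the
    abstract does not specify the reward terms used by SA-GRPO.
    """
    return sum(weights[name] * components[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization: (r - group mean) / (group std + eps).

    Strategies that beat the group average get a positive advantage,
    those that fall below it get a negative one.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical feedback from four parallel executions of sampled strategies.
weights = {"defense_success": 0.5, "service_availability": 0.5}
feedback = [
    {"defense_success": 1.0, "service_availability": 1.0},
    {"defense_success": 0.0, "service_availability": 0.0},
    {"defense_success": 1.0, "service_availability": 0.0},
    {"defense_success": 0.0, "service_availability": 1.0},
]
rewards = [combined_reward(f, weights) for f in feedback]
advantages = group_relative_advantages(rewards)
```

Because the advantages are computed relative to the group, no learned value function or labeled target is needed; the only supervision is the feedback returned by the parallel executions.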
