From Sands to Mansions: Towards Automated Cyberattack Emulation with Classical Planning and Large Language Models
Lingzhi Wang, Zhenyuan Li, Yi Jiang, Zhengkai Wang, Zonghan Guo, Jiahui Wang, Yangyang Wei, Xiangmin Shen, Wei Ruan, Yan Chen
TL;DR
This work tackles the lack of up-to-date, diverse, and realistic cyberattack datasets by introducing Aurora, an automated cyberattack emulation system that combines a modular Attack Action Linking Model (AALM) with classical planning and large language models. Aurora ingests third-party attack tools and CTI reports to automatically generate multi-step attack chains, builds the corresponding emulation environments, and semi-automatically executes the attacks, yielding a large, reproducible dataset (over 1,000 published attack chains built from more than 5,500 actions). The approach generates PDDL domain and problem files and applies a reward mechanism aligned with CTI reports, so that the resulting chains reflect real attacker behavior and support defense benchmarking. Evaluations show that Aurora outperforms baselines and advanced generative AI in chain quality, diversity, and CTI alignment, while remaining economical in time and cost; the public release of datasets and code is intended to accelerate further research. The work highlights practical impact for threat-informed defense, dataset reproducibility, and benchmarking across detection systems. It also acknowledges limitations in LLM predicate accuracy, partial automation, and environment customization with VM images, which point to directions for future improvement.
Abstract
As attackers continually advance their tools, skills, and techniques during cyberattacks, particularly in modern Advanced Persistent Threat (APT) campaigns, there is a pressing need for a comprehensive and up-to-date cyberattack dataset to support threat-informed defense and enable benchmarking of defense systems in both academic research and commercial solutions. However, such datasets are scarce: recent academic studies continue to rely on outdated benchmarks, while cyberattack emulation in industry remains limited because of the significant human effort and expertise it requires. Creating datasets by emulating advanced cyberattacks presents several challenges, such as limited coverage of attack techniques, the complexity of chaining multiple attack steps, and the difficulty of realistically mimicking actual threat groups. In this paper, we introduce the modularized Attack Action and the Attack Action Linking Model as a structured way to organize and chain individual attack steps into multi-step cyberattacks. Building on this, we propose Aurora, a system that autonomously emulates cyberattacks using third-party attack tools and threat intelligence reports with the help of classical planning and large language models. Aurora can automatically generate detailed attack plans, set up emulation environments, and semi-automatically execute the attacks. We use Aurora to create a dataset containing over 1,000 attack chains. To the best of our knowledge, Aurora is the only system capable of automatically constructing such a large-scale cyberattack dataset with corresponding attack execution scripts and environments. Our evaluation further demonstrates that Aurora outperforms prior comparable work and even the most advanced generative AI models in cyberattack emulation. To support further research, we have published the cyberattack dataset and will publish the source code of Aurora.
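
To make the idea of chaining modular attack actions concrete, the following is a minimal sketch, not Aurora's implementation. It assumes each attack action can be described by a set of preconditions and a set of effects (in the spirit of a PDDL action), and it substitutes a plain breadth-first forward search for a real classical planner. The class `AttackAction`, the function `plan_chain`, and all predicate and action names are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackAction:
    """Hypothetical modular attack step with PDDL-style preconditions and effects."""
    name: str
    preconditions: frozenset
    effects: frozenset


def plan_chain(actions, initial_state, goal):
    """Breadth-first forward search for a shortest action chain reaching the goal.

    Returns a list of action names, or None if no chain exists.
    """
    start = frozenset(initial_state)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, chain = queue.popleft()
        if goal <= state:  # every goal predicate holds in this state
            return chain
        for action in actions:
            if action.preconditions <= state:  # action is applicable
                next_state = state | action.effects
                if next_state not in seen:
                    seen.add(next_state)
                    queue.append((next_state, chain + [action.name]))
    return None


# Illustrative actions only; these are assumptions, not taken from the dataset.
ACTIONS = [
    AttackAction("phishing_attachment",
                 frozenset({"network_access"}), frozenset({"user_shell"})),
    AttackAction("privilege_escalation",
                 frozenset({"user_shell"}), frozenset({"admin_shell"})),
    AttackAction("credential_dump",
                 frozenset({"admin_shell"}), frozenset({"credentials"})),
]

chain = plan_chain(ACTIONS, {"network_access"}, {"credentials"})
print(chain)
```

In this toy setting, the effects of one action satisfy the preconditions of the next, which is the linking relationship the abstract describes. A full planner would additionally handle typed objects, costs or rewards, and much larger action sets.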
