AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models
Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang
TL;DR
The paper addresses automatic construction of attack knowledge graphs from CTI reports using Large Language Models to overcome limited generalization and accessibility barriers in prior methods. It introduces AttacKG+, a four-module framework with a new multi-layer threat knowledge schema (threat behavior, TTP labels, and state summary) and demonstrates its effectiveness on 500 CTI reports and 234 MITRE techniques. Results show that AttacKG+ outperforms baselines in threat behavior graph extraction and technique identification, delivering higher precision, recall, and F1 and enabling more accurate threat reconstruction for security operations. The work also contributes two CTI datasets, Re-CTI and CTI-TE, and provides visualization tools, underscoring practical impact for threat analysis and defensive decision-making.
Abstract
Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations that portray the evolutionary traces of cyber attacks. Although previous research has proposed various methods for constructing attack knowledge graphs, these methods generally generalize poorly to diverse knowledge types and require expertise in model design and tuning. To address these limitations, we turn to Large Language Models (LLMs), which have achieved enormous success across a broad range of tasks thanks to their exceptional capabilities in both language understanding and zero-shot task fulfillment. We propose AttacKG+, a fully automatic LLM-based framework for constructing attack knowledge graphs. The framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each implemented through instruction prompting and in-context learning with LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a more comprehensive version. We represent a cyber attack as a temporally unfolding event, where each temporal step encapsulates three layers of representation: a behavior graph, MITRE TTP labels, and a state summary. Extensive evaluation demonstrates that: 1) our formulation satisfies the information needs of threat event analysis, 2) our construction framework faithfully and accurately extracts the information defined by the AttacKG+ schema, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.
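To make the abstract's description concrete, here is a minimal Python sketch of the three-layer schema and the four-module flow. It is an illustration under stated assumptions, not the authors' implementation: the names `TemporalStep`, `build_attack_graph`, and `fake_llm`, the prompt prefixes, and the `subject | relation | object` output format are all hypothetical. The LLM is modeled as a plain callable so the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: names and fields are illustrative, not from the paper's code.
Triple = Tuple[str, str, str]  # (subject entity, relation, object entity)
LLM = Callable[[str], str]     # prompt in, completion out


@dataclass
class TemporalStep:
    """One temporal step of an attack, holding the three layers of representation."""
    behavior_graph: List[Triple] = field(default_factory=list)  # layer 1
    ttp_labels: List[str] = field(default_factory=list)         # layer 2
    state_summary: str = ""                                     # layer 3


def build_attack_graph(report: str, llm: LLM) -> List[TemporalStep]:
    """Run rewriter -> parser -> identifier -> summarizer over one CTI report."""
    # Rewriter: restate the report as ordered steps, assumed one per line.
    rewritten = llm("REWRITE\n" + report)
    sections = [s.strip() for s in rewritten.splitlines() if s.strip()]

    steps: List[TemporalStep] = []
    for section in sections:
        # Parser: extract triples, assumed formatted as "subject | relation | object".
        triples: List[Triple] = []
        for line in llm("PARSE\n" + section).splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3 and all(parts):
                triples.append((parts[0], parts[1], parts[2]))

        # Identifier: technique IDs, assumed comma-separated.
        raw_labels = llm("IDENTIFY\n" + section)
        labels = [t.strip() for t in raw_labels.split(",") if t.strip()]

        # Summarizer: free-text description of the state after this step.
        summary = llm("SUMMARIZE\n" + section).strip()

        steps.append(TemporalStep(triples, labels, summary))
    return steps


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    task, _, body = prompt.partition("\n")
    is_phishing = "phishing" in body
    if task == "REWRITE":
        return "Attacker sends phishing email\nMalware runs PowerShell"
    if task == "PARSE":
        if is_phishing:
            return "attacker | sends | phishing email"
        return "malware | runs | PowerShell"
    if task == "IDENTIFY":
        return "T1566" if is_phishing else "T1059.001"
    return "State after: " + body


if __name__ == "__main__":
    for i, step in enumerate(build_attack_graph("raw report text", fake_llm), 1):
        print(i, step.behavior_graph, step.ttp_labels, step.state_summary)
```

Keeping the model behind a single callable means each module reduces to a prompt plus a small output parser, which mirrors the abstract's point that the modules rely on prompting instead of task-specific model design. A real system would also need to validate or repair malformed model output, which this sketch only handles by skipping lines that do not split into three non-empty fields.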
