AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models

Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang

TL;DR

The paper addresses automatic construction of attack knowledge graphs from CTI reports using Large Language Models to overcome limited generalization and accessibility barriers in prior methods. It introduces AttacKG+, a four-module framework with a new multi-layer threat knowledge schema (threat behavior, TTP labels, and state summary) and demonstrates its effectiveness on 500 CTI reports and 234 MITRE techniques. Results show that AttacKG+ outperforms baselines in threat behavior graph extraction and technique identification, delivering higher precision, recall, and F1 and enabling more accurate threat reconstruction for security operations. The work also contributes two CTI datasets, Re-CTI and CTI-TE, and provides visualization tools, underscoring practical impact for threat analysis and defensive decision-making.

Abstract

Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations that portray the evolutionary traces of cyber attacks. Although previous research has proposed various methods for constructing attack knowledge graphs, these methods generally generalize poorly to diverse knowledge types and require expertise in model design and tuning. To address these limitations, we turn to Large Language Models (LLMs), which have achieved enormous success across a broad range of tasks thanks to their exceptional capabilities in both language understanding and zero-shot task fulfillment. We propose AttacKG+, a fully automatic LLM-based framework for constructing attack knowledge graphs. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each implemented through instruction prompting and in-context learning with LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a more comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation: behavior graph, MITRE TTP labels, and state summary. Extensive evaluation demonstrates that: 1) our formulation satisfies the information needs of threat event analysis, 2) our construction framework faithfully and accurately extracts the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.
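
The three-layer schema described in the abstract can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names (`Relation`, `TemporalStep`, `AttackEvent`), the field names, and the sample values are all assumptions made for illustration. The four state-summary keys follow the categories named in the Figure 4 caption (permissions, files, tools, information).

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """One edge of the behavior graph: subject --action--> target."""
    subject: str
    action: str
    target: str


@dataclass
class TemporalStep:
    """One temporal step of the attack, holding the three layers."""
    stage: str  # hypothetical field: name of the attack stage
    behavior_graph: list[Relation] = field(default_factory=list)       # behavior graph layer
    ttp_labels: list[str] = field(default_factory=list)                # MITRE TTP label layer
    state_summary: dict[str, list[str]] = field(default_factory=dict)  # state summary layer


@dataclass
class AttackEvent:
    """A cyber attack as an ordered sequence of temporal steps."""
    steps: list[TemporalStep] = field(default_factory=list)

    def techniques(self) -> list[str]:
        """Return technique labels in order of first appearance, without repeats."""
        seen: list[str] = []
        for step in self.steps:
            for label in step.ttp_labels:
                if label not in seen:
                    seen.append(label)
        return seen


# Illustrative values only; they are not taken from the paper's datasets.
event = AttackEvent(steps=[
    TemporalStep(
        stage="Initial Access",
        behavior_graph=[Relation("attacker", "sends", "phishing email")],
        ttp_labels=["T1566"],
        state_summary={"permissions": [], "files": ["invoice.doc"],
                       "tools": [], "information": []},
    ),
    TemporalStep(
        stage="Execution",
        behavior_graph=[Relation("invoice.doc", "launches", "powershell.exe")],
        ttp_labels=["T1204", "T1059"],
        state_summary={"permissions": ["user"], "files": ["invoice.doc"],
                       "tools": ["powershell.exe"], "information": []},
    ),
])

print(event.techniques())  # ['T1566', 'T1204', 'T1059']
```

Keeping the layers side by side inside each step, rather than in three separate graphs, mirrors the description of every temporal step as encapsulating all three representations.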

Paper Structure (20 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of AttacKG+'s knowledge schema. We formulate a cyber attack as a temporally unfolding complex event, each temporal step of which consists of three layers of representation: Behavior Graph [mid-layer], TTP Labels [top-layer], and State Summary [bottom-layer].
  • Figure 2: The overall framework of our AttacKG+ consists of four components: 1) Rewriter, which enables tactical rewriting of threat intelligence; 2) Parser, which extracts threat entities and relations from threat intelligence; 3) Identifier, which identifies patterns of attack techniques used in cyber threat intelligence; and 4) Summarizer, which performs a stage-by-stage situational summary of threat intelligence.
  • Figure 3: Manual assessment results of the report (tactical rewrite version).
  • Figure 4: Example of complex threat event extraction: 1) the distribution of threat entities and relations; 2) the cyber attack techniques used; and 3) the per-stage state summary, covering permissions, files, tools, and information.
  • Figure 5: Examples of the visualization of AttacKG+.
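
The four-module flow in the Figure 2 caption (Rewriter, Parser, Identifier, Summarizer) can be sketched as a chain of prompted LLM calls. This is a rough illustration under stated assumptions, not the paper's code: the `LLM` callable type, the instruction strings, `build_prompt`, `run_pipeline`, and the choice to feed the rewritten report to the three later modules are all hypothetical, and `fake_llm` merely stands in for a real model so the sketch can run offline.

```python
from typing import Callable, Sequence

# Hypothetical LLM interface: takes a prompt string, returns the completion text.
LLM = Callable[[str], str]

# Placeholder instructions; the paper's actual prompts are not reproduced here.
INSTRUCTIONS = {
    "rewriter": "Rewrite the report so its content is organized by attack stage.",
    "parser": "Extract threat entities and the relations between them.",
    "identifier": "Identify the MITRE attack techniques described in the text.",
    "summarizer": "Summarize the state at each stage of the attack.",
}


def build_prompt(instruction: str, text: str, examples: Sequence[str] = ()) -> str:
    """Combine an instruction, optional in-context examples, and the input text."""
    parts = [instruction]
    parts.extend(f"Example:\n{example}" for example in examples)
    parts.append(f"Input:\n{text}")
    return "\n\n".join(parts)


def run_pipeline(report: str, llm: LLM) -> dict[str, str]:
    """Run the four modules in order and collect each raw output."""
    # Assumption: the three later modules all read the rewritten report.
    rewritten = llm(build_prompt(INSTRUCTIONS["rewriter"], report))
    return {
        "rewritten": rewritten,
        "behavior_graph": llm(build_prompt(INSTRUCTIONS["parser"], rewritten)),
        "ttp_labels": llm(build_prompt(INSTRUCTIONS["identifier"], rewritten)),
        "state_summary": llm(build_prompt(INSTRUCTIONS["summarizer"], rewritten)),
    }


def fake_llm(prompt: str) -> str:
    """Stand-in model for a dry run: echoes the instruction line it received."""
    return "handled: " + prompt.splitlines()[0]


result = run_pipeline("The attacker sent a phishing email ...", fake_llm)
for module, output in result.items():
    print(f"{module}: {output}")
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; a real client would replace `fake_llm`, and each raw output would still need to be parsed into structured entities, labels, and summaries.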