
SynAT: Enhancing Security Knowledge Bases via Automatic Synthesizing Attack Tree from Crowd Discussions

Ziyou Jiang, Lin Shi, Guowei Yang, Xuyan Ma, Fenglong Li, Qing Wang

TL;DR

SynAT addresses the lag between emergent security risks and official knowledge bases by automatically synthesizing attack trees from crowd security posts. It restricts the scope of candidate sentences with LLM-based prompting, jointly extracts attack events and relations with a transition-based framework, and builds attack trees with heuristic rules. On 5,070 Stack Overflow posts (augmented to 18,203) with 1,354 ground-truth trees, SynAT achieves state-of-the-art event extraction (F1 ≈ 0.809) and relation extraction (F1 ≈ 0.878), along with the best tree similarity (AHD ≈ 0.1024, TEDS ≈ 0.0793). It also shows practical impact by enhancing the public knowledge bases CVE and CAPEC and Huawei’s private knowledge base, and by informing the design of security patches. Future work will extend SynAT to additional crowd sources and incorporate mitigations into the synthesized trees.
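The extraction results above are reported as F1 scores. The sketch below shows how precision, recall, and F1 are computed when predictions and gold annotations are compared as sets. The exact-match criterion and the sample (span, type) events, including the type names, are hypothetical, since this summary does not state how SynAT matches predicted events to the ground truth.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Precision, recall and F1, counting an item as correct only on exact match (assumed criterion)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical attack events encoded as (text span, event type) pairs.
gold_events = {
    ("inject script", "Action"),
    ("steal cookie", "Goal"),
    ("unescaped input", "Weakness"),
}
pred_events = {
    ("inject script", "Action"),
    ("steal cookie", "Goal"),
    ("send link", "Action"),
}
precision, recall, f1 = precision_recall_f1(pred_events, gold_events)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # two of three predictions match
```

The same function applies to relation extraction if each relation is encoded as a (head, label, dependent) tuple.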

Abstract

Cyber attacks have become a serious threat to the security of software systems. Many organizations have built security knowledge bases to safeguard against attacks and vulnerabilities. However, because official security information is released with a time lag, these knowledge bases may not be well maintained, and using them to protect software systems against emergent security risks can be challenging. On the other hand, security posts on online knowledge-sharing platforms contain many crowd security discussions, and the knowledge in those posts can be used to enhance security knowledge bases. This paper proposes SynAT, an automatic approach to synthesize attack trees from crowd security posts. Given a security post, SynAT first uses a Large Language Model (LLM) and prompt learning to restrict the scope of sentences that may contain attack information; it then uses a transition-based event and relation extraction model to extract events and relations simultaneously from that scope; finally, it applies heuristic rules to synthesize attack trees from the extracted events and relations. An experimental evaluation on 5,070 Stack Overflow security posts shows that SynAT outperforms all baselines in both event and relation extraction and achieves the highest tree similarity in attack tree synthesis. Furthermore, SynAT has been applied to enhance HUAWEI's security knowledge base as well as the public security knowledge bases CVE and CAPEC, which demonstrates its practicality.
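The abstract describes a transition-based model that extracts events and relations simultaneously. The sketch below illustrates that general idea with a generic left-to-right transition system: each span is either skipped or recognized as an event, and a newly recognized event is immediately linked to earlier events. The action names (SKIP, EVENT, ARC), the ENABLES label, the toy spans, and the static oracle that replays gold annotations are all hypothetical stand-ins, not SynAT's actual action set or its learned model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set, Tuple

Relation = Tuple[int, str, int]  # (head span index, relation label, dependent span index)


@dataclass
class State:
    buffer: Deque[int]                                 # spans not yet processed
    stack: List[int] = field(default_factory=list)     # spans already recognized as events
    relations: List[Relation] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)   # transition history


def oracle_parse(
    spans: List[str],
    gold_events: Set[int],
    gold_relations: Dict[Tuple[int, int], str],
) -> State:
    """Run a left-to-right transition pass with a static oracle.

    The oracle replays gold annotations; a trained model would instead score the
    candidate actions from an encoding of the current state.
    """
    state = State(buffer=deque(range(len(spans))))
    while state.buffer:
        i = state.buffer.popleft()
        if i not in gold_events:
            state.actions.append(f"SKIP({spans[i]})")
            continue
        state.actions.append(f"EVENT({spans[i]})")
        # Relate the new event to every earlier event before pushing it, so event
        # detection and relation extraction happen in the same pass.
        for j in state.stack:
            for head, dep in ((j, i), (i, j)):
                if (head, dep) in gold_relations:
                    label = gold_relations[(head, dep)]
                    state.relations.append((head, label, dep))
                    state.actions.append(f"ARC-{label}({spans[head]} -> {spans[dep]})")
        state.stack.append(i)
    return state


# Toy sentence split into candidate spans; indices 1 and 3 are attack events.
spans = ["the attacker", "injects a script", "into the comment field", "and steals the session cookie"]
state = oracle_parse(spans, gold_events={1, 3}, gold_relations={(1, 3): "ENABLES"})
for action in state.actions:
    print(action)
```

A static oracle like this one is also a common way to turn annotated sentences into gold action sequences for training a transition classifier.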

Paper Structure (39 sections, 6 equations, 8 figures, 10 tables, 1 algorithm)

Figures (8)

  • Figure 1: The motivating example of attack tree synthesis for security post #35817325.
  • Figure 2: The meta-framework of attack trees.
  • Figure 3: The attack events and relations extracted from the motivating example in Fig. 1.
  • Figure 4: The architecture of SynAT.
  • Figure 5: An example of how the transition-based model extracts events and relations from the restricted scope of sentences in the motivating example (Fig. 1).
  • ...and 3 more figures
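Figure 2 refers to a meta-framework of attack trees, and the abstract states that heuristic rules turn extracted events and relations into trees. The meta-framework is not reproduced in this summary, so the sketch below assumes the common attack tree formulation in which a goal is refined into sub-goals combined with AND or OR. The rule that maps edge labels to gates, the (parent, label, child) input format, and the sample events are hypothetical illustrations, not SynAT's heuristics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    event: str
    gate: Optional[str] = None                      # "AND"/"OR" for inner nodes, None for leaves
    children: List["Node"] = field(default_factory=list)


def build_tree(relations: List[Tuple[str, str, str]]) -> Node:
    """Build an attack tree from acyclic (parent, label, child) triples.

    Hypothetical rule: a parent becomes an AND node if every link to its children
    is labeled "AND"; otherwise it becomes an OR node.
    """
    nodes: Dict[str, Node] = {}
    child_labels: Dict[str, List[str]] = {}
    has_parent: Set[str] = set()
    for parent, label, child in relations:
        parent_node = nodes.setdefault(parent, Node(parent))
        child_node = nodes.setdefault(child, Node(child))
        parent_node.children.append(child_node)
        child_labels.setdefault(parent, []).append(label)
        has_parent.add(child)
    for name, labels in child_labels.items():
        nodes[name].gate = "AND" if all(lab == "AND" for lab in labels) else "OR"
    roots = [node for name, node in nodes.items() if name not in has_parent]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root goal, found {len(roots)}")
    return roots[0]


def achievable(node: Node, done: Set[str]) -> bool:
    """AND/OR semantics: leaves are satisfied by `done`, gates combine their children."""
    if not node.children:
        return node.event in done
    results = [achievable(child, done) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


def render(node: Node, depth: int = 0) -> List[str]:
    """Indented text view of the tree, one node per line."""
    tag = f" [{node.gate}]" if node.gate else ""
    lines = ["  " * depth + node.event + tag]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


relations = [
    ("steal session cookie", "AND", "inject script into comment"),
    ("steal session cookie", "AND", "victim views page"),
    ("inject script into comment", "OR", "bypass input filter"),
    ("inject script into comment", "OR", "use unescaped field"),
]
root = build_tree(relations)
print("\n".join(render(root)))
print(achievable(root, {"use unescaped field", "victim views page"}))  # True
```

An AND goal needs all of its children and an OR goal needs any one of them. A real pipeline would also need rules for merging duplicate events and resolving conflicting labels, which this sketch does not attempt.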