CRUcialG: Reconstruct Integrated Attack Scenario Graphs by Cyber Threat Intelligence Reports

Wenrui Cheng, Tiantian Zhu, Tieming Chen, Qixuan Yuan, Jie Ying, Hongmei Li, Chunlin Xiong, Mingda Li, Mingqi Lv, Yan Chen

TL;DR

CRUcialG is a system for automatically reconstructing Attack Scenario Graphs (ASGs) from CTI reports. It uses NLP models to extract systematic attack knowledge from the reports and form preliminary ASGs, then applies a four-phase attack rationality verification framework, which combines the tactical phase with the attack procedure, to evaluate whether the ASGs are reasonable.

Abstract

Cyber Threat Intelligence (CTI) reports are factual records compiled by security analysts from their observations of threat events or their own practical experience with attacks. To use CTI reports for attack detection, existing methods have attempted to map the content of reports onto system-level attack provenance graphs that clearly depict attack procedures. However, existing studies on constructing graphs from CTI reports suffer from weak natural language processing (NLP) capabilities, discrete and fragmented graphs, and insufficient attack semantic representation. We therefore propose CRUcialG, a system for the automated reconstruction of attack scenario graphs (ASGs) from CTI reports. First, we use NLP models to extract systematic attack knowledge from CTI reports and form preliminary ASGs. Then, we propose a four-phase attack rationality verification framework, which combines the tactical phase with the attack procedure, to evaluate whether the ASGs are reasonable. Finally, we implement relation repair and phase supplementation of ASGs by adopting a serialized graph generation model. We collect a total of 10,607 CTI reports and generate 5,761 complete ASGs. Experimental results on CTI reports from 30 security vendors and DARPA show that the similarity of ASG reconstruction by CRUcialG can reach 84.54%. Compared with SOTA (EXTRACTOR and AttackG), the recall of CRUcialG (extraction of real attack events) can reach 88.13% and 94.46% respectively, which is 40% higher than SOTA on average. The F1-score of attack phase verification can reach 90.04%.
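The idea of checking an ASG against four attack phases can be illustrated with a minimal sketch. This is not the CRUcialG implementation: the phase names (`phase_1` to `phase_4`), the `Event` and `ASG` classes, the "all four phases covered" rule, and the example entities are all assumptions made purely for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical placeholder names: the abstract does not name the four phases.
PHASES = ("phase_1", "phase_2", "phase_3", "phase_4")


@dataclass(frozen=True)
class Event:
    """One edge of the graph: subject --relation--> obj, optionally tagged with a phase."""
    subject: str
    relation: str
    obj: str
    phase: Optional[str] = None


@dataclass
class ASG:
    """A toy attack scenario graph stored as a list of events (an assumption)."""
    events: List[Event] = field(default_factory=list)

    def covered_phases(self) -> Set[str]:
        # Keep only phase labels that belong to the known phase set.
        return {e.phase for e in self.events if e.phase in PHASES}

    def missing_phases(self) -> List[str]:
        covered = self.covered_phases()
        return [p for p in PHASES if p not in covered]

    def is_rational(self) -> bool:
        # Assumed rule: a graph is "rational" when every phase has at least one event.
        return not self.missing_phases()


# Made-up entities, used only to exercise the sketch.
asg = ASG([
    Event("winword.exe", "read", "invoice.doc", "phase_1"),
    Event("winword.exe", "fork", "powershell.exe", "phase_2"),
    Event("powershell.exe", "connect", "203.0.113.5", None),
])
missing = asg.missing_phases()
```

In this toy graph, `missing` lists the phases with no matched event, which is the kind of gap that a later repair or supplementation step would aim to fill.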

Paper Structure

This paper contains 26 sections, 5 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: A motivating example. Subfigure (A) and Subfigure (B) are the attack graphs constructed from the AsyncRAT report by EXTRACTOR and AttackG, respectively. Subfigure (C) shows the attack scenario graph (some nodes and edges omitted) reconstructed by CRUcialG. The key events matched in the four attack phases are marked in green, blue, red, and orange, respectively (discussed in Section \ref{sec:sysdesign_rational}). Below the graph is a brief description of the report, which identifies the key entities extracted. The complete redundancy-filtered text can be found in Appendix \ref{appendix:AsyncRAT}.
  • Figure 2: The architecture of CRUcialG. First, CRUcialG extracts attack knowledge and builds a graph from CTI reports. Second, the attack rationality verification framework is used to judge whether the ASG is rational. Finally, the ASG is repaired and supplemented by the graph generation model.
  • Figure 3: Tactics statistics are derived from the frequency of occurrence of each tactic in 1,000 CTI reports.
  • Figure 4: Precision, recall, and F1-score for the four attack phases covered by ASGs reconstructed from 800 open CTI reports.
  • Figure 5: Results of the decision (one-class classification) model on KAIROS's 62 alarms (37 false positives/FPs and 25 true positives/TPs). Precision is the number of correctly identified FPs divided by the number of alarms the model identified as FPs (in the one-class classification model, TP divided by the sum of TP and FP). Recall is the number of correctly identified FPs divided by the 37 true FPs in KAIROS (in the one-class classification model, TP divided by the sum of TP and FN).
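
The precision and recall definitions in the Figure 5 caption can be written out as a short sketch. The function name and the example counts are hypothetical and are not results from the paper; only the total of 37 true FPs comes from the caption. Here the "positive" class of the one-class model is an alarm that is a KAIROS false positive.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) as TP/(TP+FP) and TP/(TP+FN).

    tp: alarms correctly identified as KAIROS false positives
    fp: alarms wrongly identified as KAIROS false positives
    fn: KAIROS false positives the model failed to identify
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only (NOT from the paper): the model flags 30 alarms
# as false positives, 27 of them correctly, so 10 of the 37 are missed.
p, r = precision_recall(tp=27, fp=3, fn=10)
```

With these made-up counts, precision is 27/30 and recall is 27/37; note that `tp + fn` equals the 37 true FPs stated in the caption.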