TRACE: Timely Retrieval and Alignment for Cybersecurity Knowledge Graph Construction and Expansion

Zijing Xu; Ziwei Ning; Tiancheng Hu; Jianwei Zhuge; Yangyang Wang; Jiahao Cao; Mingwei Xu

TL;DR

TRACE is a framework that addresses timeliness and coverage gaps in cybersecurity knowledge graphs (CKGs) by unifying 24 structured data sources with 3 categories of unstructured data. It defines a generalized cybersecurity ontology and uses LLMs with retrieval-augmented generation to extract and align entities, enabling continuous, near-real-time expansion of the CKG. The approach yields a large-scale graph (56 node types, 112 edge types) with substantially higher coverage ($1.82\times$ over prior graphs) and competitive extraction accuracy ($86.08\%$ precision, $76.92\%$ recall, $81.24\%$ F1) compared to baselines, and it demonstrates strong entity alignment and practical utility in case studies. The work gives threat hunters comprehensive, up-to-date insights into vulnerabilities, attack methods, and defensive technologies, supporting proactive cyber risk management. Future work focuses on reducing isolated nodes, improving prompt design, and incorporating multimodal data.

Abstract

The rapid evolution of cyber threats has highlighted significant gaps in security knowledge integration. Cybersecurity Knowledge Graphs (CKGs) relying on structured data inherently exhibit hysteresis, as the timely incorporation of rapidly evolving unstructured data remains limited, potentially leading to the omission of critical insights for risk analysis. To address these limitations, we introduce TRACE, a framework designed to integrate structured and unstructured cybersecurity data sources. TRACE integrates knowledge from 24 structured databases and 3 categories of unstructured data, including APT reports, papers, and repair notices. Leveraging Large Language Models (LLMs), TRACE facilitates efficient entity extraction and alignment, enabling continuous updates to the CKG. Evaluations demonstrate that TRACE achieves a 1.8x increase in node coverage compared to existing CKGs. In entity extraction, TRACE attains a precision of 86.08%, a recall of 76.92%, and an F1 score of 81.24%, surpassing the best-known LLM-based baselines by 7.8%. Furthermore, our entity alignment methods effectively harmonize entities with existing knowledge structures, enhancing the integrity and utility of the CKG. With TRACE, threat hunters and attack analysts gain real-time, holistic insights into vulnerabilities, attack methods, and defense technologies.
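
As a quick consistency check on the reported extraction metrics, the sketch below recomputes the F1 score from the stated precision and recall using the standard harmonic-mean definition. The function name `f1_score` is a hypothetical helper introduced here for illustration; it is not part of TRACE.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.8608, 0.7692)
print(f"{f1:.2%}")  # prints 81.24%
```

The recomputed value agrees with the 81.24% F1 score stated in the abstract.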

Paper Structure (26 sections, 2 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Cybersecurity Ontology
Cybersecurity Knowledge Graph
System Architecture
Construction of the CKG
Entity Types
Relation Types
Collection and Processing
Structured Information Acquisition
Unstructured Information Acquisition
Filtering and Validation
Entity Standardization and Alignment
Implementation
Experiments
...and 11 more sections

Figures (4)

Figure 1: Framework of TRACE
Figure 2: F1-score comparison of triple extraction in unstructured data sources across LLMs.
Figure 3: A sample of attack analysis using TRACE
Figure 4: F1-score comparison of entity alignment in unstructured data sources across LLMs.
