Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs

Zhenpeng Shi, Nikolay Matyunin, Kalman Graffi, David Starobinski

TL;DR

This work introduces threat knowledge graphs that fuse the CVE, CWE, and CPE databases to enable link prediction across products, vulnerabilities, and weaknesses. It evaluates multiple KG embedding models, finds TransE to be the strongest at predicting CVE–CPE and CVE–CWE links in both closed- and open-world settings, and demonstrates predictive utility over time using historical data. The authors show that data optimizations (merging CPEs, removing unconnected entries) and data enrichments (CAPEC, CVSS vectors) can improve open-world prediction performance, particularly for CVE–CWE predictions, while maintaining solid closed-world performance. These results suggest practical use in threat modeling and vulnerability management: plausible future associations can be identified proactively and prioritized for verification and mitigation.
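TransE, the model singled out above, scores a triple (head, relation, tail) by how closely the head embedding plus the relation embedding lands on the tail embedding; link prediction then ranks every candidate tail for a given head and relation. The sketch below illustrates only that scoring and ranking step under stated assumptions, not the authors' pipeline: the 2-dimensional vectors, the entity names (`cve_x`, `cwe_a`, ...) and the relation name `has_weakness` are hypothetical and hand-set rather than trained (the paper's embeddings are learned and 200-dimensional, per Figure 3), and defaulting to the L1 norm is our choice.

```python
import math


def transe_score(h, r, t, norm=1):
    """TransE plausibility of (h, r, t): the negated distance ||h + r - t||.

    Higher (closer to 0) means more plausible. norm=1 uses L1, norm=2 uses L2.
    """
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diff)
    if norm == 2:
        return -math.sqrt(sum(d * d for d in diff))
    raise ValueError("norm must be 1 or 2")


def rank_tails(head_vec, rel_vec, candidates, norm=1):
    """Sort candidate tail names from most to least plausible under TransE."""
    scored = {name: transe_score(head_vec, rel_vec, vec, norm)
              for name, vec in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical, hand-set 2-D embeddings (a trained model would learn these).
cve_x = [0.0, 0.0]
has_weakness = [1.0, 0.5]
cwe_candidates = {
    "cwe_a": [1.0, 0.5],  # exactly at h + r, L1 distance 0
    "cwe_b": [0.5, 0.5],  # L1 distance 0.5
    "cwe_c": [3.0, 3.0],  # L1 distance 4.5
}

print(rank_tails(cve_x, has_weakness, cwe_candidates))
# → ['cwe_a', 'cwe_b', 'cwe_c']
```

With trained embeddings, the top-ranked tails that are not yet linked to the head are the candidate new associations.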

Abstract

Security assessment relies on public information about products, vulnerabilities, and weaknesses. So far, databases in these categories have rarely been analyzed in combination. Yet doing so could help predict unreported vulnerabilities and identify common threat patterns. In this paper, we propose a methodology for producing and optimizing a knowledge graph that aggregates knowledge from common threat databases (CVE, CWE, and CPE). We apply the threat knowledge graph to predict associations between threat databases, specifically between products, vulnerabilities, and weaknesses. We evaluate prediction performance both in a closed world, using associations already in the knowledge graph, and in an open world, using associations revealed afterward. Using rank-based metrics (Mean Rank, Mean Reciprocal Rank, and Hits@N scores), we demonstrate that the threat knowledge graph can uncover many associations that are currently unknown but will be revealed in the future, and that this ability remains useful over different time periods. We propose approaches to optimize the knowledge graph and show that they further help uncover associations.
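The rank-based metrics named in the abstract are simple aggregates over the rank at which each true association appears among all scored candidates. A minimal sketch, assuming 1-based ranks, higher score meaning more plausible, and optimistic tie handling (all our assumptions; the paper may use a filtered or tie-averaged ranking protocol). The ranks and scores below are hypothetical.

```python
def rank_of(true_name, scores):
    """1-based rank of true_name among all candidates (higher score = better).

    Ties are resolved optimistically: only strictly higher scores push the rank down.
    """
    true_score = scores[true_name]
    return 1 + sum(1 for name, s in scores.items()
                   if name != true_name and s > true_score)


def mean_rank(ranks):
    """MR: average rank of the true entities (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank (higher is better, at most 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, n):
    """Hits@N: fraction of true entities ranked within the top N."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Hypothetical candidate scores for one query; the true tail is "cwe_b".
print(rank_of("cwe_b", {"cwe_a": -0.2, "cwe_b": -0.5, "cwe_c": -4.5}))  # → 2

# Hypothetical ranks of four held-out associations.
ranks = [1, 3, 10, 2]
print(mean_rank(ranks))                        # → 4.0
print(hits_at(ranks, 3))                       # → 0.75
print(round(mean_reciprocal_rank(ranks), 4))   # → 0.4833
```

MR is sensitive to a few very poorly ranked associations, while MRR and Hits@N emphasize the top of the ranking, which is why these metrics are usually reported together.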

Paper Structure

This paper contains 22 sections, 1 equation, 6 figures, and 15 tables.

Figures (6)

  • Figure 1: Illustration of the prediction process. For a given CVE (e.g., CVE-2021-21348) and its known associations to CPEs and CWEs at a certain date (left side), we aim to predict future associations to other CPEs and CWEs (right side). The cloud at the bottom represents the remaining CPE/CVE/CWE entries and the associations among them. Blue lines represent successfully predicted associations (true positives); red dashed lines represent failed predictions (false positives or false negatives).
  • Figure 2: Complete structure of the threat knowledge graph. (a, b, x, y, z1-z4 represent the IDs of the corresponding entries.)
  • Figure 3: Illustration of knowledge graph embedding. The CPE, CVE, and CWE entries shown in Fig. 2 are embedded as vectors in a 200-dimensional vector space and then projected onto a 2-dimensional space by principal component analysis for illustration. The relations and attributes in Fig. 2 are also embedded as vectors but are not shown here.
  • Figure 4: Hits@N scores for predicting new associations, with the test set selected from different time periods. Only the end time of the periods differs (marked by color; all in 2022), while the start time is Aug 2021 for all three groups. In general, Hits@N scores increase as the test set includes more newly added triples.
  • Figure 5: Precision-recall curve for predicting CVE-CPE associations using the threat knowledge graph. The curve reflects the trade-off between precision and recall as the threshold $\alpha$ for a positive prediction changes: as $\alpha$ decreases, precision decreases and recall increases. A reference point at the maximum F1-score is marked in red.
  • ...and 1 more figure
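Figure 5 turns association scores into yes/no predictions with a threshold α and marks the point of maximum F1. A minimal sketch of that threshold sweep, assuming higher scores mean more plausible associations (e.g., negated TransE distances) and a positive prediction whenever score ≥ α; the paper's exact score scale is not stated here, and the toy scores and labels are hypothetical.

```python
def precision_recall_at(scores, labels, alpha):
    """Predict 'associated' when score >= alpha; return (precision, recall)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= alpha and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < alpha and y)
    # No predicted positives: treat precision as vacuously 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_f1_threshold(scores, labels):
    """Sweep alpha over observed scores (high to low); return (alpha, P, R, F1) with max F1."""
    best = None
    for alpha in sorted(set(scores), reverse=True):
        p, r = precision_recall_at(scores, labels, alpha)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if best is None or f1 > best[3]:
            best = (alpha, p, r, f1)
    return best


# Hypothetical scored CVE-CPE candidate pairs and whether each is truly associated.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [True, True, False, True, False]

alpha, p, r, f1 = best_f1_threshold(toy_scores, toy_labels)
print(alpha, p, r, round(f1, 3))  # → 0.6 0.75 1.0 0.857
```

Because lowering α can only add predicted positives, recall never decreases as α falls; precision typically drops but not necessarily monotonically (in this toy data it rises from 2/3 at α = 0.7 to 0.75 at α = 0.6).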