Uncovering CWE-CVE-CPE Relations with Threat Knowledge Graphs
Zhenpeng Shi, Nikolay Matyunin, Kalman Graffi, David Starobinski
TL;DR
This work introduces threat knowledge graphs that fuse the CVE, CWE, and CPE databases to enable link prediction across products, vulnerabilities, and weaknesses. It evaluates multiple knowledge graph (KG) embedding models, finds TransE to be the strongest for predicting CVE–CPE and CVE–CWE links in both closed- and open-world settings, and uses historical data to show that the predictions remain useful over time. The authors show that data optimizations (merging CPEs, removing unconnected entries) and data enrichments (CAPEC, CVSS vectors) can improve open-world prediction performance, particularly for CVE–CWE predictions, while maintaining solid closed-world performance. These results suggest practical use in threat modeling and vulnerability management: plausible future associations can be identified proactively and prioritized for verification and mitigation.
Abstract
Security assessment relies on public information about products, vulnerabilities, and weaknesses. So far, databases in these categories have rarely been analyzed in combination. Yet, doing so could help predict unreported vulnerabilities and identify common threat patterns. In this paper, we propose a methodology for producing and optimizing a knowledge graph that aggregates knowledge from common threat databases (CVE, CWE, and CPE). We apply the threat knowledge graph to predict associations between threat databases, specifically between products, vulnerabilities, and weaknesses. We evaluate the prediction performance both in a closed-world setting, using associations already in the knowledge graph, and in an open-world setting, using associations revealed afterward. Using rank-based metrics (i.e., Mean Rank, Mean Reciprocal Rank, and Hits@N scores), we demonstrate that the threat knowledge graph can uncover many associations that are currently unknown but will be revealed in the future, and that this ability persists over different time periods. We propose approaches to optimize the knowledge graph, and show that they indeed help in further uncovering associations.
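The rank-based metrics named above have standard definitions, which the following minimal sketch illustrates. It assumes each test association (for example, a held-out CVE–CWE link) has already been assigned the 1-based rank of the true entity among all scored candidates. The function name `rank_metrics` and the example ranks are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence


def rank_metrics(ranks: Sequence[int], ns: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute Mean Rank, Mean Reciprocal Rank, and Hits@N from 1-based ranks.

    Each entry in `ranks` is the position of the true entity in the sorted
    candidate list for one test triple (1 means it was ranked first).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")

    total = len(ranks)
    metrics = {
        # Mean Rank: average position of the true entity (lower is better).
        "MR": sum(ranks) / total,
        # Mean Reciprocal Rank: average of 1/rank (higher is better, at most 1).
        "MRR": sum(1.0 / r for r in ranks) / total,
    }
    for n in ns:
        # Hits@N: fraction of test triples whose true entity is in the top N.
        metrics[f"Hits@{n}"] = sum(1 for r in ranks if r <= n) / total
    return metrics


# Illustrative ranks only (not results from the paper).
example_ranks = [1, 2, 5, 20]
example_metrics = rank_metrics(example_ranks)
for name, value in example_metrics.items():
    print(f"{name}: {value:.4f}")
```

Mean Rank is sensitive to a few badly ranked associations, whereas Mean Reciprocal Rank and Hits@N emphasize how often the true association appears near the top of the list, which is why the three are usually reported together.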
