Unveiling Hidden Links Between Unseen Security Entities

Daniel Alfasi, Tal Shapira, Anat Bremler Barr

TL;DR

VulnScopper addresses the challenge of rapidly linking CVEs to CWEs and CPEs in ever-growing vulnerability databases by integrating a vulnerability-focused knowledge graph with NLP-derived entity descriptions. Built on ULTRA for graph reasoning and OpenAI's Ada embeddings for textual descriptions, it achieves strong transductive and inductive link-prediction performance on NVD and Red Hat datasets, including up to 78% Hits@10 for CVE-CPE/CWE links and an 11.7% improvement over large language models for CWE labeling. The system demonstrates the ability to uncover new product links to vulnerabilities (even unseen CVEs) and can significantly shorten remediation timelines, as shown in case studies of 2023 vulnerabilities. Overall, VulnScopper represents an effective AI-assisted vulnerability scoping framework that blends graph-structure reasoning with descriptive language understanding to enhance vulnerability management in practice.

Abstract

The proliferation of software vulnerabilities poses a significant challenge for security databases and analysts tasked with their timely identification, classification, and remediation. With the National Vulnerability Database (NVD) reporting an ever-increasing number of vulnerabilities, traditional manual analysis becomes untenably time-consuming and error-prone. This paper introduces VulnScopper, an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP), to automate and enhance the analysis of software vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined with a Large Language Model (LLM), VulnScopper effectively handles unseen entities, overcoming the limitations of previous KG approaches. We evaluate VulnScopper on two major security datasets, the NVD and the Red Hat CVE database. Our method significantly improves the link prediction accuracy between Common Vulnerabilities and Exposures (CVEs), Common Weakness Enumerations (CWEs), and Common Platform Enumerations (CPEs). Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement over large language models in predicting CWE labels based on the Red Hat database. Based on the NVD, only 6.37% of the linked CPEs are published during the first 30 days; many of them relate to critical and high-risk vulnerabilities which, according to multiple compliance frameworks (such as CISA and PCI), should be remediated within 15-30 days. Our model can uncover new products linked to vulnerabilities, reducing remediation time and improving vulnerability management. We analyzed several CVEs from 2023 to showcase this ability.
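The abstract reports results as Hits@10, a standard link-prediction metric: the fraction of test queries for which the correct entity appears among the ten highest-scored candidates. The sketch below illustrates the generic metric only; it is not the authors' evaluation code, and the function names, the pessimistic tie handling, and the example values are assumptions made for illustration.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true candidate among all candidates.

    Assumption: ties are resolved pessimistically, so any other candidate
    with an equal or higher score is counted as ranked ahead.
    """
    true_score = scores[true_index]
    ahead = sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= true_score
    )
    return 1 + ahead


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose true entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct CPE for four CVE queries.
example_ranks = [1, 3, 12, 50]
print(hits_at_k(example_ranks, k=10))
```

Under this definition, a Hits@10 of 78% means that for 78% of the evaluated links the correct CPE or CWE was among the model's top ten suggestions, which is the setting an analyst reviewing a short candidate list would care about.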

Paper Structure (37 sections, 8 equations, 6 figures, 18 tables)

Figures (6)

  • Figure 1: CWE-200 and CWE-400 from cwe.mitre.org.
  • Figure 2: ULTRA's Relation graph construction step on our vulnerability KG.
  • Figure 3: CVE-2023-4863 vulnerability description, NVD.
  • Figure 4: The vulnerability knowledge graph schema used by VulnScopper.
  • Figure 5: Number of relations that are added to each graph per year.
  • ...and 1 more figure