VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection

Samal Mukhtar, Yinghua Yao, Zhu Sun, Mustafa Mustafa, Yew Soon Ong, Youcheng Sun

TL;DR

This work moves software vulnerability detection beyond binary classification by targeting CWE-level reasoning grounded in a security knowledge graph (KG). It introduces VulReaD, which uses KG-grounded data to distill CWE-consistent reasoning from a strong teacher LLM and then trains a student model with Odds Ratio Preference Optimization (ORPO) to favor CWE-aligned explanations. Across three real-world datasets, VulReaD outperforms traditional deep learning baselines and existing LLM methods in both binary detection and multi-class CWE classification, with KG grounding driving notable gains in CWE coverage and interpretability. Tying explanations to a standardized vulnerability taxonomy and structured knowledge makes the output more trustworthy and actionable for developers; the work also surfaces practical trade-offs in KG design and large-model training.

Abstract

Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most work focuses on binary evaluation, and explanations often lack semantic consistency with Common Weakness Enumeration (CWE) categories. We propose VulReaD, a knowledge-graph-guided approach for vulnerability reasoning and detection that moves beyond binary classification toward CWE-level reasoning. VulReaD leverages a security knowledge graph (KG) as a semantic backbone and uses a strong teacher LLM to generate CWE-consistent contrastive reasoning supervision, enabling student model training without manual annotations. Students are fine-tuned with Odds Ratio Preference Optimization (ORPO) to encourage taxonomy-aligned reasoning while suppressing unsupported explanations. Across three real-world datasets, VulReaD improves binary F1 by 8-10% and multi-class classification by 30% Macro-F1 and 18% Micro-F1 compared to state-of-the-art baselines. Results show that LLMs outperform deep learning baselines in binary detection and that KG-guided reasoning enhances CWE coverage and interpretability.
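
The abstract reports both Macro-F1 and Micro-F1 for multi-class CWE classification. The sketch below illustrates why the two can diverge when CWE classes are imbalanced: Macro-F1 averages per-class F1 scores equally, so missing a rare CWE costs a lot, while Micro-F1 pools all decisions and is dominated by frequent classes. The labels and predictions are invented toy data, not results from the paper, and the helper name `macro_micro_f1` is hypothetical.

```python
from collections import Counter

def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: F1 over counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Toy data (assumption): 8 samples of a frequent CWE, 2 of a rare one,
# and a classifier that always predicts the frequent class.
y_true = ["CWE-787"] * 8 + ["CWE-401"] * 2
y_pred = ["CWE-787"] * 10
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

In this toy case the rare class is never predicted, so its F1 is zero and Macro-F1 falls well below Micro-F1. This is one reason a gain in Macro-F1 is commonly read as better coverage of less frequent CWE categories.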

Paper Structure (53 sections, 5 equations, 3 figures, 7 tables)

This paper contains 53 sections, 5 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Failure case of baseline R2Vul on CWE-level reasoning. Given a function labeled CWE-401 (Memory Leak) (top), R2Vul generates a fluent rationale that incorrectly attributes the issue to CWE-763 (Use-After-Free) and introduces an unrelated CVE reference (bottom), illustrating semantic misalignment between generated explanations and the target CWE.
  • Figure 2: Training CWE Data Frequency vs. Reasoning-Based CWE Identification Accuracy.
  • Figure 3: Overview of VulReaD. The security knowledge graph (KG) provides structured vulnerability semantics (entities, abstraction classes, and CWE relations) that are retrieved to enrich each training example. A teacher LLM distills the dataset by generating a contrastive rationale pair: a CWE-consistent (valid) analysis and a deliberately CWE-inconsistent (flawed) analysis. A student model is then fine-tuned with ORPO, combining supervised fine-tuning on valid rationales ($L_{\text{SFT}}$) with an odds-ratio preference term ($L_{\text{OR}}$) to favor grounded, CWE-aligned reasoning over flawed alternatives.
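
The ORPO objective named in the Figure 3 caption can be sketched for a single contrastive rationale pair as follows. This is a minimal illustration that assumes the standard ORPO formulation: a supervised term on the valid rationale plus a weighted odds-ratio term, with sequence likelihoods length-normalized over tokens. The weight `lam`, the function names, and the example log-probabilities are assumptions for illustration, not values or code from VulReaD.

```python
import math

def mean_logp(token_logps):
    """Length-normalized sequence log-likelihood from per-token log-probs."""
    return sum(token_logps) / len(token_logps)

def log_odds(logp):
    """log(p / (1 - p)) computed from log p; requires logp < 0 so that p < 1."""
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(valid_token_logps, flawed_token_logps, lam=0.1):
    """Return (total, l_sft, l_or) for one valid/flawed rationale pair.

    lam is a hypothetical weighting; the value used by VulReaD is not
    stated in this section.
    """
    logp_valid = mean_logp(valid_token_logps)
    logp_flawed = mean_logp(flawed_token_logps)
    # L_SFT: negative log-likelihood of the valid (CWE-consistent) rationale.
    l_sft = -logp_valid
    # L_OR: -log sigmoid(log odds ratio) = log(1 + exp(-log odds ratio)).
    log_odds_ratio = log_odds(logp_valid) - log_odds(logp_flawed)
    l_or = math.log1p(math.exp(-log_odds_ratio))
    return l_sft + lam * l_or, l_sft, l_or

# Illustrative per-token log-probs (assumption): the student already
# prefers the valid rationale, so the odds-ratio penalty is small.
total, l_sft, l_or = orpo_loss([-0.1, -0.1], [-2.0, -2.0])
print(f"total={total:.4f} l_sft={l_sft:.4f} l_or={l_or:.4f}")
```

When the student assigns both rationales the same likelihood, the odds-ratio term equals log 2; it shrinks toward zero as the valid rationale becomes more likely than the flawed one, which is the behavior that steers the student toward CWE-aligned reasoning.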