VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection
Samal Mukhtar, Yinghua Yao, Zhu Sun, Mustafa Mustafa, Yew Soon Ong, Youcheng Sun
TL;DR
Most software vulnerability detectors stop at a binary vulnerable/non-vulnerable verdict. VulReaD instead targets Common Weakness Enumeration (CWE)-level reasoning, using a security knowledge graph (KG) as its backbone. It distills CWE-consistent reasoning from a strong teacher LLM using KG-grounded data and trains a student model with Odds Ratio Preference Optimization (ORPO) to favor CWE-aligned explanations. Across three real-world datasets, VulReaD outperforms traditional deep learning baselines and existing LLM methods in both binary detection and multi-class CWE classification, with KG grounding improving CWE coverage and interpretability. Tying explanations to a standardized vulnerability taxonomy and structured knowledge makes them more trustworthy and actionable for developers; the work also discusses practical considerations and trade-offs in KG design and large-model training.
Abstract
Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most existing work evaluates only binary (vulnerable vs. non-vulnerable) detection, and the explanations often lack semantic consistency with Common Weakness Enumeration (CWE) categories. We propose VulReaD, a knowledge-graph-guided approach for vulnerability reasoning and detection that moves beyond binary classification toward CWE-level reasoning. VulReaD leverages a security knowledge graph (KG) as a semantic backbone and uses a strong teacher LLM to generate CWE-consistent contrastive reasoning supervision, enabling student model training without manual annotations. Student models are fine-tuned with Odds Ratio Preference Optimization (ORPO) to encourage taxonomy-aligned reasoning while suppressing unsupported explanations. Across three real-world datasets, VulReaD improves binary F1 by 8-10% and multi-class classification by 30% Macro-F1 and 18% Micro-F1 compared to state-of-the-art baselines. Results show that LLMs outperform deep learning baselines in binary detection and that KG-guided reasoning enhances CWE coverage and interpretability.
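To make the ORPO objective mentioned above concrete, the following is a minimal sketch of the general ORPO loss for a single preference pair, not VulReaD's actual training code. It assumes we already have per-token log-probabilities from the student for a preferred (CWE-aligned) explanation and a rejected (unsupported) one; the function names, the example values, and the weight `lam` are illustrative assumptions, not details taken from the paper.

```python
import math

def avg_log_prob(token_log_probs):
    """Length-normalized log-likelihood of a response from its per-token log-probs."""
    return sum(token_log_probs) / len(token_log_probs)

def log_odds(log_p):
    """log(p / (1 - p)) computed from log p; requires log_p < 0 (i.e. p < 1)."""
    return log_p - math.log1p(-math.exp(log_p))

def orpo_loss(chosen_token_log_probs, rejected_token_log_probs, lam=0.1):
    """ORPO loss for one pair: supervised NLL on the chosen response plus a
    weighted odds-ratio penalty. `lam` is a hypothetical weight (assumption)."""
    log_p_chosen = avg_log_prob(chosen_token_log_probs)
    log_p_rejected = avg_log_prob(rejected_token_log_probs)

    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    nll = -log_p_chosen

    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(log_p_chosen) - log_odds(log_p_rejected)

    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    or_penalty = math.log1p(math.exp(-log_odds_ratio))

    return nll + lam * or_penalty

# Illustrative values only: the chosen explanation is more likely than the rejected one.
chosen = [-0.1, -0.2]
rejected = [-1.5, -2.0]
loss_aligned = orpo_loss(chosen, rejected)
loss_swapped = orpo_loss(rejected, chosen)
```

In this sketch the loss is small when the student already assigns higher odds to the preferred explanation and grows when the preference is reversed, which is the mechanism by which the objective favors taxonomy-aligned reasoning over unsupported explanations.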
