ReVul-CoT: Towards Effective Software Vulnerability Assessment with Retrieval-Augmented Generation and Chain-of-Thought Prompting

Zhijie Chen, Xiang Chen, Ziming Li, Jiacheng Xue, Chaoyang Gao

TL;DR

This work tackles two limitations of LLM-based software vulnerability assessment (SVA): domain knowledge gaps and shallow reasoning. It introduces ReVul-CoT, a retrieval-augmented generation framework that draws on a local knowledge base built from NVD and CWE and guides a DeepSeek-V3.1 backbone through structured, step-by-step reasoning via Chain-of-Thought prompting. The approach dynamically retrieves relevant external knowledge, fuses code and description modalities, and applies CoT to produce CVSS v3–based severity predictions, achieving substantial gains over state-of-the-art baselines on a 12,070-vulnerability dataset. The results indicate improved accuracy, robustness, and interpretability, which makes integrating retrieval and reasoning a promising direction for automated SVA and suggests future expansion to more languages and broader security tasks.

Abstract

Context: Software Vulnerability Assessment (SVA) plays a vital role in evaluating and ranking vulnerabilities in software systems to ensure their security and reliability. Objective: Although Large Language Models (LLMs) have recently shown remarkable potential in SVA, they still face two major limitations. First, most LLMs are trained on general-purpose corpora and thus lack domain-specific knowledge essential for effective SVA. Second, they tend to rely on shallow pattern matching instead of deep contextual reasoning, making it challenging to fully comprehend complex code semantics and their security implications. Method: To alleviate these limitations, we propose a novel framework, ReVul-CoT, that integrates Retrieval-Augmented Generation (RAG) with Chain-of-Thought (CoT) prompting. In ReVul-CoT, the RAG module dynamically retrieves contextually relevant information from a constructed local knowledge base that consolidates vulnerability data from authoritative sources (such as NVD and CWE), along with corresponding code snippets and descriptive information. Building on DeepSeek-V3.1, CoT prompting guides the LLM to perform step-by-step reasoning over exploitability, impact scope, and related factors. Results: We evaluate ReVul-CoT on a dataset of 12,070 vulnerabilities. Experimental results show that ReVul-CoT outperforms state-of-the-art SVA baselines by 16.50%-42.26% in terms of MCC, and outperforms the best baseline by 10.43%, 15.86%, and 16.50% in Accuracy, F1-score, and MCC, respectively. Our ablation studies further validate the contributions of dynamic retrieval, knowledge integration, and CoT-based reasoning. Conclusion: Our results demonstrate that combining RAG with CoT prompting significantly enhances LLM-based SVA and point to promising directions for future research.
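
For readers unfamiliar with the metric, MCC (Matthews correlation coefficient) for a multi-class task such as severity rating can be computed from the counts of true and predicted labels. The sketch below is a minimal illustration using the standard multiclass generalization of MCC. The function name and the toy labels are assumptions made for this example; this is not the authors' evaluation code, and the abstract does not state which tooling was used.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly predicted samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class is predicted
    numerator = c * s - sum(p_k[k] * t_k[k] for k in t_k)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Convention: return 0.0 when the score is undefined
    # (for example, when every prediction is the same class).
    return numerator / denominator if denominator else 0.0


# Toy severity labels, invented for this example.
truth = ["LOW", "MEDIUM", "HIGH", "CRITICAL", "HIGH", "MEDIUM"]
guess = ["LOW", "MEDIUM", "HIGH", "HIGH", "HIGH", "LOW"]
score = multiclass_mcc(truth, guess)
```

Unlike Accuracy, MCC accounts for how predictions are distributed across classes, so a model that always predicts the most frequent severity level scores 0 rather than the majority-class rate.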

Paper Structure

This paper contains 31 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Comparison between standard single-step prompting and Chain-of-Thought prompting in software vulnerability severity assessment.
  • Figure 2: Framework of our proposed approach ReVul-CoT.
  • Figure 3: The CoT prompt template utilized by ReVul-CoT.
  • Figure 4: Two representative cases of the base severity predicted by our proposed ReVul-CoT under different similarity settings (i.e., considering only source code, only vulnerability descriptions, and both with the best ratio).
  • Figure 5: Distribution of total input tokens in the ReVul-CoT framework based on DeepSeek-V3.1. The histogram shows the total token consumption per input sample.
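
The similarity settings mentioned for Figure 4 (code only, description only, or a weighted mix of both) and the CoT prompting of Figure 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-set Jaccard similarity, the `code_weight` parameter, the `KBEntry` fields, the placeholder entries, and the prompt wording are all hypothetical stand-ins, since the summary does not specify the actual similarity function, weighting ratio, or template text.

```python
from dataclasses import dataclass


@dataclass
class KBEntry:
    """One record of a hypothetical local knowledge base."""
    entry_id: str     # placeholder identifier, not a real CVE
    code: str
    description: str
    severity: str


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for the real measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fused_score(code: str, desc: str, entry: KBEntry, code_weight: float) -> float:
    """Weighted mix: 1.0 means code only, 0.0 means description only."""
    return (code_weight * jaccard(code, entry.code)
            + (1.0 - code_weight) * jaccard(desc, entry.description))


def retrieve(code, desc, kb, code_weight=0.5, top_k=2):
    """Return the top_k knowledge-base entries by fused similarity."""
    ranked = sorted(kb, key=lambda e: fused_score(code, desc, e, code_weight),
                    reverse=True)
    return ranked[:top_k]


def build_cot_prompt(code, desc, retrieved):
    """Assemble a step-by-step prompt (wording is illustrative only)."""
    context = "\n".join(
        f"- [{e.entry_id}] severity={e.severity}: {e.description}"
        for e in retrieved
    )
    return (
        f"Related known vulnerabilities:\n{context}\n\n"
        f"Target description: {desc}\nTarget code:\n{code}\n\n"
        "Reason step by step:\n"
        "1. Assess exploitability.\n"
        "2. Assess impact scope.\n"
        "3. Conclude with a CVSS v3 severity rating.\n"
    )


# Placeholder knowledge base and query, invented for this example.
kb = [
    KBEntry("KB-001", "strcpy ( buf , input )", "heap overflow in image parser", "HIGH"),
    KBEntry("KB-002", "query = select + user_input", "sql injection in login form", "CRITICAL"),
]
query_code = "strcpy ( dest , input )"
query_desc = "sql injection in search form"
prompt = build_cot_prompt(query_code, query_desc,
                          retrieve(query_code, query_desc, kb, code_weight=0.5))
```

With this deliberately mismatched query, setting `code_weight` to 1.0 ranks KB-001 first and setting it to 0.0 ranks KB-002 first, which shows how the choice of ratio can change the retrieved context and, in turn, the evidence the model reasons over.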