Domain-constrained Synthesis of Inconsistent Key Aspects in Textual Vulnerability Descriptions

Linyi Han, Shidong Pan, Zhenchang Xing, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng, Jiamou Sun, Qing Huang

TL;DR

This paper tackles inconsistencies in Textual Vulnerability Descriptions (TVDs) across vulnerability repositories by introducing a domain-constrained synthesis framework that combines extraction, self-evaluation, and entropy-guided fusion. Central to the approach is Digest Labels (DLs), a nutrition-label-inspired visualization that standardizes and presents merged TVD content, improving comprehensiveness and usability. The method employs rule-based rewards, anchor-word constraints, and information entropy to retain critical details and reduce hallucinations, raising the F1 score for key aspect augmentation from 0.82 to 0.87 and yielding notable gains in practitioner usability. The work demonstrates strong reproducibility, shows practical benefits for vulnerability analysis tasks, and outlines pathways for downstream security applications using structured, domain-specific outputs.

Abstract

Textual Vulnerability Descriptions (TVDs) are crucial for security analysts to understand and address software vulnerabilities. However, the key aspect inconsistencies in TVDs from different repositories pose challenges for achieving a comprehensive understanding of vulnerabilities. Existing approaches aim to mitigate inconsistencies by aligning TVDs with external knowledge bases, but they often discard valuable information and fail to synthesize comprehensive representations. In this paper, we propose a domain-constrained LLM-based synthesis framework for unifying key aspects of TVDs. Our framework consists of three stages: 1) Extraction, guided by rule-based templates to ensure all critical details are captured; 2) Self-evaluation, using domain-specific anchor words to assess semantic variability across sources; and 3) Fusion, leveraging information entropy to reconcile inconsistencies and prioritize relevant details. This framework improves synthesis performance, increasing the F1 score for key aspect augmentation from 0.82 to 0.87, while enhancing comprehension and efficiency by over 30%. We further develop Digest Labels, a practical tool for visualizing TVDs, which human evaluations show significantly boosts usability.
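
The abstract does not spell out how information entropy enters the Fusion stage, so the following is only a minimal sketch of the underlying quantity, not the authors' procedure. It computes the Shannon entropy of the values that different repositories report for one key aspect, on the assumption that higher entropy signals more disagreement across sources. The aspect names, the sample values, and the function `shannon_entropy` are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the distinct values observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical key-aspect values extracted from several repositories
# for a single vulnerability (illustrative data, not from the paper).
reports = {
    "vulnerability_type": ["buffer overflow", "buffer overflow", "buffer overflow"],
    "attack_vector": ["crafted packet", "crafted file", "crafted packet", "unknown input"],
}

entropy_by_aspect = {aspect: shannon_entropy(vals) for aspect, vals in reports.items()}

# Aspects ordered from most to least inconsistent across sources.
ranked = sorted(entropy_by_aspect, key=entropy_by_aspect.get, reverse=True)
```

In this toy data every source agrees on the vulnerability type, so its entropy is zero, while the attack vector has three distinct values and a higher entropy. A fusion step could use such a ranking to decide which aspects need reconciliation first; how the paper actually weights or merges the values is not stated in this section.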

Paper Structure

This paper contains 34 sections, 3 equations, 7 figures, 9 tables, 4 algorithms.

Figures (7)

  • Figure 1: Different repositories provide varying details. If synthesis loses details, the purpose of achieving comprehensive vulnerability understanding is undermined.
  • Figure 2: Missing rates on different vulnerability repositories.
  • Figure 3: The application scenario of Digest Labels (DLs). A) The textual vulnerability description (TVD) available in a CVE repository. B) The design of the proposed DLs. C) The Description section when "CVE" is selected. D) The explanations in the Evaluation section.
  • Figure 4: Overview of the framework.
  • Figure 5: Proportions of key aspect value counts.
  • ...and 2 more figures