Domain-constrained Synthesis of Inconsistent Key Aspects in Textual Vulnerability Descriptions
Linyi Han, Shidong Pan, Zhenchang Xing, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng, Jiamou Sun, Qing Huang
TL;DR
This paper tackles inconsistencies in Textual Vulnerability Descriptions (TVDs) across vulnerability repositories by introducing a domain-constrained synthesis framework that combines extraction, self-evaluation, and entropy-guided fusion. Central to the approach are Digest Labels (DLs), a nutrition-label-inspired visualization that standardizes and presents merged TVD content, improving comprehensiveness and usability. The method employs rule-based rewards, anchor-word constraints, and information entropy to retain critical details and reduce hallucinations, raising the F1 score for key aspect augmentation from 0.82 to 0.87 and yielding notable gains in practitioner usability. The work demonstrates strong reproducibility, shows practical benefits for vulnerability analysis tasks, and outlines pathways for downstream security applications using structured, domain-specific outputs.
Abstract
Textual Vulnerability Descriptions (TVDs) are crucial for security analysts to understand and address software vulnerabilities. However, inconsistencies in key aspects of TVDs across repositories make it difficult to form a comprehensive understanding of a vulnerability. Existing approaches aim to mitigate inconsistencies by aligning TVDs with external knowledge bases, but they often discard valuable information and fail to synthesize comprehensive representations. In this paper, we propose a domain-constrained LLM-based synthesis framework for unifying key aspects of TVDs. Our framework consists of three stages: 1) Extraction, guided by rule-based templates to ensure all critical details are captured; 2) Self-evaluation, using domain-specific anchor words to assess semantic variability across sources; and 3) Fusion, leveraging information entropy to reconcile inconsistencies and prioritize relevant details. This framework improves synthesis performance, increasing the F1 score for key aspect augmentation from 0.82 to 0.87, while enhancing comprehension and efficiency by over 30%. We further develop Digest Labels, a practical tool for visualizing TVDs, which human evaluations show significantly boosts usability.
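To make the role of information entropy in the fusion stage more concrete, the following minimal Python sketch computes the Shannon entropy of the values that different repositories report for a single key aspect. It is an illustration only, not the authors' implementation: the function name `shannon_entropy`, the sample `reports` data, and the use of exact string matching to count agreement are all assumptions made for this example. The intuition is that an aspect on which all sources agree has zero entropy, while an aspect with conflicting values has higher entropy and may need more careful reconciliation.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum p * log2(1/p) over each distinct value, where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical data: two key aspects as reported by four repositories.
reports = {
    "attack_vector": ["remote", "remote", "remote", "remote"],
    "impact": [
        "denial of service",
        "code execution",
        "denial of service",
        "information leak",
    ],
}

# Higher entropy means the sources disagree more on that aspect.
entropy_by_aspect = {
    aspect: shannon_entropy(values) for aspect, values in reports.items()
}

# Rank aspects from most to least inconsistent.
ranked = sorted(entropy_by_aspect, key=entropy_by_aspect.get, reverse=True)
```

In this toy example, `attack_vector` has entropy 0 because every source agrees, whereas `impact` has entropy 1.5 bits, so it is ranked first as the more inconsistent aspect. A real system would likely compare values semantically rather than by exact string equality.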
