Structured Extraction of Vulnerabilities in OpenVAS and Tenable WAS Reports Using LLMs
Beatriz Machado, Douglas Lautert, Cristhian Kapelinski, Diego Kreutz
TL;DR
The paper addresses the heterogeneity of vulnerability reports produced by OpenVAS and Tenable WAS, which impedes automated analysis. It introduces Vulnerability Extractor, an LLM-based pipeline that reads reports, chunks their content to preserve context, and maps the scanners' differing fields to a unified, structured schema, with NULL placeholders for missing data. The resulting datasets support risk prioritization and are ready for later anonymization. In an evaluation on an OpenVAS report containing 34 vulnerabilities, with the temperature fixed at $T = 0.2$, GPT-4.1 and DeepSeek achieved the highest similarity to a manually constructed baseline (ROUGE-L > 0.7). The reported limitations are chunking-induced context loss and formatting challenges. Overall, the work demonstrates that structuring complex vulnerability data across scanners is feasible, supporting improved prioritization and secure data sharing in cybersecurity operations.
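The two steps summarized above, chunking a report and mapping scanner-specific fields onto one schema with NULL placeholders, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the schema fields, the alias table, and the character-based overlapping chunker are all assumptions made for illustration, and the LLM call that would sit between the two steps is omitted.

```python
from typing import Any, Dict, List, Optional

# Hypothetical unified schema; these field names are illustrative assumptions.
UNIFIED_FIELDS = ["name", "severity", "cvss", "host", "port", "description", "solution"]

# Hypothetical aliases mapping scanner-specific labels onto unified fields.
FIELD_ALIASES = {
    "nvt name": "name",
    "plugin name": "name",
    "threat": "severity",
    "risk factor": "severity",
    "cvss base score": "cvss",
    "summary": "description",
    "synopsis": "description",
}


def chunk_text(text: str, max_chars: int, overlap: int) -> List[str]:
    """Split text into overlapping character windows so context spans chunk boundaries."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("require max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    # Stop once the remaining text is already covered by the previous window.
    return [text[i:i + max_chars] for i in range(0, max(len(text) - overlap, 1), step)]


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map a raw record onto the unified schema, using None (NULL) for missing fields."""
    record: Dict[str, Optional[Any]] = {field: None for field in UNIFIED_FIELDS}
    for key, value in raw.items():
        label = key.strip().lower()
        target = FIELD_ALIASES.get(label, label)
        # Unknown labels are ignored; empty values stay as NULL placeholders.
        if target in record and value not in ("", None):
            record[target] = value
    return record
```

Because every record carries the same keys, downstream prioritization or anonymization code can rely on a fixed layout and test for `None` rather than for missing keys.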
Abstract
This paper proposes an automated LLM-based method to extract and structure vulnerabilities from OpenVAS and Tenable WAS scanner reports, converting unstructured data into a standardized format for risk management. In an evaluation using a report with 34 vulnerabilities, GPT-4.1 and DeepSeek achieved the highest similarity to a manually constructed baseline (ROUGE-L greater than 0.7). The method demonstrates the feasibility of transforming complex reports into usable datasets, enabling effective prioritization and future anonymization of sensitive data.
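To make the ROUGE-L threshold concrete, the following sketch computes a ROUGE-L score from the longest common subsequence (LCS) of two token sequences. It assumes whitespace tokenization, lowercasing, and the balanced F1 variant of the measure; the source does not state which tokenizer, ROUGE implementation, or F-measure weighting was used, so scores from this sketch need not match the reported ones.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a score above 0.7 means that most tokens of the extracted text appear, in the same order, in the manually built reference and vice versa.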
