Advancing Vulnerability Classification with BERT: A Multi-Objective Learning Model
Himanshu Tiwari
TL;DR
The paper tackles the challenge of scalable vulnerability triage by presenting a BERT-based Vulnerability Report Classifier that jointly predicts CVE severity and multiple vulnerability types from NVD descriptions. It introduces a dual-head architecture (severity with 4 classes and 10 multi-labeled types) and a combined loss (Cross-Entropy for severity and BCEWithLogits for types) trained via a Hugging Face Trainer. Using 5,637 recent CVEs from nvdcve-1.1-recent.json (as of March 2025), the model achieves 94.30% severity accuracy and 92.10% exact-match accuracy for types, with strong per-class F1 and ROC-AUC metrics, indicating robust understanding of vulnerability language. The system is deployed through a REST API and a Streamlit UI, enabling real-time vulnerability analysis for practitioners, and the work advances open-source tooling for automated cybersecurity triage. Future directions include incorporating metadata such as CVSS vectors, expanding the type taxonomy, continual learning for emerging threats, and exploring hybrid models that integrate code semantics with text.
Abstract
The rapid increase in cybersecurity vulnerabilities necessitates automated tools for analyzing and classifying vulnerability reports. This paper presents a novel Vulnerability Report Classifier that leverages the BERT (Bidirectional Encoder Representations from Transformers) model to classify Common Vulnerabilities and Exposures (CVE) reports from the National Vulnerability Database (NVD). From each textual description, the classifier predicts both a single severity level (Low, Medium, High, Critical) and one or more vulnerability types (e.g., Buffer Overflow, XSS); severity is therefore a multi-class task, while type prediction is a multi-label task. We introduce a custom training pipeline that uses a combined loss function (Cross-Entropy for severity and Binary Cross-Entropy with Logits for types) integrated into a Hugging Face Trainer subclass. Experiments on recent NVD data demonstrate promising results, with evaluation loss decreasing across epochs. The system is deployed via a REST API and a Streamlit UI, enabling real-time vulnerability analysis. This work contributes a scalable, open-source solution that cybersecurity practitioners can use to automate vulnerability triage.
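The combined loss described in the abstract can be illustrated with a small worked example. The sketch below is not the paper's implementation: it uses plain Python instead of PyTorch so the arithmetic is visible, and it assumes the two terms are simply summed. The function names and the `type_weight` parameter are hypothetical, since the abstract does not state how the two losses are weighted.

```python
import math

# Severity classes named in the abstract; the type head is multi-label.
SEVERITIES = ["Low", "Medium", "High", "Critical"]


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example (single-label severity head)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels (multi-label type head)."""
    # Stable per-label form: max(z, 0) - z*y + log(1 + exp(-|z|))
    losses = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def combined_loss(sev_logits, sev_target, type_logits, type_targets, type_weight=1.0):
    """Sum of both head losses; type_weight=1.0 is an assumption, not from the paper."""
    return cross_entropy(sev_logits, sev_target) + type_weight * bce_with_logits(
        type_logits, type_targets
    )


# An untrained model (all logits zero) versus a confident, correct prediction.
# The example CVE is labeled "High" and has only the first type label set.
type_targets = [1] + [0] * 9
untrained = combined_loss([0.0] * 4, SEVERITIES.index("High"), [0.0] * 10, type_targets)
confident = combined_loss(
    [-5.0, -5.0, 10.0, -5.0],
    SEVERITIES.index("High"),
    [10.0] + [-10.0] * 9,
    type_targets,
)
print(f"untrained: {untrained:.4f}, confident: {confident:.6f}")
```

With all logits at zero, the severity term equals ln 4 (a uniform guess over four classes) and the type term equals ln 2 (a coin flip per label), so the total is ln 8. A confident, correct prediction drives both terms toward zero, which is the behavior a decreasing evaluation loss reflects.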
