ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models
Shivansh Chopra, Hussain Ahmad, Diksha Goel, Claudia Szabo
TL;DR
ChatNVD is an LLM-driven vulnerability assessment tool that uses NVD data to generate contextual, accessible vulnerability analyses. The study compares GPT-4o Mini, LLaMA 3, and Gemini 1.5 Pro using a TF-IDF embedding pipeline and an evaluation on CVE-based queries, finding that GPT-4o Mini achieves the highest accuracy ($>0.92$) and remains robust across input sizes and question types. The work demonstrates practical deployment through a FastAPI–AWS–React stack and provides a framework for domain-specific evaluation and prompt design in cybersecurity tasks. It also discusses the reliability, cost, and scalability considerations that matter for real-world adoption, and outlines future research directions for LLM-based vulnerability assessment.
Abstract
The increasing frequency and sophistication of cybersecurity vulnerabilities in software systems underscore the need for more robust and effective vulnerability assessment methods. However, existing approaches often rely on highly technical and abstract frameworks, which hinder understanding and increase the likelihood that vulnerabilities are exploited, leading to severe cyberattacks. In this paper, we introduce ChatNVD, a support tool powered by Large Language Models (LLMs) that leverages the National Vulnerability Database (NVD) to generate accessible, context-rich summaries of software vulnerabilities. We develop three variants of ChatNVD, each built on a prominent LLM: GPT-4o Mini by OpenAI, LLaMA 3 by Meta, and Gemini 1.5 Pro by Google. We then compare the variants on their ability to identify, interpret, and explain software vulnerabilities. Our results demonstrate that GPT-4o Mini outperforms the other models, achieving over 92% accuracy and the lowest error rates, making it the most reliable option for real-world vulnerability assessment.
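To make the TF-IDF retrieval idea from the TL;DR concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny in-memory set of made-up records (the `CVE-0000-…` identifiers and descriptions are placeholders, not real NVD entries), a smoothed IDF weighting, and cosine similarity for ranking. The function names `build_index`, `retrieve`, and `build_prompt` are hypothetical, and the call to an actual LLM is left out.

```python
import math
import re
from collections import Counter

# Hypothetical, made-up records standing in for NVD entries (not real CVEs).
RECORDS = {
    "CVE-0000-0001": "SQL injection in the login form allows remote attackers to read database contents",
    "CVE-0000-0002": "Buffer overflow in the image parser allows remote code execution via a crafted file",
    "CVE-0000-0003": "Cross-site scripting in the comment field allows injection of arbitrary script",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF vector per record."""
    counts = {cid: Counter(tokenize(text)) for cid, text in records.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed IDF: rare terms get larger weights, and no weight is zero.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        cid: {t: c * idf[t] for t, c in term_counts.items()}
        for cid, term_counts in counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, idf, vectors, k=1):
    """Return the ids of the k records most similar to the query."""
    q_counts = Counter(tokenize(query))
    # Terms unseen in the index carry no weight and are dropped.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    ranked = sorted(vectors, key=lambda cid: cosine(q_vec, vectors[cid]), reverse=True)
    return ranked[:k]


def build_prompt(query, records, ids):
    """Assemble a prompt that grounds the question in the retrieved records."""
    context = "\n".join(f"{cid}: {records[cid]}" for cid in ids)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."


if __name__ == "__main__":
    idf, vectors = build_index(RECORDS)
    question = "Which issue involves a buffer overflow in a parser?"
    top = retrieve(question, idf, vectors, k=1)
    print(build_prompt(question, RECORDS, top))
```

In a tool of this kind, the assembled prompt would be sent to whichever LLM variant is under evaluation, so that the model answers from retrieved vulnerability text rather than from memory alone.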
