LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights
Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, Jeff Huang
TL;DR
This survey addresses the problem of detecting software vulnerabilities with Large Language Models (LLMs), comparing them to static and dynamic analysis and outlining a framework to assess architectures, datasets, metrics, and techniques. It analyzes over 50 recent studies to map encoder-, decoder-, and encoder-decoder-based LLMs (with a current tilt toward decoder-only models like GPT-4) used for function/file-level vulnerability detection in C/C++, Java, and Solidity, and highlights the scarcity of repository-level datasets. The paper details common benchmarks (e.g., BigVul, CVEfixes, Devign, Juliet) and evaluation metrics (Accuracy, Precision, Recall, F1, MCC; BLEU/ROUGE for generation) and categorizes techniques into code preprocessing, prompt engineering, and fine-tuning, with PEFT and full fine-tuning achieving strong performance on large models. It also identifies key challenges—dataset quality, cross-file/semi-structured vulnerability contexts, model reliability and explainability, and real-world deployment—and offers concrete directions such as repository-wide evaluation, vulnerability reproduction/repair, specialized vulnerability detection, and robust, scalable dataset construction to advance practical applicability.
Abstract
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection, addressing critical challenges in the security domain. Traditional methods, such as static and dynamic analysis, often falter due to inefficiencies, high false positive rates, and the growing complexity of modern software systems. By leveraging their ability to analyze code structures, identify patterns, and generate repair suggestions, LLMs, exemplified by models like GPT, BERT, and CodeBERT, present a novel and scalable approach to mitigating vulnerabilities. This paper provides a detailed survey of LLMs in vulnerability detection. It examines key aspects, including model architectures, application methods, target languages, fine-tuning strategies, datasets, and evaluation metrics. We also analyze the scope of current research problems, highlighting the strengths and weaknesses of existing approaches. Further, we address challenges such as cross-language vulnerability detection, multimodal data integration, and repository-level analysis. Based on these findings, we propose solutions for issues like dataset scalability, model interpretability, and applications in low-resource scenarios. Our contributions are threefold: (1) a systematic review of how LLMs are applied in vulnerability detection; (2) an analysis of shared patterns and differences across studies, with a unified framework for understanding the field; and (3) a summary of key challenges and future research directions. This work provides valuable insights for advancing LLM-based vulnerability detection. We also maintain and regularly update a list of the latest selected papers at https://github.com/OwenSanzas/LLM-For-Vulnerability-Detection.
