Deep Learning Aided Software Vulnerability Detection: A Survey

Md Nizam Uddin, Yihe Zhang, Xiali Hei

TL;DR

The paper addresses the challenge of detecting software vulnerabilities with deep learning by introducing the Vulnerability Detection Lifecycle, a six-phase framework that unifies dataset construction, granularity definition, code representation, model design, evaluation, and real-world deployment. It surveys 34 DL-based vulnerability detection studies published between 2017 and 2024, comparing methodologies across code representations (AST, graph, hybrid, NLP, embedding) and model types (sequence, graph, hybrid) while emphasizing evaluation standards and real-world applicability. Key contributions include a comprehensive taxonomy of representation techniques, a benchmark for comparing representations, and critical insights into dataset quality, labeling, and cross-language generalization. The survey highlights practical implications for building robust, scalable vulnerability detectors and identifies gaps in baselines, standard metrics, interpretability, and deployment feedback, offering guidelines to steer future research and industry adoption.

Abstract

The pervasive nature of software vulnerabilities has emerged as a primary factor in the surge in cyberattacks. Traditional vulnerability detection methods, including rule-based, signature-based, manual review, static, and dynamic analysis, often exhibit limitations when encountering increasingly complex systems and a fast-evolving attack landscape. Deep learning (DL) methods excel at automatically learning and identifying complex patterns in code, enabling more effective detection of emerging vulnerabilities. This survey analyzes 34 relevant studies from high-impact journals and conferences published between 2017 and 2024. It introduces, for the first time, the conceptual framework of the Vulnerability Detection Lifecycle to systematically analyze and compare DL-based vulnerability detection methods from a single, unified perspective. The framework includes six phases: (1) Dataset Construction, (2) Vulnerability Granularity Definition, (3) Code Representation, (4) Model Design, (5) Model Performance Evaluation, and (6) Real-world Project Implementation. For each phase of the framework, we identify and explore key issues through in-depth analysis of existing research while also highlighting challenges that remain inadequately addressed. The survey provides guidelines for future software vulnerability detection research, facilitating further application of deep learning techniques in this field.

Paper Structure

This paper contains 21 sections, 10 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Lifecycle of deep learning-based source code vulnerability detection.
  • Figure 2: Data imbalance across studied datasets.
  • Figure 3: Detection granularity with inherited issues.
  • Figure 4: Comparison of code representation techniques.
  • Figure 5: Visualizing performance metrics of vulnerability detection models.