Keep It Simple: Towards Accurate Vulnerability Detection for Large Code Graphs

Xin Peng, Shangwen Wang, Yihao Qin, Bo Lin, Liqian Chen, Xiaoguang Mao

TL;DR

ANGLE is a novel vulnerability detection method that combines hierarchical graph refinement with context-aware graph representation learning, and it improves accuracy over the state-of-the-art method, AMPLE.

Abstract

Software vulnerability detection is crucial for high-quality software development. Recent studies that use Graph Neural Networks (GNNs) to learn graph representations of code have achieved remarkable success in vulnerability detection. However, existing graph-based approaches face two main limitations that prevent them from generalizing well to large code graphs: (1) interference from noisy information in the code graph, and (2) difficulty in capturing long-distance dependencies within the graph. To mitigate these problems, we propose ANGLE, a novel vulnerability detection method built on hierarchical graph refinement and context-aware graph representation learning. The former hierarchically filters redundant information in the code graph, thereby reducing its size, while the latter combines a Graph Transformer and a GNN to learn code graph representations from both global and local perspectives, thus capturing long-distance dependencies. Extensive experiments on three widely used benchmark datasets show promising results: our method significantly outperforms several baselines in terms of accuracy and F1 score. In particular, on large code graphs, ANGLE improves accuracy by 34.27%-161.93% over the state-of-the-art method, AMPLE. These results demonstrate the effectiveness of ANGLE in vulnerability detection tasks.
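To make the local versus global distinction concrete, the following is a minimal, illustrative sketch in plain Python. It is not ANGLE's architecture: the function names (`local_update`, `global_update`, `combine`), the toy path graph, the absence of learned weights, and the choice to sum the two views are all assumptions made for illustration. It only shows why a single neighbour-aggregation hop cannot see a distant node, while attention over all nodes can.

```python
import math

def local_update(features, edges):
    """One GNN-style hop: average each node with its direct neighbours."""
    n, dim = len(features), len(features[0])
    neighbours = [[i] for i in range(n)]  # each node also keeps itself
    for u, v in edges:                    # undirected edges
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [[sum(features[j][d] for j in nb) / len(nb) for d in range(dim)]
            for nb in neighbours]

def global_update(features):
    """Transformer-style view: scaled dot-product attention over ALL nodes.
    No learned projections are used (an assumption to keep the sketch short)."""
    dim = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in features]
        top = max(scores)                 # subtract max for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[d] for w, k in zip(weights, features)) / total
                    for d in range(dim)])
    return out

def combine(features, edges):
    """Sum the local and global views (hypothetical fusion choice)."""
    loc, glob = local_update(features, edges), global_update(features)
    return [[l + g for l, g in zip(lr, gr)] for lr, gr in zip(loc, glob)]

# Toy path graph 0 - 1 - 2 - 3; only the two end nodes carry a signal.
edges = [(0, 1), (1, 2), (2, 3)]
feats = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

loc = local_update(feats, edges)
glob = global_update(feats)
print(loc[0])   # node 0 after one local hop: no trace of node 3's signal
print(glob[0])  # node 0 after global attention: second component is non-zero
```

In this toy graph, node 0 is three hops from node 3, so one local hop leaves its second feature at zero, whereas the global view already mixes in node 3's signal. Stacking more local hops would eventually reach node 3 as well, but the number of hops needed grows with graph size, which is the long-distance dependency issue the abstract refers to.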

Paper Structure

This paper contains 46 sections, 19 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Statistics on the distribution of graph sizes in the Devign [Nips2019Devign], Reveal [TSE2021Reveal], and Big-Vul [MSR2020BigVul] datasets.
  • Figure 2: Accuracy with different numbers of nodes in the Devign [Nips2019Devign], Reveal [TSE2021Reveal], and Big-Vul [MSR2020BigVul] datasets.
  • Figure 3: Statistics of the longest distance in the Devign [Nips2019Devign], Reveal [TSE2021Reveal], and Big-Vul [MSR2020BigVul] datasets.
  • Figure 4: Two different functions from the busybox project have the same type of vulnerability (CWE-125). The red-colored code is the vulnerable code, and the green-colored code is the repaired code.
  • Figure 5: An example of a vulnerability in the Linux kernel.
  • ...and 8 more figures