VulMCI: Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation

Tao Peng, Ling Gui, Yi Sun

TL;DR

VulMCI tackles vulnerability detection from code-derived images by introducing CFG-guided pixel-row oversampling to generate more continuous, semantically rich code feature images. The method builds Code Property Graphs, splices adjacent lines along Control Flow Graph edges, and encodes code into grayscale images for CNN-based classification, while a theoretical analysis links row continuity to the Sent2Vec embedding objective. Empirically, VulMCI outperforms multiple static detectors and image-based baselines on SARD and NVD, with notable gains in TPR, TNR, and ACC, and demonstrates solid real-world applicability with competitive runtime. The work underscores the importance of image continuity in CNN-based vulnerability detection and outlines directions for lightweight tooling and extension to other graph-based detection systems.
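The splicing step described above can be illustrated with a minimal sketch. It is one plausible reading of "splices adjacent lines along Control Flow Graph edges", not the authors' implementation: the function names `toy_embed` and `oversample_rows`, the sample code lines, the edge list, and the row width are all hypothetical, and a hash-based stand-in replaces the Sent2Vec embedding.

```python
import hashlib


def toy_embed(text, dim=8):
    """Hypothetical stand-in for a sentence embedding such as Sent2Vec.

    Hashes the text into `dim` pseudo-pixel values in [0, 255].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] for i in range(dim)]


def oversample_rows(lines, cfg_edges, dim=8):
    """Emit one image row per code line plus one spliced row per CFG edge.

    lines: code-line strings, indexed by node id (assumed layout).
    cfg_edges: (src, dst) node-id pairs (assumed layout).
    Returns (label, row) tuples in emission order.
    """
    successors = {}
    for src, dst in cfg_edges:
        successors.setdefault(src, []).append(dst)

    rows = []
    for i, line in enumerate(lines):
        # Row for the original line.
        rows.append((f"line{i}", toy_embed(line, dim)))
        # Extra rows: the line concatenated with each CFG successor.
        for j in sorted(successors.get(i, [])):
            spliced = line + " " + lines[j]
            rows.append((f"line{i}+line{j}", toy_embed(spliced, dim)))
    return rows


# Hypothetical four-line function and its control-flow edges.
lines = ["int n = read();", "if (n > 0)", "buf[n] = 0;", "return n;"]
cfg_edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
rows = oversample_rows(lines, cfg_edges)
labels = [label for label, _ in rows]
```

Each spliced row shares tokens with both of its neighbours, which is the intuition behind the claim that the oversampled rows make the image more continuous; with a real embedding model the inserted rows would be expected to lie between the rows they join.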

Abstract

In recent years, the rapid development of deep learning technology has brought new prospects to the field of vulnerability detection. Many vulnerability detection methods convert source code into images for detection, yet they often overlook the quality of the generated images. Because vulnerability images, unlike images used in object detection, lack clear and continuous contours, Convolutional Neural Networks (CNNs) tend to lose semantic information during convolution and pooling. This paper therefore proposes a pixel-row oversampling method based on code line concatenation to generate more continuous code features, addressing the discontinuity in code image coloration. Building upon these contributions, we propose the vulnerability detection system VulMCI and conduct tests on the SARD and NVD datasets. Experimental results demonstrate that VulMCI outperforms seven state-of-the-art vulnerability detectors (namely Checkmarx, FlawFinder, RATS, VulDeePecker, SySeVR, VulCNN, and Devign). Compared to other image-based methods, VulMCI shows improvements in various metrics, including a 2.877% increase in True Positive Rate (TPR), a 5.446% increase in True Negative Rate (TNR), and a 5.91% increase in Accuracy (ACC). On the NVD real-world dataset, VulMCI achieves an average accuracy of 5.162%, confirming its value in practical vulnerability detection applications.
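For readers unfamiliar with the three reported metrics, the sketch below computes them from confusion-matrix counts using their standard definitions. The function name `confusion_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard definitions of TPR, TNR, and ACC from confusion counts."""
    tpr = tp / (tp + fn)                   # vulnerable samples correctly flagged
    tnr = tn / (tn + fp)                   # non-vulnerable samples correctly passed
    acc = (tp + tn) / (tp + fp + tn + fn)  # all correct predictions over all samples
    return {"TPR": tpr, "TNR": tnr, "ACC": acc}


# Hypothetical counts, not taken from the paper.
m = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting TPR and TNR alongside ACC matters for vulnerability datasets because class imbalance can leave accuracy high even when most vulnerable samples are missed.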

Paper Structure (15 sections, 1 theorem, 5 equations, 7 figures, 2 tables, 1 algorithm)

Key Result

Theorem 1

Our method has a good probability of producing more continuous images.

Figures (7)

  • Figure 1: A vulnerability image generated using VulCNN method
  • Figure 2: Comparison of Multiple Pooling Results
  • Figure 3: System overview of VulMCI
  • Figure 4: Function code length distribution
  • Figure 5: CNN classification of VulMCI
  • ...and 2 more figures

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • proof