Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks
Amanpreet Singh Saimbhi
TL;DR
This work tackles automated vulnerability detection in complex software by pairing Code Property Graphs (CPGs), which fuse abstract syntax trees (ASTs), control flow graphs (CFGs), and program dependence graphs (PDGs) into a single graph representation, with a convolutional neural network architecture adapted for graph data. The author implements a pipeline that transforms source code into CPGs, develops a graph-oriented CNN (inspired by PATCHY-SAN) for vulnerability classification, and curates a function-level labeled dataset from open-source repositories to enable scalable training and evaluation. The model achieves approximately 92% accuracy and outperforms a graph-kernel SVM baseline by around 8% in F1-score, with particularly high precision for secure functions and good recall for insecure ones, though logic-based vulnerabilities remain challenging. The study demonstrates the practicality of combining rich code representations with graph deep learning and highlights avenues for further improvement, including expanded datasets, improved interpretability, and better detection of complex logic flaws, paving the way for integration into CI/CD pipelines to reduce vulnerable code in production systems.
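To make the PATCHY-SAN-inspired step more concrete, the sketch below shows one simplified way to turn a graph into fixed-size receptive fields that an ordinary CNN can consume. It is an illustration under stated assumptions, not the paper's implementation: node ranking uses plain degree (PATCHY-SAN itself relies on stronger graph labeling procedures), and the function name `receptive_fields` and its parameters `w`, `k`, and `pad` are hypothetical.

```python
from collections import deque


def receptive_fields(adj, w, k, pad=-1):
    """Return w receptive fields of exactly k node ids each.

    adj: dict mapping node id -> list of neighbour ids (undirected graph).
    Assumption: nodes are ranked by degree (highest first), ties broken by id.
    """
    def rank(n):
        return (-len(adj[n]), n)

    ordered = sorted(adj, key=rank)  # node sequence selection
    fields = []
    for root in ordered[:w]:
        # Neighbourhood assembly: breadth-first search until >= k nodes are seen.
        dist = {root: 0}
        queue = deque([root])
        while queue and len(dist) < k:
            node = queue.popleft()
            for nb in sorted(adj[node], key=rank):
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        # Normalisation: order by (distance to root, rank), cut to k, then pad.
        field = sorted(dist, key=lambda n: (dist[n], rank(n)))[:k]
        field += [pad] * (k - len(field))
        fields.append(field)
    # Graphs with fewer than w nodes get all-padding fields.
    while len(fields) < w:
        fields.append([pad] * k)
    return fields


toy_graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
fields = receptive_fields(toy_graph, w=2, k=3)
```

Each field has the same length regardless of graph size, so node features looked up in field order can be stacked into a tensor of shape (w, k, features) and passed to a standard 1D convolution with stride k.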
Abstract
The increasing complexity of modern software systems has led to a rise in vulnerabilities that malicious actors can exploit. Traditional vulnerability detection methods, such as static and dynamic analysis, are limited in scalability and automation. This paper proposes a novel approach to detecting software vulnerabilities that combines code property graphs with machine learning techniques. Code property graphs integrate abstract syntax trees, control flow graphs, and program dependence graphs, yielding a detailed representation of software code that enhances the accuracy and granularity of vulnerability detection. We introduce various neural network models, including convolutional neural networks adapted for graph data, to process these representations. Our approach provides a scalable and automated solution for vulnerability detection, addressing the shortcomings of existing methods. We also present a newly generated dataset, sourced from open-source repositories and labeled with function-level vulnerability types. Our contributions are a methodology for transforming software code into code property graphs, an implementation of a convolutional neural network model for graph data, and a comprehensive dataset for training and evaluation. This work lays the foundation for more effective and efficient vulnerability detection in complex software systems.
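As a rough illustration of what a code property graph combines, the sketch below hand-builds a toy graph for the C fragment `void f(char *s) { char buf[8]; strcpy(buf, s); }`. Everything here is an assumption made for illustration: the `CodePropertyGraph` class, the node labels, and the hand-picked edges are hypothetical and do not come from the paper's pipeline, which would derive them with a code analysis tool.

```python
from dataclasses import dataclass, field

EDGE_KINDS = {"AST", "CFG", "PDG"}


@dataclass
class CodePropertyGraph:
    """One node set shared by three edge types (syntax, control, dependence)."""
    nodes: dict = field(default_factory=dict)  # node id -> label
    edges: set = field(default_factory=set)    # (src, dst, kind) triples

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, src, dst, kind):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((src, dst, kind))

    def successors(self, node_id, kind):
        """Sorted ids reachable from node_id over one edge of the given kind."""
        return sorted(d for s, d, k in self.edges if s == node_id and k == kind)


def build_toy_cpg():
    cpg = CodePropertyGraph()
    cpg.add_node(0, "FunctionDef f")
    cpg.add_node(1, "Param s")
    cpg.add_node(2, "Decl buf[8]")
    cpg.add_node(3, "Call strcpy(buf, s)")
    # AST edges: the function node is the syntactic parent of the other nodes.
    for child in (1, 2, 3):
        cpg.add_edge(0, child, "AST")
    # CFG edge: the declaration executes before the call.
    cpg.add_edge(2, 3, "CFG")
    # PDG edges: the call depends on the data in both s and buf.
    cpg.add_edge(1, 3, "PDG")
    cpg.add_edge(2, 3, "PDG")
    return cpg


cpg = build_toy_cpg()
```

Because all three edge types share the same nodes, a single query can mix views: for example, `cpg.successors(1, "PDG")` shows that the unchecked parameter `s` flows into the `strcpy` call, which is the kind of pattern a classifier over such graphs could learn to flag.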
