Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems
Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu
TL;DR
Coca addresses the dual challenge of robustness and explainability in GNN-based vulnerability detection by introducing Coca$_{Tra}$, a combinatorial-contrastive training method that yields robustness against spurious correlations, and Coca$_{Exp}$, a dual-view causal explainer that delivers concise and effective explanations. Through extensive experiments on a large, multi-source dataset, Coca improves detection performance across multiple detectors and outperforms state-of-the-art explainers in generating vulnerability explanations. The results demonstrate that robustness enables more faithful explanations and that dual-view causality effectively balances coverage and brevity, enhancing practical security insights and actionability.
Abstract
Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, their lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements that contribute positively to its predictions. Unfortunately, because the underlying detection models are weakly robust and the explanation strategies are suboptimal, these approaches risk revealing spurious correlations and producing redundant explanations. In this paper, we propose Coca, a general framework that aims to 1) enhance the robustness of existing GNN-based vulnerability detection models to avoid spurious explanations; and 2) provide explanations that are both concise and effective for reasoning about the detected vulnerabilities. Coca consists of two core parts, referred to as Trainer and Explainer. The former trains a detection model that is robust to random perturbation using combinatorial contrastive learning, while the latter builds an explainer that uses dual-view causal inference to derive, as explanations, the code statements most decisive to the detected vulnerability. We apply Coca to three typical GNN-based vulnerability detectors. Experimental results show that Coca can effectively mitigate the spurious correlation issue and provide more useful, high-quality explanations.
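
To give a concrete sense of how contrastive training can encourage robustness to perturbation, the following is a minimal, generic sketch of an InfoNCE-style contrastive loss. It is not Coca's actual objective: the abstract does not specify the loss, and the function names (`cosine`, `info_nce`), the temperature value, and the toy two-dimensional embeddings are all illustrative assumptions. The idea shown is only that an embedding of a perturbed code sample should stay closer to its original than to embeddings of unrelated samples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE-style loss for one anchor (assumed form, not from the paper).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (illustrative values only).
anchor = [1.0, 0.0]        # embedding of an original code sample
perturbed = [0.9, 0.1]     # embedding of a slightly perturbed copy
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

# The perturbed copy is treated as the positive: low loss.
loss_aligned = info_nce(anchor, perturbed, unrelated)

# An unrelated sample is treated as the positive: high loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [perturbed, [-1.0, 0.0]])

print(loss_aligned < loss_misaligned)
```

Minimizing a loss of this shape pulls an original sample and its perturbed variants together in embedding space, which is one common way to make a detector less sensitive to semantics-preserving changes.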
