Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems

Sicong Cao; Xiaobing Sun; Xiaoxue Wu; David Lo; Lili Bo; Bin Li; Wei Liu

TL;DR

Coca addresses the dual challenge of robustness and explainability in GNN-based vulnerability detection by introducing Coca$_{Tra}$, a combinatorial-contrastive training method that yields robustness against spurious correlations, and Coca$_{Exp}$, a dual-view causal explainer that delivers concise and effective explanations. Through extensive experiments on a large, multi-source dataset, Coca improves detection performance across multiple detectors and outperforms state-of-the-art explainers in generating vulnerability explanations. The results demonstrate that robustness enables more faithful explanations and that dual-view causality effectively balances coverage and brevity, enhancing practical security insights and actionability.
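The contrastive part of the training can be illustrated with the standard SimCLR-style NT-Xent loss. It is computed over embeddings of two randomly perturbed views of the same functions. This is a hedged sketch, not Coca$_{Tra}$'s actual objective. The names `cosine` and `nt_xent`, the temperature `tau`, and the plain-list embeddings are hypothetical. The summary above also does not say how Coca combines this self-supervised term with any other loss.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def nt_xent(view_a: List[Vector], view_b: List[Vector], tau: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss over N positive pairs (illustrative only).

    view_a[i] and view_b[i] embed two randomly perturbed copies of the same
    function; all other embeddings in the batch act as negatives.
    """
    z = view_a + view_b          # 2N embeddings in one list
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)    # index of the positive partner of z[i]
        # Denominator sums over every embedding except z[i] itself.
        denom = sum(math.exp(cosine(z[i], z[k]) / tau)
                    for k in range(2 * n) if k != i)
        numer = math.exp(cosine(z[i], z[j]) / tau)
        total += -math.log(numer / denom)
    return total / (2 * n)
```

Consider the toy orthogonal embeddings `A = [[1.0, 0.0], [0.0, 1.0]]`. With matching views, `nt_xent(A, A)` is roughly 0.24. Swapping the rows of the second view raises it to roughly 2.24. That gap is the signal that pushes an encoder toward representations that stay stable under perturbation.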

Abstract

Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, their lack of explainability makes it difficult to deploy such black-box models in security-related domains. Several approaches have therefore been proposed to explain the decision logic of a detection model by identifying a set of crucial statements that contribute positively to its predictions. Unfortunately, because the underlying detection models are weakly robust and the explanation strategies are suboptimal, these approaches risk revealing spurious correlations and producing redundant explanations. In this paper, we propose Coca, a general framework that aims to 1) enhance the robustness of existing GNN-based vulnerability detection models to avoid spurious explanations; and 2) provide explanations that are both concise and effective for reasoning about the detected vulnerabilities. Coca consists of two core parts, the Trainer and the Explainer. The former uses combinatorial contrastive learning to train a detection model that is robust to random perturbations. The latter applies dual-view causal inference to derive, as explanations, the code statements most decisive to the detected vulnerability. We apply Coca to three typical GNN-based vulnerability detectors. Experimental results show that Coca effectively mitigates the spurious correlation issue and provides more useful, high-quality explanations.
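The dual-view idea can be illustrated with a toy search over statements. A candidate set counts as an explanation when it passes two tests. In the factual view, the set alone is sufficient to keep the "vulnerable" verdict. In the counterfactual view, deleting the set from the function flips the verdict. This is a hypothetical greedy sketch, not Coca$_{Exp}$'s algorithm. The `Detector` callable, `dual_view_explain`, and the 0.5 threshold are assumptions, and the actual explainer may instead optimize a learned mask.

```python
from typing import Callable, FrozenSet, List, Set

# Hypothetical detector interface: maps the set of statement ids kept in the
# code graph to the model's probability that the function is vulnerable.
Detector = Callable[[FrozenSet[int]], float]


def dual_view_explain(model: Detector, statements: Set[int],
                      threshold: float = 0.5) -> List[int]:
    """Greedily grow a statement set until it passes both causal views."""
    full = frozenset(statements)
    chosen: List[int] = []
    while True:
        kept = frozenset(chosen)
        # Factual view: the explanation alone still yields "vulnerable".
        factual_ok = model(kept) >= threshold
        # Counterfactual view: removing the explanation flips the verdict.
        counterfactual_ok = model(full - kept) < threshold
        remaining = full - kept
        if (factual_ok and counterfactual_ok) or not remaining:
            return chosen

        def gain(stmt: int) -> float:
            # Reward statements that raise the factual score and lower the
            # counterfactual score at the same time.
            candidate = kept | {stmt}
            return model(candidate) - model(full - candidate)

        # sorted() makes tie-breaking deterministic (smallest id wins).
        chosen.append(max(sorted(remaining), key=gain))
```

Take a toy detector that flags a function only when statements 3 and 7 are both present. For that detector, `dual_view_explain(lambda kept: 1.0 if {3, 7} <= kept else 0.0, set(range(10)))` returns `[3, 7]`. Requiring both views is what keeps the explanation short without dropping a necessary statement.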

Paper Structure (33 sections, 8 equations, 6 figures, 4 tables)

This paper contains 33 sections, 8 equations, 6 figures, 4 tables.

Introduction
Background
Problem Formulation
Contrastive Learning for Code
Explanation for GNN-based Models
Motivation
Special Concerns for DL-based Security Applications
Why Not Fine-Grained Detectors?
Why Not Existing Explainers?
Key Insights Behind Our Design
Robustness Enhancement
Data Augmentation
Combinatorial Contrastive Learning
Explainable Detection
Vulnerability Detection
...and 18 more sections

Figures (6)

Figure 1: Contrastive code representation learning pipeline.
Figure 2: The workflow of Coca.
Figure 3: The architecture of Coca$_{Tra}$.
Figure 4: Visualizations of feature representations learned by DeepWuKong trained with/without Coca$_{Tra}$.
Figure 5: Qualitative study of our Coca vs. baselines.
...and 1 more figure

Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems

TL;DR

Abstract

Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems

Authors

TL;DR

Abstract

Table of Contents

Figures (6)