Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models

Jugal Gajjar, Kaustik Ranaware, Kamalasankari Subramaniakuppusamy

TL;DR

The paper tackles software vulnerability detection by bridging program structure and semantics. It introduces a hybrid framework that fuses CFG-based graph embeddings with lightweight local LLM embeddings, using a two-way gating fusion, InfoNCE contrastive alignment, and Graph Laplacian regularization, along with interpretability via saliency and natural-language explanations. On Java vulnerability detection tasks, the approach achieves 93.57% accuracy, outperforming Graph Attention Network embeddings and pretrained LLM baselines, while providing actionable explanations. The method emphasizes privacy-preserving, scalable deployment on modest hardware, shifting vulnerability analysis toward deeper structural and semantic insights with practical impact for secure software development.

Abstract

Software vulnerabilities remain a persistent risk, yet static and dynamic analyses often overlook structural dependencies that shape insecure behaviors. Viewing programs as heterogeneous graphs, we capture control- and data-flow relations as complex interaction networks. Our hybrid framework combines these graph representations with lightweight (<4B) local LLMs, uniting topological features with semantic reasoning while avoiding the cost and privacy concerns of large cloud models. Evaluated on Java vulnerability detection (binary classification), our method achieves 93.57% accuracy, an 8.36% gain over Graph Attention Network-based embeddings and a 17.81% gain over pretrained LLM baselines such as Qwen2.5 Coder 3B. Beyond accuracy, the approach extracts salient subgraphs and generates natural language explanations, improving interpretability for developers. These results pave the way for scalable, explainable, and locally deployable tools that can shift vulnerability analysis from purely syntactic checks to deeper structural and semantic insights, facilitating broader adoption in real-world secure software development.
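The TL;DR names three fusion components: two-way gating, InfoNCE contrastive alignment, and Graph Laplacian regularization. The sketch below shows one generic way each could be written in NumPy. It is not the authors' implementation: the function names, the gate parameterization (`W_g`, `W_s`), the temperature value, and the choice of adjacency matrix `adj` are all assumptions made for illustration, and it assumes both embeddings have already been projected to the same dimension.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def two_way_gate(g, s, W_g, W_s):
    """Fuse graph embeddings g and semantic embeddings s, both (batch, d).

    Assumed form: each modality gets its own element-wise gate computed
    from the concatenated pair; W_g and W_s have shape (2d, d).
    """
    joint = np.concatenate([g, s], axis=1)
    gate_g = sigmoid(joint @ W_g)
    gate_s = sigmoid(joint @ W_s)
    return gate_g * g + gate_s * s


def info_nce(g, s, temperature=0.1):
    """InfoNCE loss: row i of g is the positive for row i of s,
    and every other row in the batch acts as a negative."""
    g, s = l2_normalize(g), l2_normalize(s)
    logits = g @ s.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def laplacian_penalty(h, adj):
    """Smoothness term tr(H^T L H) with L = D - A.

    For a symmetric adj this equals 0.5 * sum_ij A_ij * ||h_i - h_j||^2,
    so it is zero when connected rows of h are identical.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.trace(h.T @ lap @ h))
```

In a training loop these terms would typically be added to the classification loss with weighting coefficients; the weights and the graph over which the Laplacian term is computed are not specified in the text above.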

Paper Structure

This paper contains 30 sections, 9 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Multimodal fusion pipeline for code classification with explainability. The pipeline processes graph embeddings and code text through normalization, fusion, classification, and explanation generation stages.
  • Figure 2: t-SNE visualization of graph embeddings for four architectures with GraphCodeBERT semantics.
  • Figure 3: Top-left: Two-way gating weight distribution. Top-right: Gradient saliency across projection dimensions. Bottom: Example LLM-generated justifications.