
MAGNET: A Multi-Graph Attentional Network for Code Clone Detection

Zixian Zhang, Takfarinas Saber

TL;DR

MAGNET addresses robust code clone detection by integrating AST, CFG, and DFG representations through a three-stage attentional graph network. It uses residual GCNs with node-level self-attention for intra-graph learning, a selective gated cross-attention mechanism for fine-grained inter-graph interactions, and Set2Set pooling to fuse the modalities into unified program-level embeddings, whose similarity is measured with cosine similarity. Empirical results on BigCloneBench and Google Code Jam show state-of-the-art performance, and ablations confirm the critical roles of multi-graph fusion and of each attention component. Although the approach adds computational overhead, it substantially improves semantic clone detection and offers a scalable path toward more holistic code understanding and clone analysis.
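The intra-graph stage pairs a residual GCN step (local neighbourhood aggregation) with node-level self-attention (long-range dependencies between any two nodes). The sketch below shows that pattern in plain Python. It is an illustration under stated assumptions, not MAGNET's implementation. It uses the standard symmetric GCN normalization D^-1/2 (A + I) D^-1/2 and treats edges as undirected. It takes identity query/key/value projections in the attention step, and it assumes a square weight matrix so the residual skip needs no projection. The helper names and toy values are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def residual_gcn_layer(h: Matrix, edges: List[Tuple[int, int]], w: Matrix) -> Matrix:
    """H' = H + ReLU(D^-1/2 (A + I) D^-1/2 H W); assumes W is square."""
    n = len(h)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:  # edges treated as undirected in this sketch
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    msg = matmul(matmul(norm, h), w)
    # Residual skip: add the ReLU-activated message to the input features.
    return [[h[i][j] + max(0.0, msg[i][j]) for j in range(len(h[0]))] for i in range(n)]


def node_self_attention(h: Matrix) -> Matrix:
    """Scaled dot-product attention over all nodes, added residually.

    Identity Q/K/V projections are an assumption made to keep the sketch short.
    """
    d = len(h[0])
    out = []
    for qi in h:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in h]
        weights = softmax(scores)
        ctx = [sum(wt * vj[c] for wt, vj in zip(weights, h)) for c in range(d)]
        out.append([x + y for x, y in zip(qi, ctx)])
    return out


# Toy 3-node chain graph 0-1-2 with 2-d node features (hypothetical values).
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[0.5, 0.0], [0.0, 0.5]]
h1 = node_self_attention(residual_gcn_layer(h0, [(0, 1), (1, 2)], w))
print(len(h1), len(h1[0]))  # 3 2
```

Stacking the two steps lets each node first absorb its syntactic or flow neighbours and then attend to distant nodes that the graph edges do not connect directly.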

Abstract

Code clone detection is a fundamental task in software engineering that underpins refactoring, debugging, plagiarism detection, and vulnerability analysis. Existing methods often rely on a single representation, such as an abstract syntax tree (AST), control flow graph (CFG), or data flow graph (DFG), which captures only part of the code's semantics. Hybrid approaches have emerged, but their fusion strategies are typically handcrafted and ineffective. In this study, we propose MAGNET, a multi-graph attentional framework that jointly leverages AST, CFG, and DFG representations to capture syntactic and semantic features of source code. MAGNET integrates residual graph neural networks with node-level self-attention to learn both local and long-range dependencies. It also introduces a gated cross-attention mechanism for fine-grained inter-graph interactions, and it employs Set2Set pooling to fuse multi-graph embeddings into unified program-level representations. Extensive experiments on BigCloneBench and Google Code Jam demonstrate that MAGNET achieves state-of-the-art performance, with overall F1 scores of 96.5% and 99.2% on the two datasets, respectively. Ablation studies confirm the critical contributions of multi-graph fusion and of each attentional component. Our code is available at https://github.com/ZixianReid/Multigraph_match
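The inter-graph and readout stages can be illustrated in the same plain-Python style. In the sketch below, nodes of one graph (for example, the AST) attend to nodes of another (for example, the DFG). A per-node sigmoid gate controls how much cross-graph context is admitted, the nodes are pooled into a program vector, and two programs are then compared with cosine similarity. Several parts are assumptions rather than the paper's formulation. The scalar gate g = sigmoid(w_h·h + w_c·c + b) and all parameter names are hypothetical. MAGNET uses Set2Set pooling, which runs an LSTM-driven attention readout; the `mean_pool` function here is a deliberately simpler stand-in, not Set2Set.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_attend(queries: Matrix, keys: Matrix) -> Matrix:
    """For each query node, a softmax-weighted sum of the other graph's nodes."""
    d = len(queries[0])
    ctx = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # stabilise the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        ctx.append([sum(e / z * k[c] for e, k in zip(exps, keys)) for c in range(d)])
    return ctx


def gated_cross_attention(h: Matrix, other: Matrix, w_h: List[float],
                          w_c: List[float], b: float) -> Matrix:
    """h_i' = h_i + g_i * c_i with a hypothetical scalar gate per node."""
    out = []
    for hi, ci in zip(h, cross_attend(h, other)):
        g = sigmoid(sum(a * x for a, x in zip(w_h, hi))
                    + sum(a * x for a, x in zip(w_c, ci)) + b)
        out.append([x + g * y for x, y in zip(hi, ci)])
    return out


def mean_pool(h: Matrix) -> List[float]:
    """Simplified stand-in for Set2Set: average node embeddings."""
    return [sum(col) / len(h) for col in zip(*h)]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical AST and DFG node embeddings for two programs A and B.
ast_a, dfg_a = [[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]
ast_b, dfg_b = [[0.9, 0.1], [0.4, 0.6]], [[0.1, 0.9]]
w_h, w_c, b = [0.1, 0.1], [0.1, 0.1], 0.0
# Program embedding: pooled gated AST view concatenated with pooled DFG view.
emb_a = mean_pool(gated_cross_attention(ast_a, dfg_a, w_h, w_c, b)) + mean_pool(dfg_a)
emb_b = mean_pool(gated_cross_attention(ast_b, dfg_b, w_h, w_c, b)) + mean_pool(dfg_b)
print(round(cosine_similarity(emb_a, emb_b), 3))
```

A gate near 0 keeps a node's own-graph features. A gate near 1 adds the full cross-graph context, which is one way to make the interaction selective rather than a fixed sum.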

Paper Structure

This paper contains 30 sections, 14 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Methodology Architecture Employed in Our Study.
  • Figure 2: Overview of MAGNET Network Structure.