Efficient Exact Subgraph Matching via GNN-based Path Dominance Embedding (Technical Report)

Yutong Ye, Xiang Lian, Mingsong Chen

TL;DR

The paper tackles exact subgraph matching in large graphs by introducing GNN-based path dominance embeddings (GNN-PE) that preserve subgraph relationships via vector dominance, enabling zero-false-dismissal pruning. It couples offline partitioned graph learning with multi-partition indexing (aR*-tree) of path embeddings and a cost-model-guided online plan, so that exact matches are returned efficiently. A key extension, GNN-PGE, groups path embeddings by starting vertices to reduce index size and accelerate searches while preserving correctness through path-group pruning. Extensive experiments on real and synthetic graphs show speedups of up to 1–2 orders of magnitude over baselines and 100% query accuracy, highlighting practical scalability for large-scale graph data.
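As a rough illustration of how vector dominance supports pruning without false dismissals, the Python sketch below filters candidate data paths for one query path. It assumes that each path is stored as a label sequence plus a precomputed embedding, that label pruning requires identical label sequences, and that dominance means componentwise `<=`; the names `dominated_by` and `candidate_paths` are hypothetical and do not come from the paper's code. Paths that survive both filters are only candidates and still need exact verification.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominated_by(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a ⪯ b, i.e. a[k] <= b[k] in every dimension k."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return all(x <= y for x, y in zip(a, b))


def candidate_paths(
    query_labels: Tuple[str, ...],
    query_emb: Vector,
    data_paths: Dict[int, Tuple[Tuple[str, ...], Vector]],
) -> List[int]:
    """Keep data paths that survive label pruning and dominance pruning.

    Under the dominance property, a data path that can contain a match for
    the query path must have the same labels and an embedding that dominates
    the query embedding, so discarding all other paths loses no true match.
    """
    survivors = []
    for path_id, (labels, emb) in data_paths.items():
        if labels != query_labels:             # path label pruning (assumed: exact sequence match)
            continue
        if not dominated_by(query_emb, emb):   # path dominance pruning: need o(q) ⪯ o(p)
            continue
        survivors.append(path_id)
    return survivors


if __name__ == "__main__":
    data = {
        0: (("A", "B"), (0.2, 0.5)),
        1: (("A", "B"), (0.1, 0.9)),  # fails dominance in dimension 0
        2: (("A", "C"), (0.9, 0.9)),  # fails label check
    }
    print(candidate_paths(("A", "B"), (0.15, 0.4), data))  # [0]
```

Label pruning runs first here because it is a cheap equality test; the dominance test is the one that carries the no-false-dismissal guarantee.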

Abstract

The classic problem of exact subgraph matching, which has gained increasing importance in many real-world applications, returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings, which only produce approximate subgraph matching results, we carefully devise GNN-based embeddings for paths such that, if two paths (and the 1-hop neighbors of vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors strictly follow the dominance relationship. With this newly designed property of path dominance embeddings, we propose effective pruning strategies based on path label/dominance embeddings that guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors and develop an efficient subgraph matching algorithm that traverses the indexes over graph partitions in parallel while applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. To further optimize our GNN-PE approach, we propose a more efficient GNN-based path group embedding (GNN-PGE) technique, which performs subgraph matching over grouped path embedding vectors. We design effective pruning strategies (w.r.t. grouped path embeddings) that can significantly reduce the search space during the index traversal. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE and GNN-PGE approaches for exact subgraph matching on both real and synthetic graph data.
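The group-level pruning idea behind GNN-PGE can be sketched with the same dominance test. The Python sketch below assumes, purely for illustration, that each group of path embeddings sharing a starting vertex is summarized by its componentwise maximum; this is not necessarily how the paper builds its group embeddings, and all function names are hypothetical. Because $o(q) \preceq o(p)$ for any member $p$ implies that $o(q)$ is dominated by the group's componentwise maximum, skipping a group whose maximum does not dominate the query cannot cause false dismissals. The same argument applies to the upper corner of any bounding box, which is one plausible way an R*-tree-style index could prune whole nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominated_by(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a ⪯ b componentwise."""
    return all(x <= y for x, y in zip(a, b))


def build_groups(paths: List[Tuple[int, Vector]]) -> Dict[int, List[Vector]]:
    """Group path embeddings by their starting-vertex id."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for start_vertex, emb in paths:
        groups[start_vertex].append(emb)
    return dict(groups)


def group_upper_corner(embs: List[Vector]) -> Vector:
    """Componentwise maximum over all embeddings in one group (assumed summary)."""
    return tuple(max(col) for col in zip(*embs))


def surviving_groups(query_emb: Vector, groups: Dict[int, List[Vector]]) -> List[int]:
    """Group-level pruning: drop a group if its upper corner does not dominate the query."""
    return [v for v, embs in groups.items()
            if dominated_by(query_emb, group_upper_corner(embs))]


def matching_members(query_emb: Vector, groups: Dict[int, List[Vector]],
                     group_ids: Iterable[int]) -> Dict[int, List[Vector]]:
    """Member-level refinement inside the groups that survived."""
    return {v: [e for e in groups[v] if dominated_by(query_emb, e)] for v in group_ids}


if __name__ == "__main__":
    groups = build_groups([(1, (0.2, 0.9)), (1, (0.8, 0.1)), (2, (0.3, 0.3))])
    query = (0.5, 0.5)
    kept = surviving_groups(query, groups)
    print(kept)                                   # [1]
    print(matching_members(query, groups, kept))  # {1: []}
```

In the sample data, group 1 survives even though neither of its members dominates the query on its own; group pruning is safe but not tight, so surviving groups still need a member-level check.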

Paper Structure

This paper contains 34 sections, 9 theorems, 14 equations, 25 figures, 3 tables, 6 algorithms.

Key Result

Lemma 1

(A Special Case of GNN Model Parameter Settings) For a unit star subgraph $g_{v_i}$ and its star substructure $s_{v_i}$ ($\subseteq g_{v_i}$), their GNN-based node embedding vectors satisfy the dominance condition $o(s_{v_i})\preceq o(g_{v_i})$ if the values of the weight matrix $\mathbb{W}$ (in
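Since the exact condition on $\mathbb{W}$ is not stated here, the Python sketch below checks the flavor of Lemma 1 on a toy model under one simple sufficient setting. It assumes a hypothetical one-layer GNN that sums the center and leaf features of a star, multiplies by a weight matrix, and applies a sigmoid. With nonnegative features and nonnegative weights (our assumption, not the lemma's stated condition), dropping leaves can only lower each pre-activation, and the sigmoid is monotonically increasing, so the substructure's embedding is dominated by the full star's embedding. This is not the paper's GNN architecture.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Monotonically increasing activation."""
    return 1.0 / (1.0 + math.exp(-z))


def star_embedding(center: Sequence[float],
                   leaves: List[Sequence[float]],
                   W: List[List[float]]) -> List[float]:
    """Toy one-layer GNN for a unit star (hypothetical, not the paper's model):
    sigmoid(W @ (x_center + sum of leaf features))."""
    agg = list(center)
    for leaf in leaves:
        agg = [a + b for a, b in zip(agg, leaf)]
    return [sigmoid(sum(w * a for w, a in zip(row, agg))) for row in W]


def dominates(small: Sequence[float], big: Sequence[float]) -> bool:
    """True if small ⪯ big componentwise."""
    return all(s <= b for s, b in zip(small, big))


if __name__ == "__main__":
    rng = random.Random(0)
    dim_in, dim_out = 3, 4
    # Assumption: nonnegative weights and features (values drawn from [0, 1)).
    W = [[rng.random() for _ in range(dim_in)] for _ in range(dim_out)]
    center = [rng.random() for _ in range(dim_in)]
    leaves = [[rng.random() for _ in range(dim_in)] for _ in range(4)]
    o_g = star_embedding(center, leaves, W)      # full star g_{v_i}
    o_s = star_embedding(center, leaves[:2], W)  # substructure s_{v_i}: same center, 2 of 4 leaves
    print(dominates(o_s, o_g))  # True
```

With a negative weight the guarantee can fail: for a single weight of -1, adding a leaf with feature 1 lowers the sigmoid output from 0.5 to about 0.27, so the smaller star would no longer be dominated.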

Figures (25)

  • Figure 1: An example of the subgraph matching in collaboration social networks.
  • Figure 2: Illustration of our GNN model.
  • Figure 3: Illustration of the input for the GNN model.
  • Figure 4: An example of node dominance embedding.
  • Figure 5: Illustration of the GNN training performance for node dominance embeddings.
  • ...and 20 more figures

Theorems & Definitions (17)

  • Example 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Example 2
  • Lemma 1
  • Example 3
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 7 more