Neural Subgraph Matching

Rex Ying, Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, Jure Leskovec

TL;DR

NeuroMatch tackles subgraph isomorphism with a neural embedding approach that decomposes target graphs into k-hop neighborhoods and learns an order-embedding space reflecting subgraph containment. The method comprises an offline embedding stage and a fast online query stage that uses a subgraph prediction function and a voting scheme, and it is trained with curriculum learning and a max-margin objective. It achieves substantial gains over both exact and neural baselines, including up to 100x speedups over exact methods and about 18% AUROC improvements over neural baselines, while generalizing to unseen query distributions and real-world datasets. The work demonstrates the practicality of neural subgraph matching for large-scale graphs and lays the groundwork for further improvements in expressiveness and feature-aware transferability.
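The max-margin objective over the order-embedding space can be sketched in plain Python as follows. This assumes the standard order-embedding formulation, in which a query embedding $z_q$ incurs a violation $E(z_q, z_t) = \|\max(0, z_q - z_t)\|_2^2$ whenever it does not lie to the lower-left of a target embedding $z_t$; the function names `violation` and `max_margin_loss`, the `margin` value, and the toy embeddings are hypothetical, and the paper's exact loss may differ.

```python
from typing import Iterable, Sequence, Tuple

Embedding = Sequence[float]
Pair = Tuple[Embedding, Embedding]  # (query embedding, target embedding)

def violation(z_q: Embedding, z_t: Embedding) -> float:
    # E(z_q, z_t) = ||max(0, z_q - z_t)||_2^2. It is zero exactly when
    # z_q <= z_t in every coordinate, i.e. z_q lies to the "lower-left" of z_t.
    if len(z_q) != len(z_t):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, a - b) ** 2 for a, b in zip(z_q, z_t))

def max_margin_loss(pos_pairs: Iterable[Pair], neg_pairs: Iterable[Pair],
                    margin: float = 1.0) -> float:
    # Positive pairs (query IS a subgraph of the target neighborhood):
    # any violation of the order constraint is penalized.
    pos = sum(violation(q, t) for q, t in pos_pairs)
    # Negative pairs (query is NOT a subgraph): penalized only while their
    # violation is below `margin`, pushing them out of the lower-left region.
    neg = sum(max(0.0, margin - violation(q, t)) for q, t in neg_pairs)
    return pos + neg

# Hypothetical 2-D embeddings for illustration only.
positives = [([0.1, 0.2], [0.5, 0.5])]  # lower-left of target: zero violation
negatives = [([1.0, 0.0], [0.5, 0.5])]  # violation = 0.5 ** 2 = 0.25
print(max_margin_loss(positives, negatives))  # 0.75
```

The positive pair contributes nothing, while the negative pair has violation 0.25 and therefore contributes $1 - 0.25 = 0.75$ until training pushes its violation past the margin.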

Abstract

Subgraph matching is the problem of determining the presence and location(s) of a given query graph in a large target graph. Although subgraph matching is NP-complete, it is crucial in domains ranging from network science and database systems to biochemistry and cognitive science. However, existing techniques based on combinatorial matching and integer programming cannot handle matching problems in which both the target and query graphs are large. Here we propose NeuroMatch, an accurate, efficient, and robust neural approach to subgraph matching. NeuroMatch decomposes query and target graphs into small subgraphs and embeds them using graph neural networks. Trained to capture geometric constraints corresponding to subgraph relations, NeuroMatch then efficiently performs subgraph matching directly in the embedding space. Experiments demonstrate that NeuroMatch is 100x faster than existing combinatorial approaches and 18% more accurate than existing approximate subgraph matching methods.
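The decomposition into small subgraphs can be illustrated with a breadth-first search that stops expanding at depth $k$. This is a minimal sketch assuming an undirected graph stored as a Python adjacency dict and assuming each neighborhood is the subgraph induced by the nodes within $k$ hops of its center; the helper names `k_hop_neighborhood` and `decompose` are hypothetical, and the GNN that would embed each neighborhood is omitted.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Hashable]]  # undirected adjacency list
Neighborhood = Tuple[Set[Hashable], Set[FrozenSet[Hashable]]]  # (nodes, edges)

def k_hop_neighborhood(graph: Graph, center: Hashable, k: int) -> Neighborhood:
    # Breadth-first search from the center node, recording hop distances.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # reached the k-hop boundary; do not expand further
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Induced subgraph: keep every edge whose endpoints both lie within k hops.
    edges = {frozenset((u, v)) for u in nodes for v in graph[u] if v in nodes}
    return nodes, edges

def decompose(graph: Graph, k: int) -> Dict[Hashable, Neighborhood]:
    # Offline stage: one neighborhood G_u per node u, with u as its center node.
    return {u: k_hop_neighborhood(graph, u, k) for u in graph}

# Path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_neighborhood(path, 2, 1)[0])  # {1, 2, 3}
```

Because every target node gets its own neighborhood, `decompose` can run once offline, leaving only the query's neighborhoods to be built at query time.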

Paper Structure

This paper contains 26 sections, 5 equations, 6 figures, 8 tables, 2 algorithms.

Figures (6)

  • Figure 1: Overview of NeuroMatch. We decompose the target graph $G_T$ by extracting the $k$-hop neighborhood $G_u$ around every node $u$, and then use a GNN to embed each $G_u$ (left). We refer to $u$ as the center node of $G_u$. We train the GNN to reflect subgraph relationships: if $G_v$ is a subgraph of $G_u$, then node $v$ should be embedded to the lower-left of $u$. For example, since the 2-hop graph of the violet node is a subgraph of the 2-hop graph of the red node, the violet square embedding lies to the lower-left of the red square embedding. At the query stage, we decompose the query $G_Q$ by picking an anchor node $q$ and embedding it. From the embedding alone we can quickly determine that Query 1 is a subgraph of the neighborhoods around the red, blue, and green nodes in the target graph, because its embedding lies to the lower-left of theirs. Similarly, Query 2 is a subgraph of the neighborhoods of the purple and red nodes and is thus positioned to the lower-left of both. Note that NeuroMatch avoids expensive combinatorial matching of subgraphs.
  • Figure 2: Example sampled queries $G_Q$ at each level of the curriculum on the MSRC_21 dataset. The diameter and number of nodes increase as the curriculum level advances.
  • Figure 3: t-SNE visualization of the order embedding for a subset of subgraphs sampled from the Enzymes dataset. As the examples on the right show, the order constraints are well preserved. Graphs are colored by number of edges.
  • Figure 4: Runtime analysis. Success rate of the baseline heuristic matching algorithms (VF2 and RI) in completing matching within 20 seconds. NeuroMatch achieves a 100% success rate.
  • Figure 5: An overview of the proposed ID-GNN model. We consider node-, edge-, and graph-level tasks, and assume nodes do not have additional features. Across all examples, the task requires an embedding that differentiates the label $A$ nodes from the label $B$ nodes in their respective graphs. However, across all tasks, existing GNNs, regardless of depth, always assign the same embedding to both classes of nodes, because for all tasks the computational graphs are identical. In contrast, the colored computation graphs provided by ID-GNNs allow clear differentiation between the nodes of class $A$ and class $B$, as the colored computation graphs are no longer identical across all tasks.
  • ...and 1 more figure
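The query-stage test described in the Figure 1 caption reduces to a coordinate-wise comparison against precomputed target embeddings. In the sketch below, the two-dimensional embeddings are made-up toy values chosen only to reproduce the caption's outcome (real embeddings come from the trained GNN and have more dimensions), and `candidate_centers`, `is_predicted_subgraph`, and the tolerance `threshold` are hypothetical names; the voting scheme is not modeled.

```python
from typing import Dict, List, Sequence

Embedding = Sequence[float]

def is_predicted_subgraph(z_q: Embedding, z_u: Embedding,
                          threshold: float = 0.0) -> bool:
    # Lower-left test: the query anchor's embedding should not exceed the
    # target center's embedding in any coordinate. `threshold` tolerates a
    # small total squared violation (0.0 = strict coordinate-wise dominance).
    total = sum(max(0.0, a - b) ** 2 for a, b in zip(z_q, z_u))
    return total <= threshold

def candidate_centers(z_q: Embedding, target_embeddings: Dict[str, Embedding],
                      threshold: float = 0.0) -> List[str]:
    # Scan the offline target embeddings; no combinatorial matching is done.
    return sorted(u for u, z_u in target_embeddings.items()
                  if is_predicted_subgraph(z_q, z_u, threshold))

# Toy 2-D embeddings (hypothetical values, not from the paper).
TARGET = {
    "red": [0.9, 0.8],
    "blue": [0.7, 0.65],
    "green": [0.8, 0.6],
    "purple": [0.3, 0.9],
    "violet": [0.2, 0.3],
}
QUERY_1 = [0.6, 0.5]
QUERY_2 = [0.25, 0.7]

print(candidate_centers(QUERY_1, TARGET))  # ['blue', 'green', 'red']
print(candidate_centers(QUERY_2, TARGET))  # ['purple', 'red']
```

The toy values also respect the caption's other example: the violet embedding lies to the lower-left of the red one, so the violet neighborhood is predicted to be a subgraph of the red neighborhood.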