Approximate Subgraph Matching with Neural Graph Representations and Reinforcement Learning

Kaiyang Li; Shihao Ji; Zhipeng Cai; Wei Li

Abstract

Approximate subgraph matching (ASM) is the task of determining whether a given query graph approximately occurs in a large target graph. ASM is NP-hard and is critical in graph analysis, with a myriad of applications ranging from database systems and network science to biochemistry and privacy. Existing techniques often rely on heuristic search strategies that cannot fully utilize the graph information, leading to sub-optimal solutions. This paper proposes a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) algorithm that uses graph transformers to extract graph representations and learns RL-based policies for ASM. Our model is built on a branch-and-bound algorithm that, at each step, selects one pair of nodes from the two input graphs as a potential match. Instead of relying on heuristics, we use a Graph Transformer architecture to extract feature representations that encode the full graph information. To enhance the training of the RL policy, we first use supervised signals to guide the agent in an imitation learning stage. The policy is then fine-tuned with Proximal Policy Optimization (PPO), which optimizes the cumulative long-term reward over episodes. Extensive experiments on both synthetic and real-world datasets demonstrate that RL-ASM outperforms existing methods in both effectiveness and efficiency. Our source code is available at https://github.com/KaiyangLi1992/RL-ASM.
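The abstract states that the policy is fine-tuned with PPO to optimize the cumulative long-term reward. As a rough illustration, the sketch below implements the standard PPO clipped surrogate objective together with per-episode discounted returns. The reward definition, the discount factor `gamma`, the clipping range `clip_eps = 0.2`, and the advantage estimator used by RL-ASM are not given here, so these values and the function names `discounted_returns` and `ppo_clip_loss` are assumptions for illustration only; the imitation learning stage is not shown.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Cumulative discounted reward G_t = r_t + gamma * G_{t+1} for one episode.

    gamma is an assumed value; the paper's discount factor is not stated here.
    """
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, negated so it can be minimized.

    logp_new:   log pi_theta(a_t | s_t) under the policy being updated
    logp_old:   log-probabilities under the policy that collected the episodes
    advantages: advantage estimates A_t (how they are computed is an assumption)
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(logp_new - logp_old)                  # r_t(theta) = pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes E[min(unclipped, clipped)]; the sign flip turns it into a loss.
    return float(-np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
    print(ppo_clip_loss([np.log(0.5)], [np.log(0.25)], [1.0]))
```

The clipping keeps a single update from moving the action probabilities too far from the data-collecting policy: in the second call the probability ratio is 2, but with a positive advantage only the clipped ratio 1.2 contributes to the objective.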

Paper Structure (23 sections, 11 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Preliminary
Problem Definition
The Search Algorithm for ASM
Proposed method
Overview
The RL-ASM Framework
Node Features
Encoder
Decoder
Policy Training
Pre-training
Why are LapPE and Graph Transformer Needed for ASM?
Experiments
...and 8 more sections

Figures (6)

Figure 1: An example of approximate subgraph matching, where $\{(u_1, v_1), (u_2, v_2), (u_3, v_4), (u_4, v_3)\}$ achieves the best approximate matching from query graph $G^q$ to target graph $G^t$, with the smallest graph edit distance of 1.
Figure 2: An illustration of the search process of ASM on $(G^q, G^t)$. The branch-and-bound search algorithm (Algorithm 1) produces a tree in which each node represents a state ($s_t$), the node ID reflects the order in which the state is visited, and each directed edge represents an action ($a_t$) that adds a node pair to the current node-node mapping $M_t$. The search is essentially depth-first, with pruning through the lower-bound check. The policy (Line 12 of Algorithm 1) is the node-pair selection strategy, i.e., it decides which state to visit next. The visiting order affects search performance: for example, if state 6 were visited before state 1, a better solution would be found in fewer iterations. The GED of the current best match, $currentMiniDist$, would then be smaller, allowing the algorithm to prune more branches in subsequent search steps and making the search more efficient. When the search completes or a predefined search iteration budget is exhausted, the best solution found so far is returned. For clarity of visualization, some nodes and edges of the search tree are omitted.
Figure 3: Overview of RL-ASM. RL-ASM consists of two major components: an encoder and a decoder. The encoder processes node labels, mapping information, and positional and/or structural encodings by alternating intra- and inter-components $L$ times (with $L$ layers) to extract powerful node representations. The decoder applies self-attention to the node embeddings of $G^q$ to generate a global state representation $\mathbf{x_{s_t}}$. The action representations are then generated as the product of the embedding of node $u_{next}\in G^q$ (according to $\phi$), the learnable weight tensors $\mathbf{W_3}$, and the embeddings of the unmapped candidate nodes $v_1, v_2, v_3$, and $v_4$ from $G^t$. The state and action representations are then concatenated and fed to an MLP and a softmax classifier to calculate the probability distribution over actions, $P_{\theta}(a_t|s_t)$.
Figure 4: Example graphs that are indistinguishable by MPNN.
Figure 5: The probabilities that ISM and our method find the optimal solutions within 600 s.
...and 1 more figure
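Figures 1 and 2 describe a depth-first branch-and-bound search that grows a node-node mapping one pair at a time and prunes with a lower bound. The sketch below shows one possible version in Python under stated assumptions: it uses a simplified cost (one unit per label mismatch plus one unit per query edge whose image is not a target edge), takes the cost of the partial mapping as the lower bound (valid here because this cost never decreases as pairs are added), and plugs in a hypothetical label-matching heuristic `default_policy` where RL-ASM would use its learned policy. None of these choices are taken from the paper; `partial_cost`, `branch_and_bound`, `order`, and `budget` are illustrative names.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A graph is (node -> label, set of undirected edges stored as frozensets).
Graph = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def partial_cost(q: Graph, t: Graph, mapping: Dict[int, int]) -> int:
    """Assumed cost: label mismatches + query edges not preserved in the target."""
    q_labels, q_edges = q
    t_labels, t_edges = t
    cost = sum(q_labels[u] != t_labels[v] for u, v in mapping.items())
    for edge in q_edges:
        a, b = tuple(edge)
        if a in mapping and b in mapping:
            if frozenset((mapping[a], mapping[b])) not in t_edges:
                cost += 1
    return cost


def default_policy(q: Graph, t: Graph, u: int, candidates: List[int]) -> List[int]:
    """Hypothetical heuristic: try label-matching target nodes first."""
    return sorted(candidates, key=lambda v: (q[0][u] != t[0][v], v))


def branch_and_bound(q: Graph, t: Graph, order: List[int],
                     policy=default_policy, budget: int = 10_000):
    """Depth-first branch-and-bound over partial mappings M_t.

    `order` fixes which query node is matched next; `policy` orders the
    candidate target nodes, which is where a learned policy would plug in.
    """
    best_map: Optional[Dict[int, int]] = None
    best_cost = float("inf")          # plays the role of currentMiniDist
    stack: List[Dict[int, int]] = [{}]
    iterations = 0
    while stack and iterations < budget:
        mapping = stack.pop()
        iterations += 1
        lower_bound = partial_cost(q, t, mapping)
        if lower_bound >= best_cost:
            continue                  # prune: this branch cannot beat the best match
        if len(mapping) == len(order):
            best_map, best_cost = mapping, lower_bound
            continue
        u = order[len(mapping)]
        used = set(mapping.values())
        candidates = [v for v in t[0] if v not in used]
        # Push in reverse so the policy's first choice is expanded first.
        for v in reversed(policy(q, t, u, candidates)):
            stack.append({**mapping, u: v})
    return best_map, best_cost


if __name__ == "__main__":
    query = ({1: "A", 2: "B", 3: "C"},
             {frozenset((1, 2)), frozenset((2, 3)), frozenset((1, 3))})
    target = ({10: "A", 11: "B", 12: "C", 13: "A"},
              {frozenset((10, 11)), frozenset((11, 12)), frozenset((12, 13))})
    print(branch_and_bound(query, target, order=[1, 2, 3]))
```

In this toy example the query is a labeled triangle and the target is a path with no triangle, so at least one query edge must be lost and the search returns a mapping of cost 1. A better candidate ordering finds such a mapping sooner, which lowers `best_cost` earlier and lets the lower-bound check prune more of the tree.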
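The decoder described in the Figure 3 caption can be sketched in NumPy as below. This is a minimal sketch under assumptions not stated in the caption: a single-head self-attention followed by mean pooling produces the state vector $\mathbf{x_{s_t}}$, a bilinear tensor product with a tensor shaped like $\mathbf{W_3}$ produces one action vector per candidate, and a one-hidden-layer ReLU MLP produces the logits. Apart from `W3`, the weight names (`Wq`, `Wk`, `Wv`, `W_mlp1`, `w_mlp2`), the pooling step, and all dimensions are hypothetical, and random weights stand in for trained parameters and encoder outputs.

```python
import numpy as np


def softmax(x):
    z = np.exp(x - x.max())           # subtract max for numerical stability
    return z / z.sum()


def init_params(rng, d=8, k=4, hidden=16):
    """Random stand-ins for learned weights (hypothetical names and sizes)."""
    return {
        "Wq": rng.normal(scale=d ** -0.5, size=(d, d)),
        "Wk": rng.normal(scale=d ** -0.5, size=(d, d)),
        "Wv": rng.normal(scale=d ** -0.5, size=(d, d)),
        "W3": rng.normal(scale=1.0 / d, size=(d, d, k)),   # bilinear tensor, cf. W_3
        "W_mlp1": rng.normal(scale=(d + k) ** -0.5, size=(d + k, hidden)),
        "w_mlp2": rng.normal(scale=hidden ** -0.5, size=hidden),
    }


def state_vector(Hq, p):
    """Single-head self-attention over G^q node embeddings, then mean pooling."""
    Q, K, V = Hq @ p["Wq"], Hq @ p["Wk"], Hq @ p["Wv"]
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return (attn @ V).mean(axis=0)                     # x_{s_t}, shape (d,)


def action_probs(Hq, Ht, u_next, candidates, p):
    """P(a_t | s_t) over the unmapped candidate target nodes for u_next."""
    x_s = state_vector(Hq, p)
    h_u = Hq[u_next]                                   # embedding of u_next in G^q
    Hc = Ht[candidates]                                # embeddings of candidates in G^t
    # a_v[k] = h_u^T W3[:, :, k] h_v for each candidate v
    actions = np.einsum("i,ijk,cj->ck", h_u, p["W3"], Hc)
    X = np.concatenate([np.tile(x_s, (len(candidates), 1)), actions], axis=1)
    hidden = np.maximum(0.0, X @ p["W_mlp1"])          # ReLU MLP layer
    logits = hidden @ p["w_mlp2"]                      # one score per candidate
    return softmax(logits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    Hq = rng.normal(size=(4, 8))   # stand-in encoder outputs for G^q
    Ht = rng.normal(size=(6, 8))   # stand-in encoder outputs for G^t
    print(action_probs(Hq, Ht, u_next=0, candidates=[1, 2, 4, 5], p=params))
```

Because the state vector is shared across candidates, only the bilinear action part distinguishes one candidate from another before the MLP; the softmax then turns the per-candidate logits into a distribution over node-pair actions.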
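Figure 4 concerns graphs that message-passing networks (MPNNs) cannot tell apart; the distinguishing power of standard MPNNs is bounded by the 1-WL color refinement test. The sketch below uses a classic pair, a 6-cycle versus two disjoint triangles, which is not necessarily the pair drawn in Figure 4: 1-WL gives both the same color histogram, while their Laplacian spectra differ. LapPE is built from Laplacian eigenvectors rather than the eigenvalues computed here, so this only illustrates that spectral information captures structure that 1-WL misses; `wl_histograms` and `laplacian_spectrum` are illustrative names.

```python
from collections import Counter

import numpy as np


def wl_histograms(adj1, adj2, rounds=3):
    """Run 1-WL color refinement jointly on two graphs (shared color palette)."""
    graphs = [adj1, adj2]
    colors = [[0] * len(a) for a in graphs]
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        sigs = [[(c[v], tuple(sorted(c[w] for w in a[v]))) for v in range(len(a))]
                for a, c in zip(graphs, colors)]
        palette = {s: i for i, s in enumerate(sorted({s for g in sigs for s in g}))}
        colors = [[palette[s] for s in g] for g in sigs]
    return [Counter(c) for c in colors]


def laplacian_spectrum(adj):
    """Sorted eigenvalues of the combinatorial Laplacian L = D - A."""
    n = len(adj)
    A = np.zeros((n, n))
    for v, nbrs in enumerate(adj):
        for w in nbrs:
            A[v, w] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return np.round(np.linalg.eigvalsh(L), 6)


if __name__ == "__main__":
    c6 = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]                 # one 6-cycle
    two_triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
    h1, h2 = wl_histograms(c6, two_triangles)
    print(h1 == h2)                   # True: 1-WL cannot separate them
    print(laplacian_spectrum(c6))
    print(laplacian_spectrum(two_triangles))
```

Both graphs are 2-regular, so every node keeps the same 1-WL color in every round. The Laplacian spectra differ, for example in the multiplicity of the zero eigenvalue, which equals the number of connected components (1 versus 2).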

Theorems & Definitions (1)

Definition 1
