S$^3$GND: An Effective Learning-Based Approach for Subgraph Similarity Search Under Generalized Neighbor Difference Semantics (Technical Report)

Qi Wen, Xiang Lian, Nan Zhang, Yutong Ye, Mingsong Chen

TL;DR

This paper proposes a novel graph similarity semantic, generalized neighbor difference (GND), that accounts for both the keyword-set relationships between vertices and edge-weight differences, and develops an efficient S$^3$GND query-processing algorithm that traverses the index, applies pruning strategies, and returns actual S$^3$GND answers.

Abstract

Subgraph similarity search over large-scale graphs is a fundamental task that retrieves subgraphs similar to a given query graph from a data graph, and it plays a crucial role in real-world applications such as protein discovery, social network analysis, and recommendation systems. While prior works on subgraph similarity search studied various graph similarity metrics, in this paper, we propose a novel graph similarity semantic, \textit{generalized neighbor difference} (GND), that accounts for both the keyword-set relationships between vertices and edge-weight differences. We formulate the problem of \textit{subgraph similarity search under the generalized neighbor difference semantics} (S$^3$GND), which retrieves those subgraphs similar to a query graph $q$ under GND semantics. To efficiently tackle the S$^3$GND problem, we propose an effective learning-based approach, which constructs a keyword hypergraph from the data graph and trains a \textit{hypergraph neural network} (HGNN) model to obtain high-quality keyword embedding representations. We design effective pruning strategies, \textit{keyword embedding MBR}, \textit{vertex-level ND lower bound}, and \textit{graph-level GND lower bound} pruning, to rule out false alarms of candidate vertices/subgraphs, and devise a tree-based indexing mechanism to facilitate efficient S$^3$GND query answering. We develop an efficient S$^3$GND query-processing algorithm that traverses the index, applies pruning strategies, and returns actual S$^3$GND answers. Finally, we conduct extensive experiments to verify the effectiveness and efficiency of our proposed S$^3$GND approach over both real and synthetic graphs.

Paper Structure

This paper contains 25 sections, 8 theorems, 11 equations, 10 figures, 3 tables, 6 algorithms.

Key Result

Lemma 1

(Keyword Embedding MBR Pruning) Given a candidate vertex $v_i$, its keyword embedding MBR $v_i.MBR$, and a query graph $q$, vertex $v_i$ can be safely pruned if, for every query vertex $q_j \in V(q)$, it holds that $v_i.MBR \cap q_j.MBR \neq q_j.MBR$ (i.e., $q_j.MBR \not\subseteq v_i.MBR$).
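
The containment test in Lemma 1 can be illustrated with a minimal Python sketch. It assumes that a vertex's keyword embedding MBR is the axis-aligned bounding box of the embedding vectors of its keywords; the function names (`keyword_mbr`, `contains`, `can_prune`), the toy 2-dimensional embeddings, and the keyword labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

Vector = Tuple[float, ...]
MBR = Tuple[Vector, Vector]  # (lower corner, upper corner)


def keyword_mbr(keywords: Iterable[str], embedding: Dict[str, Vector]) -> MBR:
    """Axis-aligned bounding box of the embeddings of a keyword set (assumed MBR definition)."""
    vectors = [embedding[k] for k in keywords]
    if not vectors:
        raise ValueError("keyword set must not be empty")
    lower = tuple(min(dim) for dim in zip(*vectors))
    upper = tuple(max(dim) for dim in zip(*vectors))
    return lower, upper


def contains(outer: MBR, inner: MBR) -> bool:
    """True if `inner` lies inside `outer` in every dimension."""
    (o_lo, o_hi), (i_lo, i_hi) = outer, inner
    return all(ol <= il and ih <= oh
               for ol, il, ih, oh in zip(o_lo, i_lo, i_hi, o_hi))


def can_prune(vertex_mbr: MBR, query_mbrs: List[MBR]) -> bool:
    """Lemma 1: prune v_i if no query vertex MBR is contained in v_i.MBR."""
    return all(not contains(vertex_mbr, q_mbr) for q_mbr in query_mbrs)


# Toy, hand-picked embeddings (hypothetical values, not learned by an HGNN).
embedding = {"db": (0.1, 0.9), "ml": (0.8, 0.2), "hci": (0.5, 0.5)}

query_mbrs = [keyword_mbr({"hci"}, embedding)]
v1_mbr = keyword_mbr({"db", "ml"}, embedding)  # box spanning both keywords
v2_mbr = keyword_mbr({"db"}, embedding)        # degenerate box (a single point)

v1_pruned = can_prune(v1_mbr, query_mbrs)
v2_pruned = can_prune(v2_mbr, query_mbrs)
```

In this toy setup `v2` is pruned while `v1` survives, even though `v1` does not carry the keyword `hci`. That is expected for a bounding-box filter: failing the containment test rules a vertex out, whereas passing it only keeps the vertex as a candidate for later checks.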

Figures (10)

  • Figure 1: An S$^3$GND example of the skilled team search.
  • Figure 2: Workflow for processing S$^3$GND queries.
  • Figure 3: An example of a keyword hypergraph construction.
  • Figure 4: A keyword embedding model, $\Phi$, for computing high-quality keyword embeddings.
  • Figure 6: Evaluation of the S$^3$GND effectiveness.
  • ...and 5 more figures

Theorems & Definitions (15)

  • Example 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Definition 5
  • Example 2
  • ...and 5 more