Graph2Region: Efficient Graph Similarity Learning with Structure and Scale Restoration

Zhouyang Liu, Yixin Chen, Ning Liu, Jiezhong He, Dongsheng Li

TL;DR

Graph2Region (G2R) introduces a geometric graph embedding in which nodes are represented as closed regions, restoring structural and scale information. A Multi-sink Propagation mechanism derives the nodes' relative positions, and node regions are shifted accordingly. G2R then computes graph similarity from region overlaps (for MCS) and disjoint regions (as a proxy for GED), enabling concurrent MCS and GED predictions. Empirically, G2R achieves up to 60% relative improvement in MCS accuracy across 15 datasets and demonstrates robust transferability and efficiency, with strong alignment to ground-truth rankings and competitive GED performance. The approach offers a scalable, interpretable alternative to pairwise node comparisons, with open-source code available for reproduction.

Abstract

Graph similarity is critical in graph-related tasks such as graph retrieval, where metrics like maximum common subgraph (MCS) and graph edit distance (GED) are commonly used. However, exact computation of these metrics is known to be NP-hard. Recent neural network-based approaches approximate the similarity score in embedding spaces to alleviate the computational burden, but they either involve expensive pairwise node comparisons or fail to effectively utilize the structural and scale information of graphs. To tackle these issues, we propose a novel geometry-based graph embedding method called Graph2Region (G2R). G2R represents nodes as closed regions and recovers their adjacency patterns within graphs in the embedding space. By incorporating the node features and adjacency patterns of graphs, G2R summarizes graph regions, i.e., graph embeddings, where the shape captures the underlying graph structure and the volume reflects the graph size. Consequently, the overlap between graph regions can serve as an approximation of MCS, signifying similar node regions and adjacency patterns. We further analyze the relationship between MCS and GED and propose using the disjoint parts as a proxy for GED similarity. This analysis enables concurrent computation of MCS and GED, incorporating local and global structural information. Experimental evaluation highlights G2R's competitive performance in graph similarity computation. It achieves up to a 60.0% relative accuracy improvement over state-of-the-art methods in MCS similarity learning, while maintaining efficiency in both training and inference. Moreover, G2R can predict both MCS and GED similarities simultaneously, providing a holistic assessment of graph similarity. Code is available at https://github.com/liuzhouyang/Graph2Region.
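
To make the overlap/disjoint idea concrete, here is a minimal sketch in Python. It assumes each graph region is an axis-aligned box given by a lower and an upper corner. The abstract only says "closed regions", so the box shape, the helper names `box_volume` and `overlap_and_disjoint`, and the toy coordinates are illustrative assumptions rather than the authors' implementation. The overlap volume plays the role of the MCS signal, and the volume not shared by the two boxes plays the role of the GED proxy.

```python
from math import prod


def box_volume(lo, hi):
    """Volume of an axis-aligned box; zero if any side has non-positive length."""
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))


def overlap_and_disjoint(box_a, box_b):
    """Return (overlap volume, disjoint volume) for two boxes.

    Each box is a (lower_corner, upper_corner) pair of equal-length lists.
    Hypothetical helper: the box shape is an assumption, not taken from the paper.
    """
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    # The intersection of two boxes is again a box (possibly empty).
    lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    overlap = box_volume(lo, hi)
    # Disjoint part: everything covered by exactly one of the two boxes.
    disjoint = box_volume(lo_a, hi_a) + box_volume(lo_b, hi_b) - 2 * overlap
    return overlap, disjoint


# Toy 2-D graph regions (made-up numbers).
g1 = ([0.0, 0.0], [2.0, 2.0])
g2 = ([1.0, 1.0], [3.0, 2.0])
overlap, disjoint = overlap_and_disjoint(g1, g2)
print(overlap, disjoint)
```

In this toy setting a larger overlap suggests a larger common substructure, while a larger disjoint volume suggests that more edits are needed; how G2R turns these volumes into similarity scores is not specified in the abstract.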

Paper Structure

This paper contains 38 sections, 18 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Node regions are shifted to reflect the adjacency pattern within graphs. The graph regions of $G_1$ and $G_2$ have a substantial overlap, which signifies that they exhibit comparable node regions and similar connectivity patterns.
  • Figure 2: Overview of G2R. Given a graph as input, G2R projects each node onto the Node-To-Region Embedding Space using a GNN and calculates nodes' relative positions through Multi-sink Propagation to reflect their adjacency pattern in the embedding space. G2R then shifts the node regions from the global ordinate origin to their relative positions. Based on the shifted node regions, G2R summarizes the graph region and "re-shifts" it back to the origin. During inference, given the graph regions of two graphs as input, G2R predicts their MCS and GED similarities based on their overlapped and disjoint regions, respectively.
  • Figure 3: Each node is assigned two random numbers (Left) to establish two flow networks (Right). After $3$ steps of propagation, the concatenated sequence for the blue node is [4,5,5,3,4,5] and for the light gray node is [5,5,5,1,3,4]. Compared with the green node's sequence [2,3,4,5,5,5], the blue node's sequence is more similar to that of the light gray node.
  • Figure 4: The Encoding Phase of G2R
  • Figure 5: The Inference Phase of G2R
  • ...and 7 more figures
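
The propagation described in the Figure 3 caption can be sketched roughly as follows. This is a guess at the mechanics, not the authors' algorithm: it assumes that, in each flow network, a node repeatedly takes the maximum value in its closed neighborhood, and that the recorded sequence starts with the node's initial number. Both assumptions are consistent with the non-decreasing length-3 sequences in the caption, but the caption does not state the update rule. The function name `multi_sink_sequences`, the path graph, and the initial numbers are hypothetical.

```python
def multi_sink_sequences(adj, initial_values, length=3):
    """Concatenate per-network value sequences for every node.

    adj: dict mapping node -> list of neighbors (undirected graph).
    initial_values: one dict per flow network, mapping node -> initial number.
    length: number of values recorded per network (initial value included).
    Assumed update rule: max over the node and its neighbors.
    """
    seqs = {v: [] for v in adj}
    for init in initial_values:
        value = dict(init)
        for _ in range(length):
            # Record the current value, then propagate one step.
            for v in adj:
                seqs[v].append(value[v])
            value = {
                v: max([value[v]] + [value[u] for u in adj[v]])
                for v in adj
            }
    return seqs


# Toy path graph a - b - c with two sets of made-up initial numbers.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
initial_values = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 3, "b": 1, "c": 2},
]
seqs = multi_sink_sequences(adj, initial_values)
for node, seq in seqs.items():
    print(node, seq)
```

Under these assumptions, nodes that sit at similar distances from the high-valued nodes end up with similar sequences, which is the kind of relative-position signal the caption illustrates.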