HeGMN: Heterogeneous Graph Matching Network for Learning Graph Similarity

Shilong Sang, Ke-Jia Chen, Zheng Liu

TL;DR

The paper tackles graph similarity learning for heterogeneous graphs, where semantic types strongly affect matching quality. It proposes HeGMN, an end-to-end heterogeneous graph similarity learning (HGSL) framework that combines a heterogeneous graph isomorphism network (HGIN) encoder with two type-aware matching branches (graph-level and node-level) to predict similarity scores based on heterogeneous graph edit distance (HGED). The evaluation rests on a formal HGED definition, newly constructed HGSL datasets, and ablation studies, which show that type-aware encoding and matching improve performance, especially on larger, more complex graphs. Overall, HeGMN is reported to be more accurate and more efficient than the compared baselines on heterogeneous graph similarity tasks.

Abstract

Graph similarity learning (GSL), also referred to as graph matching in many scenarios, is a fundamental problem in computer vision, pattern recognition, and graph learning. However, previous GSL methods assume that graphs are homogeneous and struggle to maintain their performance on heterogeneous graphs. To address this problem, this paper proposes a Heterogeneous Graph Matching Network (HeGMN), an end-to-end graph similarity learning framework built on a two-tier matching mechanism. First, a heterogeneous graph isomorphism network is proposed as the encoder, which adapts the graph isomorphism network to heterogeneous graphs by distinguishing different semantic relationships during aggregation. Second, graph-level and node-level matching modules are designed, both following type-aligned matching principles. The former performs graph-level matching by aligning node types, and the latter computes interactions only between cross-graph nodes of the same type, thus reducing noise interference and computational overhead. Finally, the graph-level and node-level matching features are combined and fed into fully connected layers to predict graph similarity scores. For the experiments, we propose a heterogeneous graph resampling method to construct heterogeneous graph pairs and define the corresponding heterogeneous graph edit distance, filling the gap left by the lack of suitable datasets. Extensive experiments demonstrate that HeGMN consistently achieves advanced performance on graph similarity prediction across all datasets.
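
The node-level, type-aligned matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy node types and embeddings, and the use of cosine similarity as the interaction score are all assumptions made for illustration. The point it shows is that restricting interactions to same-type node pairs shrinks the number of cross-graph comparisons.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_type(types, embeddings):
    """Map each node type to the embeddings of the nodes carrying that type."""
    groups = defaultdict(list)
    for node_type, embedding in zip(types, embeddings):
        groups[node_type].append(embedding)
    return groups


def type_aligned_node_matching(types_a, emb_a, types_b, emb_b):
    """Return {type: similarity matrix} over same-type cross-graph node pairs only."""
    groups_a = group_by_type(types_a, emb_a)
    groups_b = group_by_type(types_b, emb_b)
    shared = sorted(set(groups_a) & set(groups_b))
    return {
        t: [[cosine(u, v) for v in groups_b[t]] for u in groups_a[t]]
        for t in shared
    }


# Toy inputs: node types and 2-d embeddings are hypothetical.
types_a = ["paper", "paper", "author"]
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
types_b = ["paper", "author", "author"]
emb_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

blocks = type_aligned_node_matching(types_a, emb_a, types_b, emb_b)
compared = sum(len(m) * len(m[0]) for m in blocks.values())
all_pairs = len(types_a) * len(types_b)
print(compared, "of", all_pairs, "cross-graph pairs compared")
```

In this toy case only the paper-paper and author-author blocks are computed, so fewer pairs are scored than in an all-pairs comparison, and cross-type pairs never contribute to the matching features.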

Paper Structure

This paper contains 26 sections, 2 theorems, 16 equations, 7 figures, 6 tables.

Key Result

Theorem 1

Let $\operatorname{g_\theta}$ be a GNN with a sufficient number of layers. If the following two conditions are met, $\operatorname{g_\theta}$ maps any two graphs $\boldsymbol{G_i}$ and $\boldsymbol{G_j}$ to different embeddings whenever the Weisfeiler-Lehman test determines them to be non-isomorphic:

Figures (7)

  • Figure 1: Molecular structures of ethylene (left), a two-carbon molecule with one double bond and four hydrogen atoms ($C_2H_4$), and difluoroethylene (right), a derivative of ethylene where two hydrogen atoms are replaced by fluorine atoms ($C_2H_2F_2$).
  • Figure 2: Comparison of several GSL baselines on a homogeneous graph dataset (AIDS700nef) and a heterogeneous graph dataset (ACM1000).
  • Figure 3: A toy example of HGED calculation, where different colors represent different types. The edit cost from the leftmost graph to the rightmost graph is 5.
  • Figure 4: Overall framework of HeGMN. The framework contains four modules: heterogeneous node encoding, type-aligned graph matching, type-aligned node matching, and graph similarity prediction. The first module learns node representations with a siamese HGIN. The second module computes the similarity of type embeddings with an MLP. The third module matches cross-graph nodes of the same type. The fourth module combines the output vectors of the second and third modules and feeds them into fully connected layers (FCL) to predict the similarity. Nodes and edges in different colors represent different types in the heterogeneous graph.
  • Figure 5: (a) The MSE results when the number of HGIN layers is $[1, 2, 3, 4, 5]$. (b) The MSE results when the hyperparameter $\mathcal{B}$ in matrix decomposition is $[1, 2, 3, 4, 5]$.
  • ...and 2 more figures
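
The type-aligned graph matching module named in the Figure 4 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: sum pooling per node type is an assumed readout, cosine similarity stands in for the MLP mentioned in the caption, and all names and toy values are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sum_pool_by_type(types, embeddings, all_types, dim):
    """Sum node embeddings per type; a type absent from the graph keeps a zero vector."""
    pooled = {t: [0.0] * dim for t in all_types}
    for node_type, embedding in zip(types, embeddings):
        pooled[node_type] = [p + x for p, x in zip(pooled[node_type], embedding)]
    return pooled


def type_aligned_graph_features(types_a, emb_a, types_b, emb_b, dim):
    """Compare the two graphs type by type using their pooled type embeddings."""
    all_types = sorted(set(types_a) | set(types_b))
    pooled_a = sum_pool_by_type(types_a, emb_a, all_types, dim)
    pooled_b = sum_pool_by_type(types_b, emb_b, all_types, dim)
    # Assumption: cosine similarity replaces the learned MLP comparison.
    return {t: cosine(pooled_a[t], pooled_b[t]) for t in all_types}


# Toy inputs: node types and 2-d embeddings are hypothetical.
types_a = ["paper", "paper", "author"]
emb_a = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
types_b = ["paper", "venue"]
emb_b = [[1.0, 1.0], [0.0, 3.0]]

features = type_aligned_graph_features(types_a, emb_a, types_b, emb_b, dim=2)
for node_type, score in features.items():
    print(node_type, round(score, 3))
```

A type that appears in only one of the two graphs pools to a zero vector on the other side and scores 0.0 here, which is one simple way to keep the per-type feature vector aligned across graphs.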

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof
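
Theorem 1 refers to the Weisfeiler-Lehman test. As a rough illustration of why node types matter for that test, the sketch below runs 1-dimensional Weisfeiler-Lehman color refinement on the molecules from Figure 1. Several simplifications are assumptions: the double bond is treated as a single untyped edge, the positions of the two fluorine atoms are chosen arbitrarily, and the function name `wl_colors` is hypothetical.

```python
from collections import Counter


def wl_colors(adjacency, initial_labels, rounds=3):
    """Run 1-WL color refinement and return the final multiset of node colors."""
    colors = dict(initial_labels)
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            node: (colors[node], tuple(sorted(colors[nbr] for nbr in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())


# Shared skeleton: two carbons (0, 1), each bonded to two further atoms.
skeleton = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0], 3: [0], 4: [1], 5: [1]}
ethylene = {0: "C", 1: "C", 2: "H", 3: "H", 4: "H", 5: "H"}
# Assumption: one fluorine on each carbon; the source does not fix the positions.
difluoroethylene = {0: "C", 1: "C", 2: "F", 3: "H", 4: "F", 5: "H"}
untyped = {node: "*" for node in skeleton}

untyped_equal = wl_colors(skeleton, untyped) == wl_colors(skeleton, untyped)
typed_equal = wl_colors(skeleton, ethylene) == wl_colors(skeleton, difluoroethylene)
print("untyped colorings equal:", untyped_equal)
print("typed colorings equal:", typed_equal)
```

With type information discarded, both molecules reduce to the same skeleton and the test cannot separate them; seeding the refinement with atom types yields different color multisets.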