GED-Consistent Disentanglement of Aligned and Unaligned Substructures for Graph Similarity Learning

Zhentao Zhan, Xiaoliang Xu, Jingjing Wang, Junmei Wang

TL;DR

This work addresses the mismatch between node-centric GED approximations and GED's global alignment by introducing GCGSim, a GED-consistent graph similarity framework. It relies on three novel components: GNCM for pair-aware graph representations, PSGD for principled disentanglement of aligned and unaligned substructures, and IIR for canonical alignment semantics, combined with NTN-based substructure interactions and dual prediction of edit costs and overall similarity. Theoretical justification via a variational ELBO with an informed, data-dependent prior links similarity to the posterior over substructure sources, while empirical results on four benchmarks show state-of-the-art accuracy and strong disentanglement signals. The approach delivers GED-aligned similarity with efficient inference, enabling reliable graph retrieval and analysis in practice.

Abstract

Graph Similarity Computation (GSC) is a fundamental graph-related task in which Graph Edit Distance (GED) serves as a prevalent metric. GED is determined by an optimal alignment between a pair of graphs that partitions each into aligned (zero-cost) and unaligned (cost-incurring) substructures. Because exact GED computation is NP-hard, GED approximations based on Graph Neural Networks (GNNs) have emerged. Existing GNN-based GED approaches typically learn node embeddings for each graph and then aggregate pairwise node similarities to estimate the final similarity. Despite their effectiveness, we identify a mismatch between this prevalent node-centric matching paradigm and the core principles of GED. This discrepancy leads to two critical limitations: (1) a failure to capture the global structural correspondence for optimal alignment, and (2) a misattribution of edit costs driven by spurious node-level signals. To address these limitations, we propose GCGSim, a GED-consistent graph similarity learning framework centering on graph-level matching and substructure-level edit costs. Specifically, we make three core technical contributions. Extensive experiments on four benchmark datasets show that GCGSim achieves state-of-the-art performance. Our comprehensive analyses further validate that the framework effectively learns disentangled and semantically meaningful substructure representations.
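To make the aligned/unaligned partition concrete, the following is a minimal sketch of exact GED on a toy graph pair. It is not part of GCGSim: it is a brute-force search over node mappings, and it assumes unit edit costs, node labels only (no edge labels), and undirected graphs. The function names `brute_force_ged` and `node_partition` and the toy graphs are hypothetical, chosen only for illustration, and the partition is reported at node level only.

```python
from itertools import combinations, permutations


def brute_force_ged(labels1, edges1, labels2, edges2):
    """Exact unit-cost GED for tiny node-labeled undirected graphs (assumed cost model)."""
    n = max(len(labels1), len(labels2))
    # Pad the smaller graph with dummy nodes (label None) so both sides have n nodes.
    l1 = list(labels1) + [None] * (n - len(labels1))
    l2 = list(labels2) + [None] * (n - len(labels2))
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best_cost, best_map = None, None
    for perm in permutations(range(n)):
        # Node cost: a relabel, insertion, or deletion whenever the mapped labels differ.
        cost = sum(l1[u] != l2[perm[u]] for u in range(n))
        # Edge cost: an edge present on exactly one side must be inserted or deleted.
        for u, v in combinations(range(n), 2):
            in1 = frozenset((u, v)) in e1
            in2 = frozenset((perm[u], perm[v])) in e2
            cost += in1 != in2
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map


def node_partition(labels1, labels2, mapping):
    """Split mapped node pairs into aligned (zero-cost) and unaligned (cost-incurring)."""
    aligned, unaligned = [], []
    for u, v in enumerate(mapping):
        a = labels1[u] if u < len(labels1) else None  # None marks a padded dummy node
        b = labels2[v] if v < len(labels2) else None
        (aligned if a == b else unaligned).append((u, v))
    return aligned, unaligned


# Toy pair: G_i is the path C-C-O; G_j extends it with an extra N node.
labels_i, edges_i = ["C", "C", "O"], [(0, 1), (1, 2)]
labels_j, edges_j = ["C", "C", "O", "N"], [(0, 1), (1, 2), (2, 3)]

cost, mapping = brute_force_ged(labels_i, edges_i, labels_j, edges_j)
aligned, unaligned = node_partition(labels_i, labels_j, mapping)
print(cost, mapping, aligned, unaligned)
```

For this toy pair the search returns a cost of 2: one node insertion (the N node) and one edge insertion. The three matching nodes form the aligned part, and the pair involving the padded dummy node is unaligned. The search enumerates all n! mappings, so it is only usable on very small graphs, which is the motivation for learned approximations.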

Paper Structure

This paper contains 41 sections, 3 theorems, 30 equations, 6 figures, 5 tables.

Key Result

Lemma 1

The log-marginal likelihood admits an evidence lower bound (ELBO). For brevity, $q_\phi(k) \triangleq q_\phi(k | G_i, G_j, s_{ij})$ and $p_\theta(k) \triangleq p_\theta(k | G_i, G_j)$. Here $q_\phi(k | G_i, G_j, s_{ij})$ is a variational distribution, parameterized by $\phi$, that approximates the true posterior $p_\theta(k | G_i, G_j, s_{ij})$.
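The bound itself is not reproduced in this summary. As a hedged sketch, assuming the standard variational derivation with latent variable $k$ and the shorthand defined in the lemma, the bound would take the following form; the likelihood term $p_\theta(s_{ij} | G_i, G_j, k)$ is an assumed notation, and the paper's exact statement may differ:

$$
\log p_\theta(s_{ij} | G_i, G_j) \;\ge\; \mathbb{E}_{q_\phi(k)}\left[\log p_\theta(s_{ij} | G_i, G_j, k)\right] - \mathrm{KL}\left(q_\phi(k) \,\|\, p_\theta(k)\right)
$$

The first term rewards explaining the observed similarity $s_{ij}$ under the variational distribution, and the KL term keeps $q_\phi(k)$ close to the prior $p_\theta(k)$.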

Figures (6)

  • Figure 1: (a) A graph edit path for an input graph pair $(G_i, G_j)$. The graph is partitioned into aligned substructures ($as_i, as_j$) and unaligned substructures ($us_i, us_j$). GED is derived from the cost of transforming $us_i$ to $us_j$. The number within each node denotes its ID, and the color indicates its label. (b) Comparison of the conventional node-centric framework with our proposed GED-consistent framework (GCGSim).
  • Figure 2: The architecture of GCGSim
  • Figure 3: Analysis of representation swapping. The bars show the performance degradation (MSE difference) under three swapping settings: (a) Intra-Instance Swap (IIS), (b) Extra-Instance Swap for Aligned substructures (EISA), and (c) Extra-Instance Swap for Unaligned substructures (EISU). The x-axis represents different model variants, from a baseline ('None') to our full model ('All'), to show the contribution of each component to robustness. A smaller bar indicates greater robustness.
  • Figure 4: Visualization of the pair-aware similarity learned by the GNCM module across GNN layers. The color intensity indicates the similarity of a node's evolving local context to the other graph's global structure (blue: low, red: high).
  • Figure 5: Sensitivity analysis on the ECP weight $\lambda$ (a) and the IIR parameter $\beta$ (b).
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1: Graph Edit Distance
  • Definition 2: Optimal Alignment
  • Definition 3: Aligned and Unaligned substructure
  • Lemma 1: The Evidence Lower Bound Objective
  • proof
  • Lemma 2: Monotonicity of the Ideal Posterior
  • proof
  • Theorem 1: Optimizing the ELBO via an Informed Prior Design
  • proof