Computing Approximate Graph Edit Distance via Optimal Transport

Qihao Cheng, Da Yan, Tianhao Wu, Zhongyi Huang, Qin Zhang

TL;DR

This work tackles the NP-hard problem of computing graph edit distance (GED) with optimal transport (OT) methods that take the global context of a graph pair into account. It presents three methods: GEDIOT, a supervised inverse-OT framework with a learnable Sinkhorn module; GEDGW, an unsupervised approach that combines OT with Gromov-Wasserstein (GW) discrepancy; and GEDHOT, an ensemble of the two. In the authors' experiments, the methods outperform existing GED approaches in both GED accuracy and graph edit path (GEP) generation, and they generalize well to unseen graphs. The results suggest that combining OT with graph embeddings is a practical route to scalable, accurate GED estimation and GEP generation across diverse datasets.

Abstract

Given a graph pair $(G^1, G^2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G^1$ to $G^2$. GED is a fundamental operation widely used in many applications, but its exact computation is NP-hard, so the approximation of GED has gained a lot of attention. Data-driven learning-based methods have been found to provide superior results compared to classical approximate algorithms, but they directly fit the coupling relationship between a pair of vertices from their vertex features. We argue that while pairwise vertex features can capture the coupling cost (discrepancy) of a pair of vertices, the vertex coupling matrix should be derived from the vertex-pair cost matrix through a more well-established method that is aware of the global context of the graph pair, such as optimal transport. In this paper, we propose an ensemble approach that integrates a supervised learning-based method and an unsupervised method, both based on optimal transport. Our learning method, GEDIOT, is based on inverse optimal transport that leverages a learnable Sinkhorn algorithm to generate the coupling matrix. Our unsupervised method, GEDGW, models GED computation as a linear combination of optimal transport and its variant, Gromov-Wasserstein discrepancy, for node and edge operations, respectively, which can be solved efficiently without needing the ground truth. Our ensemble method, GEDHOT, combines GEDIOT and GEDGW to further boost the performance. Extensive experiments demonstrate that our methods significantly outperform the existing methods in terms of the performance of GED computation, edit path generation, and model generalizability.
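To make the role of optimal transport concrete, the following is a minimal sketch of the standard (non-learnable) Sinkhorn iteration, which turns a vertex-pair cost matrix into a coupling matrix whose row and column sums are fixed globally. It is not the authors' GEDIOT module: the function name `sinkhorn`, the regularization weight `eps`, the iteration count, and the toy cost matrix are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn(cost, mu, nu, eps=0.1, n_iters=200):
    """Entropic OT sketch: scale the Gibbs kernel so that the coupling's
    row sums match mu and its column sums match nu (approximately)."""
    K = np.exp(-cost / eps)            # Gibbs kernel of the cost matrix
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)               # rescale rows toward mu
        v = nu / (K.T @ u)             # rescale columns toward nu
    return u[:, None] * K * v[None, :]

# Toy 3x3 cost matrix (hypothetical): zero cost on the diagonal.
cost = np.array([[0.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
mu = np.ones(3)                        # one unit of mass per vertex of G^1
nu = np.ones(3)                        # one unit of mass per vertex of G^2
pi = sinkhorn(cost, mu, nu)
matching = pi.argmax(axis=1)           # hard matching read off the coupling
```

Because every row and column of the coupling must carry its prescribed mass, the value assigned to one vertex pair depends on all other pairs. This is the sense in which the coupling is aware of the whole graph pair rather than of a single pair of vertex features.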

Paper Structure

This paper contains 48 sections, 3 theorems, 41 equations, 21 figures, 6 tables, 4 algorithms.

Key Result

Theorem 1

There exists a cost matrix $\widehat{\mathbf{C}}^*$, such that the optimal coupling matrix $\widehat{\bm{\pi}}^*$ of the optimization problem is exactly the ground truth node matching $\bm{\pi}^*$.
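A node matching matters because it determines a set of edit operations, and the number of those operations is an upper bound on the GED. The sketch below counts them for two labeled graphs with the same number of nodes, assuming unit cost for every node relabeling and every edge insertion or deletion. The function name `induced_edit_cost`, the equal-size restriction, and the toy graphs are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def induced_edit_cost(A1, labels1, A2, labels2, match):
    """Count the unit-cost edits implied by mapping node i of G^1
    to node match[i] of G^2 (graphs assumed to have equal size)."""
    match = np.asarray(match)
    # Node relabelings: matched nodes whose labels differ.
    relabels = sum(l1 != labels2[j] for l1, j in zip(labels1, match))
    # Reorder G^2's adjacency matrix so it is indexed like G^1.
    A2_perm = A2[np.ix_(match, match)]
    # Each mismatched undirected edge appears twice in a symmetric matrix.
    edge_edits = int(np.abs(A1 - A2_perm).sum()) // 2
    return relabels + edge_edits

# G^1: path 0-1-2 with labels C, C, O (hypothetical toy graph).
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
labels1 = ["C", "C", "O"]
# G^2: triangle with labels C, C, N (hypothetical toy graph).
A2 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
labels2 = ["C", "C", "N"]

cost_identity = induced_edit_cost(A1, labels1, A2, labels2, [0, 1, 2])
cost_reversed = induced_edit_cost(A1, labels1, A2, labels2, [2, 1, 0])
```

Here the identity matching implies one relabeling (O to N) and one edge insertion, while the reversed matching implies two relabelings and one edge insertion, so different matchings give different upper bounds.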

Figures (21)

  • Figure 1: A toy example of graph pair $(G^1,G^2)$
  • Figure 2: OT Motivation and Learning-based Model Comparison
  • Figure 3: Example of Cost Matrix and Coupling Matrices
  • Figure 4: The architecture of GEDIOT
  • Figure 5: Illustration of the Dummy Supernode
  • ...and 16 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3