Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN

Wei Huang, Hanchen Wang, Dong Wen, Shaozhen Ma, Wenjie Zhang, Xuemin Lin

TL;DR

This work addresses the efficient computation of Graph Edit Distance (GED) without ground-truth supervision. It introduces GEDRanker, a GAN-based framework that couples a diffusion-based node-matching solver with a preference-aware discriminator. The discriminator guides exploration by ranking candidate node matchings by GED quality, which is made possible by differentiable Gumbel-Sinkhorn decoding and a Bayesian Personalized Ranking objective. Empirical results show that GEDRanker achieves near-optimal GED quality on benchmark datasets, outperforms many supervised baselines and traditional solvers, and generalizes and scales well to larger graphs. The approach offers a practical, label-free path to high-quality GED estimation, with potential applicability to real-world graph similarity tasks where ground-truth node matchings are unavailable.

Abstract

Graph Edit Distance (GED) is a fundamental graph similarity metric widely used in various applications. However, computing GED is an NP-hard problem. A recent state-of-the-art hybrid GED solver has shown promising performance by formulating GED as a bipartite graph matching problem and then leveraging a generative diffusion model to predict the node matching between two graphs, from which both the GED and its corresponding edit path can be extracted using a traditional algorithm. However, such methods typically rely heavily on ground-truth supervision, and ground-truth node matchings are often costly to obtain in real-world scenarios. In this paper, we propose GEDRanker, a novel unsupervised GAN-based framework for GED computation. Specifically, GEDRanker consists of a matching-based GED solver and an interpretable preference-aware discriminator. By leveraging preference signals over different node matchings derived from edit path lengths, the discriminator can guide the matching-based solver toward generating high-quality node matchings without the need for ground-truth supervision. Extensive experiments on benchmark datasets demonstrate that GEDRanker enables the matching-based GED solver to achieve near-optimal solution quality without any ground-truth supervision.
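
To make the link between a node matching and an edit path length concrete, the following is a minimal sketch rather than the paper's implementation. It assumes unit edit costs, undirected simple graphs with node labels only, and that $G_1$ has no more nodes than $G_2$; the `(n, edges, labels)` graph encoding and the names `edit_path_length` and `brute_force_ged` are hypothetical. Any injective matching yields a valid edit path, so its length is an upper bound on the GED, and the minimum over all matchings is the GED itself.

```python
from itertools import permutations


def edit_path_length(g1, g2, matching):
    """Length of the edit path induced by a node matching (unit costs assumed).

    g1, g2: hypothetical (num_nodes, edge_list, node_labels) triples.
    matching[u] = v maps each node u of g1 to a distinct node v of g2.
    """
    n1, edges1, labels1 = g1
    n2, edges2, labels2 = g2
    assert n1 <= n2 and len(matching) == n1
    assert len(set(matching)) == n1, "matching must be injective"

    # Matched nodes whose labels differ need one relabeling each.
    relabels = sum(1 for u in range(n1) if labels1[u] != labels2[matching[u]])
    # Nodes of g2 that no node of g1 maps to must be inserted.
    node_insertions = n2 - n1

    target_edges = {frozenset(e) for e in edges2}
    mapped_edges = {frozenset((matching[u], matching[v])) for u, v in edges1}
    edge_deletions = len(mapped_edges - target_edges)   # in g1 but not in g2
    edge_insertions = len(target_edges - mapped_edges)  # in g2 but not in g1

    return relabels + node_insertions + edge_deletions + edge_insertions


def brute_force_ged(g1, g2):
    """Exact GED by enumerating every injective matching (tiny graphs only)."""
    n1, n2 = g1[0], g2[0]
    return min(
        edit_path_length(g1, g2, list(p)) for p in permutations(range(n2), n1)
    )


# Toy example: a labeled path versus a triangle with one pendant node.
G1 = (3, [(0, 1), (1, 2)], ["A", "A", "B"])
G2 = (4, [(0, 1), (1, 2), (0, 2), (2, 3)], ["A", "A", "B", "B"])

print(edit_path_length(G1, G2, [0, 1, 2]))  # 3: one node and two edge insertions
print(edit_path_length(G1, G2, [2, 1, 0]))  # 5: two extra relabelings
print(brute_force_ged(G1, G2))              # 3
```

The exhaustive search is exponential in the number of nodes, which is consistent with the NP-hardness noted above and is why a learned solver that proposes good matchings is attractive.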

Paper Structure

This paper contains 29 sections, 16 equations, 7 figures, 9 tables, 3 algorithms.

Figures (7)

  • Figure 1: (a) An optimal edit path for converting $G_1$ to $G_2$ with GED$(G_1,G_2)=4$. (b) An optimal node matching matrix from which an optimal edit path can be derived.
  • Figure 2: An overview of GEDRanker. At each training step, given a pair of training graphs, we maintain a record of the current best node matching matrix $\bar{\pi}$ and the node matching matrix obtained from the previous training step $\pi_{\text{last}}$. A noisy matching $\pi^t$ is sampled at a random diffusion time step $t$ and denoised by the denoising network $g_\phi$ to produce node matching scores. The resulting matching probability matrix $\hat{\pi}_{g_\phi}$ is obtained via Gumbel-Sinkhorn and greedily decoded to $\pi_{g_\phi}$. The preference-aware discriminator $D_\theta$ is trained to learn a preference ordering over $\bar{\pi}$, $\pi_{\text{last}}$, and $\hat{\pi}_{g_\phi}$. Next, $g_\phi$ is trained to recover $\bar{\pi}$ and to maximize the preference score $D_\theta(G_1, G_2, \hat{\pi}_{g_\phi})$. Finally, the record is updated using $\pi_{g_\phi}$.
  • Figure 3: Average edit path length of the best found node matching matrices on training graph pairs.
  • Figure 4: Reverse process of diffusion-based node matching model during inference.
  • Figure 5: Comparison of $D_\theta$'s network architecture.
  • ...and 2 more figures
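
The decoding and ranking steps named in the Figure 2 caption can be illustrated with a small, dependency-free sketch. It is not the authors' code: it assumes a square score matrix, uses plain floats in place of differentiable tensors, and the names `gumbel_sinkhorn`, `greedy_decode`, and `bpr_loss` are hypothetical, as are the parameters `tau` and `n_iters`. The pairwise loss is the standard Bayesian Personalized Ranking form, $-\log \sigma(s_{\text{preferred}} - s_{\text{other}})$, where the two scores stand in for outputs of $D_\theta$ on a matching with a shorter edit path and one with a longer edit path.

```python
import math
import random


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_sinkhorn(scores, tau=1.0, n_iters=20, noise=True, rng=None):
    """Turn a square score matrix into an approximately doubly stochastic one."""
    rng = rng or random.Random(0)
    n = len(scores)
    log_p = []
    for row in scores:
        new_row = []
        for s in row:
            # Gumbel(0, 1) sample; the clamp avoids log(0).
            g = -math.log(-math.log(max(rng.random(), 1e-12))) if noise else 0.0
            new_row.append((s + g) / tau)
        log_p.append(new_row)
    for _ in range(n_iters):
        # Alternate row and column normalization in log space.
        log_p = [[x - _logsumexp(row) for x in row] for row in log_p]
        col_norm = [_logsumexp([log_p[i][j] for i in range(n)]) for j in range(n)]
        log_p = [[log_p[i][j] - col_norm[j] for j in range(n)] for i in range(n)]
    return [[math.exp(x) for x in row] for row in log_p]


def greedy_decode(prob):
    """Pick the largest remaining entry until every row has one column."""
    n = len(prob)
    cells = sorted(
        ((prob[i][j], i, j) for i in range(n) for j in range(n)), reverse=True
    )
    used_rows, used_cols, matching = set(), set(), [None] * n
    for _, i, j in cells:
        if i not in used_rows and j not in used_cols:
            matching[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return matching


def bpr_loss(score_preferred, score_other):
    """-log(sigmoid(score_preferred - score_other)), computed stably."""
    d = score_preferred - score_other
    return math.log1p(math.exp(-d)) if d > 0 else -d + math.log1p(math.exp(d))


scores = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
prob = gumbel_sinkhorn(scores, noise=False)
print(greedy_decode(prob))  # [0, 2, 1]
print(bpr_loss(2.0, 0.0) < bpr_loss(0.0, 2.0))  # True
```

Lower values of `tau` push the output closer to a hard permutation matrix, while the soft matrix is what would keep the discriminator's score differentiable with respect to the solver in a tensor-based implementation.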