Graph Edit Distance with General Costs Using Neural Set Divergence

Eeshaan Jain, Indradyumna Roy, Saswat Meher, Soumen Chakrabarti, Abir De

TL;DR

This work formulates GED as a quadratic assignment problem (QAP) that incorporates the costs of four edit operations, viz., edge deletion, edge addition, node deletion and node addition, and proposes GRAPHEDX, a neural GED estimator that works with general costs specified for these operations.

Abstract

Graph Edit Distance (GED) measures the (dis-)similarity between two given graphs in terms of the minimum-cost edit sequence that transforms one graph into the other. The exact computation of GED is NP-Hard, which has recently motivated the design of neural methods for GED estimation. However, these methods do not explicitly account for edit operations with different costs. In response, we propose GRAPHEDX, a neural GED estimator that can work with general costs specified for the four edit operations, viz., edge deletion, edge addition, node deletion and node addition. We first present GED as a quadratic assignment problem (QAP) that incorporates these four costs. Then, we represent each graph as a set of node and edge embeddings and use them to design a family of neural set divergence surrogates. We replace the QAP terms corresponding to each operation with their surrogates. Computing such neural set divergences requires aligning the nodes and edges of the two graphs. We learn these alignments using a Gumbel-Sinkhorn permutation generator, additionally ensuring that the node and edge alignments are consistent with each other. Moreover, these alignments are cognizant of both the presence and absence of edges between node-pairs. Experiments on several datasets, under a variety of edit cost settings, show that GRAPHEDX consistently outperforms state-of-the-art methods and heuristics in terms of prediction error.
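
The Gumbel-Sinkhorn step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' permutation network: it only shows the generic operator, which perturbs a square score matrix with Gumbel noise, divides by a temperature, and alternates row and column normalization in log-space to obtain a soft (approximately doubly stochastic) alignment. The function names, the temperature `tau`, and the iteration count `n_iters` are illustrative assumptions.

```python
import math
import random


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gumbel_sinkhorn(scores, tau=0.1, n_iters=50, noise=True, rng=None):
    """Map a square score matrix to a soft alignment matrix.

    Hypothetical helper for illustration; `tau` and `n_iters` are assumed
    hyperparameters, not values taken from the paper.
    """
    rng = rng or random.Random(0)
    n = len(scores)
    log_p = []
    for row in scores:
        new_row = []
        for s in row:
            # Gumbel(0, 1) sample via the inverse-CDF trick (skipped if noise=False).
            g = -math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) if noise else 0.0
            new_row.append((s + g) / tau)
        log_p.append(new_row)
    for _ in range(n_iters):
        # Row normalization: every row sums to 1 after exponentiation.
        for i in range(n):
            lse = _logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]
        # Column normalization: every column sums to 1 after exponentiation.
        for j in range(n):
            lse = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_p]
```

Lower temperatures push the output toward a hard permutation matrix, while higher temperatures give smoother alignments; in a learned model the score matrix would come from node embeddings rather than being fixed by hand.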

Paper Structure (75 sections, 2 theorems, 32 equations, 7 figures, 22 tables, 1 algorithm)

Key Result

Proposition 1

Given a fixed set of values of $b^{\ominus},b^{\oplus},a^{\ominus},a^{\oplus}$, let $\bm{P}$ be an optimal node permutation matrix corresponding to $\mathrm{GED}(G,G')$, computed using Eq. \ref{eq:GED-P}. Then, $\bm{P}'=\bm{P}^{\top}$ is an optimal node permutation corresponding

Figures (7)

  • Figure 1: Top: Example graphs $G$ and $G'$ are shown with color-coded nodes to indicate alignment corresponding to the optimal edit path transforming $G$ to $G'$. Bottom: GraphEdX's GED prediction pipeline. $G$ and $G'$ are independently encoded using $\mathrm{MPNN}_{\theta}$, and then padded with zero vectors to equalize sizes, resulting in contextual node representations $\bm{X}, \bm{X'} \in \mathbb{R}^{N \times d}$. For each node-pair, the corresponding embeddings and edge presence information are gathered and fed into $\textsc{MLP}_{\theta}$ to obtain $\bm{R}, \bm{R'} \in \mathbb{R}^{N(N-1)/2 \times D}$. Simultaneously, $\bm{X}, \bm{X'}$ are fed into $\textsc{PermNet}_{\phi}$ to obtain the soft node alignment $\bm{P}$ (Eq. \ref{eq:P-sinkhorn-highlevel}), which constructs the node-pair alignment matrix $\bm{S} \in \mathbb{R}^{N(N-1)/2 \times N(N-1)/2}$ as $\bm{S}[(u,v), (u',v')] = \bm{P}[u,u']\bm{P}[v,v'] + \bm{P}[u,v']\bm{P}[v,u']$. Finally, $\bm{X}, \bm{X'}, \bm{P}$ are used to approximate node insertion and deletion costs, while $\bm{R}, \bm{R'}, \bm{S}$ are used to approximate edge insertion and deletion costs. The four costs are summed to give the final prediction $\mathrm{GED}_{\theta,\phi}(G, G')$ (Eq. \ref{eq:ged-theta-phi}).
  • Figure 2: Node and edge alignment with constrained and unconstrained alignment $\bm{S}$. Dashed edges represent deleted edges; grey edges represent added edges.
  • Figure 3: Scatter plot comparing the distribution of the predicted GED of our model with the next best-performing model across various datasets under both uniform and non-uniform cost settings.
  • Figure 4: Error distribution of our model compared to the next best-performing model across various datasets under both uniform and non-uniform cost settings.
  • Figure 5: Performance of combinatorial optimization algorithms on various datasets under both uniform and non-uniform cost settings. We plot MSE against the time limit allocated to the combinatorial algorithms. Additionally, we include the amortized time of our model and its MSE.
  • ...and 2 more figures
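
The node-pair alignment formula in the Figure 1 caption, $\bm{S}[(u,v), (u',v')] = \bm{P}[u,u']\bm{P}[v,v'] + \bm{P}[u,v']\bm{P}[v,u']$, can be sketched directly. The function name `node_pair_alignment` and the ordering of the unordered pairs (lexicographic, with $u < v$) are assumptions made for illustration; the caption only fixes the formula and the $N(N-1)/2 \times N(N-1)/2$ shape.

```python
from itertools import combinations


def node_pair_alignment(P):
    """Build the node-pair alignment matrix S from a node alignment matrix P.

    P is an N x N list of lists (hard or soft alignment). Rows and columns
    of S are indexed by unordered node pairs (u, v) with u < v, so S has
    N(N-1)/2 rows and columns. Hypothetical helper for illustration.
    """
    n = len(P)
    pairs = list(combinations(range(n), 2))
    S = [
        [
            # Pair (u, v) can map to (u', v') in either orientation.
            P[u][up] * P[v][vp] + P[u][vp] * P[v][up]
            for (up, vp) in pairs
        ]
        for (u, v) in pairs
    ]
    return pairs, S


# A hard permutation that swaps nodes 0 and 1 and fixes node 2.
pairs, S = node_pair_alignment([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
```

When $\bm{P}$ is a hard permutation, $\bm{S}$ is itself a permutation over node pairs: in the usage above, pair (0, 2) is matched to pair (1, 2) and vice versa, while pair (0, 1) maps to itself. Deriving $\bm{S}$ from $\bm{P}$ in this way is what keeps the edge alignment tied to the node alignment.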

Theorems & Definitions (2)

  • Proposition 1
  • Lemma 2