Graph Edit Distance with General Costs Using Neural Set Divergence

Eeshaan Jain; Indradyumna Roy; Saswat Meher; Soumen Chakrabarti; Abir De

TL;DR

This work formulates GED as a quadratic assignment problem (QAP) that incorporates general costs for the four edit operations, viz., edge deletion, edge addition, node deletion and node addition, and proposes GRAPHEDX, a neural GED estimator that works with these user-specified costs.

Abstract

Graph Edit Distance (GED) measures the (dis-)similarity between two given graphs, in terms of the minimum-cost edit sequence that transforms one graph to the other. The exact computation of GED is NP-Hard, which has recently motivated the design of neural methods for GED estimation. However, existing neural methods do not explicitly account for edit operations with different costs. In response, we propose GRAPHEDX, a neural GED estimator that can work with general costs specified for the four edit operations, viz., edge deletion, edge addition, node deletion and node addition. We first present GED as a quadratic assignment problem (QAP) that incorporates these four costs. Then, we represent each graph as a set of node and edge embeddings and use them to design a family of neural set divergence surrogates. We replace the QAP terms corresponding to each operation with their surrogates. Computing such neural set divergences requires aligning the nodes and edges of the two graphs. We learn these alignments using a Gumbel-Sinkhorn permutation generator, additionally ensuring that the node and edge alignments are consistent with each other. Moreover, these alignments are cognizant of both the presence and absence of edges between node-pairs. Experiments on several datasets, under a variety of edit cost settings, show that GRAPHEDX consistently outperforms state-of-the-art methods and heuristics in terms of prediction error.
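The abstract mentions a Gumbel-Sinkhorn permutation generator for learning alignments. As a rough illustration of that general technique (not the authors' network, whose architecture is not given in this excerpt), the sketch below perturbs a score matrix with Gumbel noise, divides by a temperature, and alternates row and column normalization in log space to obtain a soft, approximately doubly stochastic alignment. The function name `gumbel_sinkhorn` and the parameters `tau`, `n_iters`, `noise` and `rng` are illustrative assumptions.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def gumbel_sinkhorn(scores, tau=1.0, n_iters=50, noise=True, rng=None):
    """Soft alignment from a square score matrix (illustrative sketch).

    scores : n x n list of lists; higher means "more likely aligned".
    tau    : temperature; smaller values push the result towards a hard permutation.
    noise  : if True, add Gumbel(0, 1) noise to every score before normalizing.
    """
    rng = rng or random.Random(0)
    n = len(scores)

    def gumbel():
        # Clamp the uniform sample away from 0 and 1 to avoid log(0).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))

    log_p = [[(scores[i][j] + (gumbel() if noise else 0.0)) / tau
              for j in range(n)] for i in range(n)]

    for _ in range(n_iters):
        # Row normalization: every row sums to 1 after exponentiation.
        row_lse = [logsumexp(row) for row in log_p]
        log_p = [[log_p[i][j] - row_lse[i] for j in range(n)] for i in range(n)]
        # Column normalization: every column sums to 1 after exponentiation.
        col_lse = [logsumexp([log_p[i][j] for i in range(n)]) for j in range(n)]
        log_p = [[log_p[i][j] - col_lse[j] for j in range(n)] for i in range(n)]

    return [[math.exp(x) for x in row] for row in log_p]


# Hypothetical 3 x 3 score matrix, used only to exercise the function.
scores = [[2.0, 0.5, 0.1],
          [0.3, 1.5, 0.2],
          [0.1, 0.4, 1.8]]
P = gumbel_sinkhorn(scores, tau=1.0, n_iters=50, noise=False)
```

Because the loop ends on a column step, columns sum to one up to rounding, while rows sum to one only approximately after finitely many iterations. Working in log space keeps the iteration stable at low temperatures.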

Paper Structure (75 sections, 2 theorems, 32 equations, 7 figures, 22 tables, 1 algorithm)

Introduction
Present work
Neural set divergence surrogates for GED
Learning all node-pair representations
Node-edge consistent alignment
Related work
Heuristics for Graph Edit Distance
Optimal Transport
Neural graph similarity computation
Problem setup
Notation
Graph edit distance with general cost
Problem statement
Proposed approach
GED computation using node alignment map
...and 60 more sections

Key Result

Proposition 1

Given a fixed set of values of $b^{\ominus},b^{\oplus},a^{\ominus},a^{\oplus}$, let $\bm{P}$ be an optimal node permutation matrix corresponding to $\mathrm{GED}(G,G')$, computed using Eq. \ref{eq:GED-P}. Then, $\bm{P}'=\bm{P}^{\top}$ is an optimal node permutation corresponding
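To make the notion of an optimal node permutation under four separate costs concrete, here is a brute-force sketch for tiny graphs. It assumes, since this excerpt does not spell it out, that $a^{\ominus}, a^{\oplus}$ are the edge deletion and addition costs and $b^{\ominus}, b^{\oplus}$ the node deletion and addition costs, and that the smaller graph is padded with isolated dummy nodes so that both graphs have the same number of nodes. The function and parameter names are hypothetical, and the enumeration is exponential in the number of nodes.

```python
from itertools import permutations


def ged_bruteforce(n, edges, n2, edges2,
                   a_del=1.0, a_add=1.0, b_del=1.0, b_add=1.0):
    """Cheapest node alignment between two small simple undirected graphs.

    n, edges   : node count and edge list of G  (nodes are 0..n-1)
    n2, edges2 : node count and edge list of G' (nodes are 0..n2-1)
    a_del/a_add: assumed cost of deleting/adding one edge
    b_del/b_add: assumed cost of deleting/adding one node
    Returns (best_cost, best_perm), where best_perm[u] is the node of G'
    aligned to node u of the padded G.
    """
    N = max(n, n2)  # pad the smaller graph with isolated dummy nodes
    E = {frozenset(e) for e in edges}
    E2 = {frozenset(e) for e in edges2}
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(N)):
        # A real node of G aligned to a dummy node of G' counts as deleted.
        node_del = sum(1 for u in range(n) if perm[u] >= n2)
        # A dummy node of G aligned to a real node of G' counts as added.
        node_add = sum(1 for u in range(n, N) if perm[u] < n2)
        # Image of G's edges under the alignment.
        mapped = {frozenset(perm[u] for u in e) for e in E}
        edge_del = len(mapped - E2)  # edges of G with no counterpart in G'
        edge_add = len(E2 - mapped)  # edges of G' with no counterpart in G
        cost = (b_del * node_del + b_add * node_add
                + a_del * edge_del + a_add * edge_add)
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
cost, perm = ged_bruteforce(3, triangle, 3, path, a_del=2.0)
print(cost)  # 2.0: one edge must be deleted, at cost 2 per deletion
```

With unequal deletion and addition costs the distance is no longer symmetric in this sketch: turning the path into the triangle under the same costs requires one edge addition and costs 1.0.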

Figures (7)

Figure 1: Top: Example graphs $G$ and $G'$ are shown with color-coded nodes to indicate alignment corresponding to the optimal edit path transforming $G$ to $G'$. Bottom: GRAPHEDX's GED prediction pipeline. $G$ and $G'$ are independently encoded using $\mathrm{MPNN}_{\theta}$, and then padded with zero vectors to equalize sizes, resulting in contextual node representations $\bm{X}, \bm{X'} \in \mathbb{R}^{N \times d}$. For each node-pair, the corresponding embeddings and edge presence information are gathered and fed into $\textsc{MLP}_{\theta}$ to obtain $\bm{R}, \bm{R'} \in \mathbb{R}^{N(N-1)/2 \times D}$. Simultaneously, $\bm{X}, \bm{X'}$ are fed into $\textsc{PermNet}_{\phi}$ to obtain the soft node alignment $\bm{P}$ (Eq. \ref{eq:P-sinkhorn-highlevel}), which constructs the node-pair alignment matrix $\bm{S} \in \mathbb{R}^{N(N-1)/2 \times N(N-1)/2}$ as $\bm{S}[(u,v), (u',v')] = \bm{P}[u,u']\bm{P}[v,v'] + \bm{P}[u,v']\bm{P}[v,u']$. Finally, $\bm{X}, \bm{X'}, \bm{P}$ are used to approximate node insertion and deletion costs, while $\bm{R}, \bm{R'}, \bm{S}$ are used to approximate edge insertion and deletion costs. The four costs are summed to give the final prediction $\mathrm{GED}_{\theta,\phi}(G, G')$ (Eq. \ref{eq:ged-theta-phi}).
Figure 2: Node and edge alignment with constrained and unconstrained alignment $\bm{S}$. A dashed edge represents the deleted edge. Grey edges represent added edges.
Figure 3: Scatter plot comparing the distribution of the predicted GED of our model with the next best-performing model across various datasets under both uniform and non-uniform cost settings.
Figure 4: Error distribution of our model compared to the next best-performing model across various datasets under both uniform and non-uniform cost settings.
Figure 5: Performance of combinatorial optimization algorithms on various datasets under both uniform and non-uniform cost settings. We plot MSE against the time limit allocated to the combinatorial algorithms, and additionally include the amortized time of our model and its MSE.
...and 2 more figures
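The caption of Figure 1 defines the node-pair alignment matrix $\bm{S}$ from the node alignment $\bm{P}$. A minimal plain-Python sketch of that formula follows. Indexing unordered node pairs $(u,v)$ with $u<v$ in lexicographic order is an assumption made here for illustration, as is the function name `node_pair_alignment`.

```python
from itertools import combinations


def node_pair_alignment(P):
    """Build S[(u,v),(u',v')] = P[u][u']*P[v][v'] + P[u][v']*P[v][u'].

    P is an N x N list of lists (a hard or soft node alignment).
    Returns (pairs, S): pairs lists the N(N-1)/2 unordered node pairs
    that index both the rows and the columns of S.
    """
    n = len(P)
    pairs = list(combinations(range(n), 2))  # (u, v) with u < v
    S = [[P[u][u2] * P[v][v2] + P[u][v2] * P[v][u2]
          for (u2, v2) in pairs]
         for (u, v) in pairs]
    return pairs, S


# Hard alignment 0 -> 1, 1 -> 2, 2 -> 0 written as a permutation matrix.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
pairs, S = node_pair_alignment(P)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(S)      # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

When $\bm{P}$ is a hard permutation, $\bm{S}$ is itself a permutation over node pairs: here the pair $(0,1)$ is sent to $(1,2)$. The two products in the formula cover both orientations of the target pair, which is why unordered pairs suffice.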

Theorems & Definitions (2)

Proposition 1
Lemma 2
