Efficient Graph Similarity Computation with Alignment Regularization

Wei Zhuo, Guang Tan

TL;DR

The expensive node-to-node matching module is not necessary for GSC: high-quality learning can be attained with a simple yet powerful regularization technique called Alignment Regularization (AReg).

Abstract

We consider the graph similarity computation (GSC) task based on graph edit distance (GED) estimation. State-of-the-art methods treat GSC as a learning-based prediction task using Graph Neural Networks (GNNs). To capture fine-grained interactions between pairs of graphs, most of these methods include a node-level matching module in the end-to-end learning pipeline, which incurs high computational costs in both the training and inference stages. We show that the expensive node-to-node matching module is not necessary for GSC, and that high-quality learning can be attained with a simple yet powerful regularization technique, which we call Alignment Regularization (AReg). In the training stage, the AReg term imposes a node-graph correspondence constraint on the GNN encoder. In the inference stage, the graph-level representations learned by the GNN encoder are used directly to compute the similarity score; AReg is not applied again, which speeds up inference. We further propose a multi-scale GED discriminator to enhance the expressive ability of the learned representations. Extensive experiments on real-world datasets demonstrate the effectiveness, efficiency, and transferability of our approach.
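To make the efficiency argument concrete, the following is a minimal sketch of why scoring from graph-level representations is cheaper than node-to-node matching. It is not the paper's model: the mean-pooling readout, the `exp(-squared distance)` score, and all function names are assumptions chosen only for illustration.

```python
import math


def mean_pool(node_embeddings):
    """Collapse a list of node vectors into one graph-level vector.

    Mean pooling is an assumed readout, used here only as a stand-in.
    """
    dim = len(node_embeddings[0])
    n = len(node_embeddings)
    return [sum(v[k] for v in node_embeddings) / n for k in range(dim)]


def graph_level_score(z_i, z_j):
    """Score a pair from two graph vectors only: O(d) work per pair.

    The exp(-squared distance) form is an assumption, not the paper's
    discriminator.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return math.exp(-sq_dist)


def node_matching_pairs(h_i, h_j):
    """Count the node-to-node comparisons a matching module would need."""
    return len(h_i) * len(h_j)


# Hypothetical node embeddings for two small graphs.
h_i = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_j = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

score = graph_level_score(mean_pool(h_i), mean_pool(h_j))
pairs = node_matching_pairs(h_i, h_j)
```

In this toy setting the graph-level score touches one vector per graph, while a matching module would compare every node of one graph with every node of the other (12 comparisons for the 3-node and 4-node graphs above), and that gap grows with graph size.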

Paper Structure (31 sections, 11 equations, 9 figures, 5 tables)

This paper contains 31 sections, 11 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Illustration of separating the matching model from the end-to-end GSC framework to achieve a fast model (right side). In the fast model, the dotted arrow means the matching model does not participate in the similarity computation in the inference stage.
  • Figure 2: The optimal edit path, consisting of 3 edit operations, that transforms $G_i$ into $G_j$. As a result, $\mathrm{GED}(G_i, G_j) = 3$.
  • Figure 3: Overview of ERIC. The GNN encoder is shared across two graphs. The green lines denote AReg. The summarized graph representations $\widehat{\boldsymbol{Z}}_i$ and $\widehat{\boldsymbol{Z}}_j$ are combinations of graph representations learned in each layer, which are fed into the GED discriminator followed by a regression function to obtain the prediction value. In the inference stage, the AReg submodule is removed.
  • Figure 4: t-SNE (van der Maaten and Hinton, 2008) visualization of the IMDB dataset. Each point is a graph encoded by the GNN encoder of ERIC. The green cross marks a randomly sampled query graph; red points are the 50% of graphs in the dataset most similar to the query graph according to ground-truth GEDs; blue points are the remaining 50% of graphs. (a): Using NTN as the discriminator, many similar graphs do not cluster around the query graph, even though each cluster is tight. (b): Using $\ell_2$ distance as the discriminator, the clusters are clearly separated, but the query graph lies close to the cluster boundary and each cluster is dispersed. (c): By adaptively combining NTN and $\ell_2$ distance, our model places similar graphs close to the query and dissimilar graphs far from it.
  • Figure 5: Impact of different order $p$ on AIDS700 and LINUX datasets.
  • ...and 4 more figures
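
The GED quantity referred to in the Figure 2 caption can be checked by hand on very small graphs. The sketch below is a brute-force search, assuming unlabeled, undirected graphs and unit cost for every node or edge insertion and deletion; the function name and the two example graphs are hypothetical and are not the graphs drawn in Figure 2.

```python
from itertools import permutations


def brute_force_ged(n1, edges1, n2, edges2):
    """Exact GED for tiny unlabeled, undirected graphs with unit edit costs.

    Nodes are numbered 0..n-1. Both graphs are padded with dummy nodes up
    to max(n1, n2); pairing a real node with a dummy one corresponds to a
    node insertion or deletion.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    n = max(n1, n2)
    best = None
    for perm in permutations(range(n)):
        # Node insertions/deletions: exactly one side of the pair is real.
        cost = sum((i < n1) != (perm[i] < n2) for i in range(n))
        # Edge insertions/deletions: edge present on exactly one side.
        for i in range(n):
            for j in range(i + 1, n):
                in_g1 = frozenset((i, j)) in e1
                in_g2 = frozenset((perm[i], perm[j])) in e2
                cost += in_g1 != in_g2
        if best is None or cost < best:
            best = cost
    return best


# Hypothetical example: a 3-node path versus a 4-node cycle.
path3 = [(0, 1), (1, 2)]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_ged(3, path3, 4, cycle4))  # prints 3
```

Here the cheapest edit path inserts one node and two edges, giving a GED of 3. The search enumerates every node mapping, so its cost grows factorially with graph size; this is practical only for a handful of nodes and is the motivation for estimating GED with a learned model instead.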