Deep Hierarchical Graph Alignment Kernels

Shuhao Tang; Hao Tian; Xiaofeng Cao; Wei Ye

TL;DR

This paper tackles a limitation of conventional R-convolution graph kernels, which ignore topological position and implicit similarities among substructures. It introduces Deep Hierarchical Graph Alignment Kernels (DHGAK), which encode $b$-width $h$-hop slices around nodes, embed the slices with natural language models, cluster the embeddings to align substructures, and aggregate via kernel mean embedding across hierarchical slices. The authors provide theoretical guarantees: the Deep Alignment Kernel (DAK), Deep Graph Alignment Kernel (DGAK), and DHGAK are positive semi-definite, alignment is transitive, and there exist clustering-derived feature maps that yield linear separability in the RKHS; they also give complexity bounds. Empirically, DHGAK (with BERT or Word2Vec embeddings) outperforms state-of-the-art graph kernels on 14 of 16 benchmark datasets, with favorable running times on large graphs. The work demonstrates a principled, scalable approach to incorporating substructure relations and topology into graph similarity, with practical impact for molecular classification and related graph learning tasks.

Abstract

Typical R-convolution graph kernels invoke kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. The substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available on GitHub (https://github.com/EWesternRa/DHGAK).
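
The last two steps described in the abstract (cluster-shared feature maps and kernel mean embedding) can be illustrated with a minimal sketch. It assumes the cluster assignments are already given; in DHGAK they would come from clustering deep slice embeddings. The function names `indicator`, `mean_embedding`, and `graph_kernel`, as well as the toy assignments, are hypothetical and are not taken from the authors' code.

```python
from typing import List


def indicator(cluster_id: int, n_clusters: int) -> List[float]:
    """One-hot feature map: substructures in the same cluster share it."""
    vec = [0.0] * n_clusters
    vec[cluster_id] = 1.0
    return vec


def mean_embedding(cluster_ids: List[int], n_clusters: int) -> List[float]:
    """Graph feature map: average of the feature maps of its substructures."""
    total = [0.0] * n_clusters
    for cid in cluster_ids:
        for i, x in enumerate(indicator(cid, n_clusters)):
            total[i] += x
    return [x / len(cluster_ids) for x in total]


def graph_kernel(ids_a: List[int], ids_b: List[int], n_clusters: int) -> float:
    """Kernel value: inner product of the two graph feature maps."""
    fa = mean_embedding(ids_a, n_clusters)
    fb = mean_embedding(ids_b, n_clusters)
    return sum(x * y for x, y in zip(fa, fb))


# Toy cluster assignments (assumed): one cluster index per node-centred substructure.
g1 = [0, 0, 1]
g2 = [0, 1, 1, 2]
k12 = graph_kernel(g1, g2, n_clusters=3)
k11 = graph_kernel(g1, g1, n_clusters=3)
```

The sketch averages the indicators. An unnormalized sum would differ only by a constant factor per graph, and which convention the paper uses is not stated in this summary.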

Paper Structure (29 sections, 4 theorems, 20 equations, 6 figures, 6 tables, 2 algorithms)

Introduction
Related Work
The Model
Notation
Hierarchical Neighborhood Structure Encoding
Deep Hierarchical Graph Alignment Kernels
Theoretical Analysis of DHGAK
Experimental Evaluation
Experimental Setup
Classification Results
Classification Accuracy
Parameter Sensitivity
Ablation Study
Running Time
Conclusion
...and 14 more sections

Key Result

Theorem 1

DAK is positive semi-definite.
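
A brief sketch of why a kernel of this kind is positive semi-definite, assuming (as the Figure 2 caption suggests) that the DAK feature map concatenates cluster indicators over all clustering methods $\psi \in \Psi$ and experiments $t = 1, \ldots, T$. The symbols $\phi$, $c_\psi^{(t)}$, and $\alpha_i$ are introduced here for illustration and may differ from the paper's notation; this is not the authors' proof.

```latex
% c_\psi^{(t)}(x): index of the cluster containing x in experiment t under method \psi
% \phi(x): concatenation of the one-hot indicators of c_\psi^{(t)}(x) over all \psi and t
\kappa(x, y)
  = \sum_{\psi \in \Psi} \sum_{t=1}^{T}
    \mathbb{1}\!\left[ c_\psi^{(t)}(x) = c_\psi^{(t)}(y) \right]
  = \langle \phi(x), \phi(y) \rangle ,
\qquad
\sum_{i,j} \alpha_i \alpha_j \, \kappa(x_i, x_j)
  = \Big\| \sum_i \alpha_i \, \phi(x_i) \Big\|^2 \ge 0 .
```

Because the kernel is an inner product of explicit feature vectors, every Gram matrix it produces is positive semi-definite.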

Figures (6)

Figure 1: 2D structural depictions of Aspirin, Acetaminophen, and 3-Acetyloxybenzoic acid.
Figure 2: The framework of DHGAK is presented with slice width $b = 1$, maximum hop $H = 1$, clustering method set $\Psi = \{\psi_0\}$, and number of experiments $T = 3$. For convenience, we illustrate our method from the perspective of space transformation. First, we construct the hierarchical $b$-width $h$-hop slice for each node in $G_1$ and $G_2$, where $b = 1$ and $h$ is taken from 0 to 1. Next, the encoding method described in Section \ref{sec:SliceEncoding} is used to obtain the encoding sequence of slice $S_h^b(v)$. The numbers in the nodes represent the node labels in the original graphs, so we get $S_0^1(v_1) = [1,3,2]$, $S_1^1(v_1) = [3, 1, 4, 2, 1, 1]$, $S_0^1(u_1)=[1,2,4,3]$, and $S_1^1(u_1) = [2, 1, 3, 4, 1, 1, 3, 1]$. Then, the encoding of the slice is embedded into the deep embedding space via a Natural Language Model and updated by Equation \ref{eq:node_emb}. Within this deep embedding space, we cluster similar slices and concatenate the cluster indicators for all clustering methods $\Psi$ and experiments $T$ to obtain the feature maps of DAK. $\mathcal{C}_{\psi_0, i}^{(t)}$ represents the $i$-th cluster at the $t$-th experiment under clustering method $\psi_0\in \Psi$. Finally, the feature map of DGAK is the kernel mean embedding of the DAK feature maps. The feature map of DHGAK is computed as the sum of those of DGAK on slices of different hierarchies.
Figure 3: (a) shows $b$-width $h$-hop slice of node $v_1$, where $b$ is fixed as 0 and $h$ ranges from 0 to 2. (b) shows 1-width 1-hop slice of node $v_1$, where $N_1^1(v_1)$ is the union set of $N^1(v_2), N^1(v_3)$, and $N^1(v_4)$ since $v_2,v_3,v_4\in N_1(v_1)$.
Figure 4: Parameter sensitivity analysis on $H$ and $cluster\_factor$ in DHGAK-BERT. We present the classification accuracy for combinations of $H\in\{1,3,\ldots,9\}$ and $cluster\_factor\in\{0.1, 0.14,\ldots,2\}$ selected from 0.1 to 2 in base 10 log scale.
Figure 5: Parameter sensitivity analysis on $b$ and $\alpha$ in DHGAK-BERT. We present the classification accuracy for combinations of $b\in\{0,1,2,3\}$ and $\alpha\in\{0,0.2,0.4,\ldots,1.0\}$.
...and 1 more figures
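
The slice construction described in the Figure 3 caption can be sketched as follows. The sketch assumes that $N_h(v)$ is the set of nodes at shortest-path distance exactly $h$ from $v$, and that $N^b(u)$ is the set of nodes within distance $b$ of $u$ (including $u$ itself), which is consistent with $b = 0$ reducing the slice to the $h$-hop nodes. The function names and the toy graph are hypothetical; the toy graph is not the one drawn in Figure 3, and the sketch returns only the node set of a slice, not its encoding sequence.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbours


def bfs_distances(adj: Graph, source: int) -> Dict[int, int]:
    """Shortest-path (hop) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def hop_ring(adj: Graph, v: int, h: int) -> Set[int]:
    """N_h(v): nodes at distance exactly h from v (assumed definition)."""
    return {u for u, d in bfs_distances(adj, v).items() if d == h}


def ball(adj: Graph, u: int, b: int) -> Set[int]:
    """N^b(u): nodes within distance b of u, including u (assumed definition)."""
    return {w for w, d in bfs_distances(adj, u).items() if d <= b}


def slice_nodes(adj: Graph, v: int, h: int, b: int) -> Set[int]:
    """N_h^b(v): union of N^b(u) over all u in N_h(v)."""
    nodes: Set[int] = set()
    for u in hop_ring(adj, v, h):
        nodes |= ball(adj, u, b)
    return nodes


# Hypothetical toy graph: node 1 has neighbours 2, 3, 4; 5 hangs off 2; 6 hangs off 4.
adj: Graph = {1: [2, 3, 4], 2: [1, 5], 3: [1], 4: [1, 6], 5: [2], 6: [4]}

zero_width = [slice_nodes(adj, 1, h, 0) for h in range(3)]  # b = 0, h = 0..2
one_one = slice_nodes(adj, 1, 1, 1)                          # b = 1, h = 1
```

With $b = 0$ the slices are the successive hop rings around the node, while with $b = 1$ and $h = 1$ the slice also pulls in the neighbours of each 1-hop node, including the centre node itself.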

Theorems & Definitions (12)

Definition 1: $b$-width $h$-hop Slice of Node
Definition 2: Deep Slice Embedding
Definition 3: Deep Alignment Kernel (DAK)
Theorem 1
Definition 4: Feature Map of DAK
Theorem 2
Theorem 3: Linear Separation Property of DHGAK
proof
proof
Lemma 1
...and 2 more
