Deep Hierarchical Graph Alignment Kernels
Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye
TL;DR
This paper addresses a limitation of conventional R-convolution graph kernels: they ignore the topological position of substructures and the implicit similarities among them. It introduces Deep Hierarchical Graph Alignment Kernels (DHGAK), which encode $b$-width $h$-hop slices around nodes, embed the slices with natural language models, cluster the embeddings so that substructures falling in the same cluster are aligned, and aggregate across hierarchical slices via kernel mean embedding. The authors provide theoretical guarantees: the Deep Alignment Kernel (DAK), the Deep Graph Alignment Kernel (DGAK), and DHGAK are positive semi-definite; alignment is transitive; and there exist clustering-derived feature maps that yield linear separability in the RKHS. They also give complexity bounds. Empirically, DHGAK (with BERT or Word2Vec embeddings) outperforms state-of-the-art graph kernels on 14 of 16 benchmark datasets, with favorable running times on large graphs. The work offers a principled, scalable way to incorporate substructure relations and topology into graph similarity, with practical relevance for molecular classification and related graph learning tasks.
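The positive semi-definiteness and transitivity claims can be illustrated on a simplified version of the construction that uses a single hard clustering. The notation below ($\psi$, $C_j$, $\phi$, $\Phi$, $x_v$) is introduced here for illustration only and is not taken from the paper; the full kernels combine several clusterings and hierarchy levels.

```latex
% Assumed notation: a hard clustering \psi sends each slice embedding x
% to exactly one of the clusters C_1, \dots, C_m.
\[
  \phi(x) = \bigl(\mathbb{1}[x \in C_1], \dots, \mathbb{1}[x \in C_m]\bigr),
  \qquad
  k(x, y) = \langle \phi(x), \phi(y) \rangle = \mathbb{1}[\psi(x) = \psi(y)] .
\]
% Transitivity: k(x, y) = 1 and k(y, z) = 1 give \psi(x) = \psi(y) = \psi(z),
% hence k(x, z) = 1.
% Graph level: average the node feature maps (an empirical kernel mean embedding).
\[
  \Phi(G) = \frac{1}{|V(G)|} \sum_{v \in V(G)} \phi(x_v),
  \qquad
  K(G, G') = \langle \Phi(G), \Phi(G') \rangle
           = \frac{1}{|V(G)|\,|V(G')|} \sum_{v \in V(G)} \sum_{v' \in V(G')} k(x_v, x_{v'}) .
\]
% Positive semi-definiteness: for any graphs G_1, \dots, G_n and reals a_1, \dots, a_n,
\[
  \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \, K(G_i, G_j)
  = \Bigl\lVert \sum_{i=1}^{n} a_i \, \Phi(G_i) \Bigr\rVert^2 \;\ge\; 0 .
\]
```

Because sums of positive semi-definite kernels are again positive semi-definite, the same argument carries over when several such clusterings are combined.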
Abstract
Typical R-convolution graph kernels use kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, they overlook implicit similarities and topological position information between those substructures, which limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. The substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available on GitHub (https://github.com/EWesternRa/DHGAK).
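The cluster-then-average idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D points stand in for language-model embeddings of substructures, plain k-means with a deterministic farthest-point seeding stands in for whatever clustering the method actually uses, only a single clustering and a single hierarchy level are shown, and all function names are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then keep adding the
    point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's k-means (an assumed clustering choice); returns one
    cluster index per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def graph_feature_maps(graph_embeddings, k):
    """Cluster all substructure embeddings jointly, give every substructure
    the one-hot indicator of its cluster, and average those indicators per
    graph (an empirical kernel mean embedding)."""
    all_points = [p for g in graph_embeddings for p in g]
    labels = kmeans_labels(all_points, k)
    maps, start = [], 0
    for g in graph_embeddings:
        phi = [0.0] * k
        for lab in labels[start:start + len(g)]:
            phi[lab] += 1.0 / len(g)
        maps.append(phi)
        start += len(g)
    return maps


def gram_matrix(maps):
    """Kernel values are inner products of the explicit graph feature maps."""
    return [[sum(a * b for a, b in zip(u, v)) for v in maps] for u in maps]


# Toy data (made up): three graphs, each a list of substructure embeddings.
graphs = [
    [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)],
    [(0.1, 0.1), (0.0, 0.0), (5.1, 5.0)],
    [(5.0, 5.0), (5.1, 5.1), (4.9, 5.0)],
]
maps = graph_feature_maps(graphs, k=2)
K = gram_matrix(maps)
```

On this toy data the first two graphs each place two substructures in one cluster and one in the other, so they share the feature map (2/3, 1/3), while the third graph maps to (0, 1). The resulting kernel values are K[0][1] = 5/9 and K[0][2] = 1/3. Since every kernel value is an inner product of explicit feature vectors, the Gram matrix is positive semi-definite by construction.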
