Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
TL;DR
This work tackles the loss of node-level detail in cluster-based graph pooling by introducing Node-to-Cluster Attention (N2C-Attn), which fuses node- and cluster-level information via Multiple Kernel Learning (MKL). The resulting model, Cluster-GT, uses clusters as tokens without coarsening each cluster into a single embedding, and lets clusters interact through N2C-Attn. N2C-Attn comes in two MKL variants, tensor-product and convex-sum kernels, and relies on kernelized softmax attention to reach linear time complexity. Cluster-GT combines a node-wise GNN, a simple Metis partitioner, and the N2C-Attn module; it achieves strong performance across eight graph-level datasets and reveals domain-dependent shifts in kernel emphasis. These results suggest that N2C-Attn is a principled, efficient bridge between cluster- and node-level representations for hierarchical graphs, and they point to new directions in interaction strategies between clusters and nodes.
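To make the two MKL variants concrete, the sketch below shows how a cluster-level and a node-level kernel can be combined at the level of feature maps. It relies only on standard kernel identities: a product of two kernels corresponds to the Kronecker product of their feature vectors, and a convex sum corresponds to a weighted concatenation. The function names and the weight `alpha` are illustrative assumptions, not identifiers from the authors' code.

```python
import numpy as np

def tensor_product_features(phi_c, phi_n):
    # Kronecker product: <kron(a, b), kron(c, d)> = <a, c> * <b, d>,
    # so the combined kernel is the product of the two base kernels.
    return np.kron(phi_c, phi_n)

def convex_sum_features(phi_c, phi_n, alpha):
    # Weighted concatenation: the inner product of two such vectors is
    # alpha * <a, c> + (1 - alpha) * <b, d>, with 0 <= alpha <= 1.
    return np.concatenate([np.sqrt(alpha) * phi_c,
                           np.sqrt(1.0 - alpha) * phi_n])

# Hypothetical cluster-level (size 3) and node-level (size 4) feature vectors.
rng = np.random.default_rng(0)
cq, ck = rng.random(3), rng.random(3)   # cluster-level query / key features
nq, nk = rng.random(4), rng.random(4)   # node-level query / key features

k_tensor = tensor_product_features(cq, nq) @ tensor_product_features(ck, nk)
k_convex = convex_sum_features(cq, nq, 0.3) @ convex_sum_features(ck, nk, 0.3)
```

In this reading, `alpha` controls how much weight the convex-sum kernel places on the cluster level versus the node level, whereas the tensor-product kernel requires similarity at both levels at once.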
Abstract
In the realm of graph learning, there is a category of methods that conceptualize graphs as hierarchical structures, utilizing node clustering to capture broader structural information. While generally effective, these methods often rely on a fixed graph coarsening routine, leading to overly homogeneous cluster representations and loss of node-level information. In this paper, we envision the graph as a network of interconnected node sets without compressing each cluster into a single embedding. To enable effective information transfer among these node sets, we propose the Node-to-Cluster Attention (N2C-Attn) mechanism. N2C-Attn incorporates techniques from Multiple Kernel Learning into the kernelized attention framework, effectively capturing information at both node and cluster levels. We then devise an efficient form for N2C-Attn using the cluster-wise message-passing framework, achieving linear time complexity. We further analyze how N2C-Attn combines bi-level feature maps of queries and keys, demonstrating its capability to merge dual-granularity information. The resulting architecture, Cluster-wise Graph Transformer (Cluster-GT), which uses node clusters as tokens and employs our proposed N2C-Attn module, shows superior performance on various graph-level tasks. Code is available at https://github.com/LUMIA-Group/Cluster-wise-Graph-Transformer.
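The efficient form described above can be illustrated with a minimal NumPy sketch of the tensor-product variant. This is not the authors' implementation. It assumes that each query cluster carries one cluster-level query and one node-level query, that each key cluster carries one cluster-level key plus node-level keys and values for its member nodes, and it swaps in the simple positive feature map elu(x) + 1 in place of a kernelized-softmax feature map. All names and shapes are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a simple positive feature map used here as a stand-in
    # (assumption) for the kernelized-softmax feature map.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def n2c_attn_tensor_product(Q, K, q, k, v, assign, adj):
    """Sketch of node-to-cluster attention with a tensor-product kernel.

    Q, K   : (C, D)  cluster-level queries and keys
    q      : (C, d)  node-level query of each query cluster
    k, v   : (N, d), (N, dv) node-level keys and values
    assign : (N,)    cluster index of each node
    adj    : (C, C)  adj[i, j] is True if cluster j sends messages to cluster i
    """
    C = Q.shape[0]
    phi_q, phi_k = feature_map(q), feature_map(k)
    psi_Q, psi_K = feature_map(Q), feature_map(K)

    # Step 1: summarize each cluster once (cost linear in the number of nodes).
    S = np.zeros((C, phi_k.shape[1], v.shape[1]))  # sum_t phi(k_t) v_t^T
    z = np.zeros((C, phi_k.shape[1]))              # sum_t phi(k_t)
    np.add.at(S, assign, phi_k[:, :, None] * v[:, None, :])
    np.add.at(z, assign, phi_k)

    # Step 2: cluster-level kernel, masked to connected cluster pairs.
    # Dense here for brevity; a sparse edge list would avoid the C x C matrix.
    cluster_sim = (psi_Q @ psi_K.T) * adj

    # Step 3: multiply by the node-level kernel via the summaries and normalize.
    num = np.einsum("ij,im,jmd->id", cluster_sim, phi_q, S)
    den = np.einsum("ij,im,jm->i", cluster_sim, phi_q, z)
    return num / den[:, None]

# Toy usage: 3 clusters, 7 nodes, self-loops plus a chain between clusters.
rng = np.random.default_rng(0)
C, N, D, d, dv = 3, 7, 4, 5, 2
assign = np.array([0, 0, 1, 1, 1, 2, 2])
Q, K = rng.normal(size=(C, D)), rng.normal(size=(C, D))
q, k, v = rng.normal(size=(C, d)), rng.normal(size=(N, d)), rng.normal(size=(N, dv))
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=bool)
out = n2c_attn_tensor_product(Q, K, q, k, v, assign, adj)  # shape (C, dv)
```

Because the node-level keys and values are aggregated per cluster before any cluster pair is considered, no node is ever compared with another node directly. Under these assumptions, the output equals what a naive loop over every node of every connected cluster would produce, while the individual node-level keys still shape the result.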
