Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective
Yujie Mo, Zhihe Lu, Runpeng Yu, Xiaofeng Zhu, Xinchao Wang
TL;DR
The paper tackles two problems in self-supervised heterogeneous graph learning (SHGL), noise in the graph structure and underutilization of cluster-level information, by reframing SHGL through spectral clustering. It introduces SCHOOL, a framework that employs rank-constrained spectral clustering to learn an affinity matrix with exactly $c$ connected components. It then uses a projection-based, eigendecomposition-free route to obtain a cluster assignment matrix, which a spectral loss aligns with the spectral structure. It further enhances representations with node- and cluster-level dual consistency constraints, yielding the final objective $\mathcal{J} = \mathcal{L}_{sp} + \mu \mathcal{L}_{nc} + \delta \mathcal{L}_{cc}$, where $\mu$ and $\delta$ weight the two consistency terms, and uses concatenated node representations for downstream tasks. Theoretical results show that the learned representations split into $c$ partitions, matching the number of classes, and generalize better. Experiments on both heterogeneous and homogeneous graphs show consistent improvements over state-of-the-art SHGL methods in node classification and clustering. Overall, the work provides a principled, scalable bridge between SHGL and spectral clustering with practical gains for diverse graph learning tasks.
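The rank constraint rests on a textbook fact of spectral graph theory: for a symmetric nonnegative affinity $S$ with Laplacian $L_S = D_S - S$ (notation assumed here), the multiplicity of the eigenvalue 0 equals the number of connected components, so $\mathrm{rank}(L_S) = n - c$ holds exactly when the graph has $c$ components. The minimal sketch below checks this identity on a toy block-diagonal affinity using exact rational arithmetic instead of an eigensolver, and shows that one noisy cross-cluster edge merges two components and violates the constraint. The helper names (`laplacian`, `exact_rank`, `count_components`, `block_affinity`) are hypothetical; this illustrates the property only and is not SCHOOL's optimization procedure.

```python
from fractions import Fraction


def laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric nonnegative affinity W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def exact_rank(M):
    """Matrix rank via Gauss-Jordan elimination over exact rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rows):
            if r != rank and A[r][col] != 0:
                f = A[r][col] / A[rank][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[rank])]
        rank += 1
        if rank == rows:
            break
    return rank


def count_components(W):
    """Connected components of the graph whose edges are the entries W[i][j] > 0."""
    n = len(W)
    seen, comps = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comps += 1
        stack = [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in range(n):
                if W[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return comps


def block_affinity(sizes):
    """Hypothetical toy affinity: one fully connected block per cluster, no cross edges."""
    n = sum(sizes)
    W = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                if i != j:
                    W[i][j] = 1
        start += size
    return W


W = block_affinity([3, 2, 3])  # 8 nodes, 3 clean clusters
print(count_components(W), exact_rank(laplacian(W)))  # 3 5, i.e. rank = n - c

W[2][3] = W[3][2] = 1  # one noisy edge between the first and second cluster
print(count_components(W), exact_rank(laplacian(W)))  # 2 6, so rank = n - 3 no longer holds
```

Because any edge between two clusters raises the Laplacian rank, an affinity forced to satisfy $\mathrm{rank}(L_S) = n - c$ must assign zero weight to such cross-cluster edges, which is the intuition behind using the constraint to filter structural noise.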
Abstract
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios. However, while existing SHGL methods are essentially similar to clustering approaches, they encounter two significant limitations: (i) noise in the graph structure is often introduced during message passing, which weakens node representations, and (ii) cluster-level information may be inadequately captured and leveraged, diminishing performance in downstream tasks. In this paper, we address these limitations by theoretically revisiting SHGL from the spectral clustering perspective and introducing a novel framework enhanced by rank and dual consistency constraints. Specifically, our framework incorporates a rank-constrained spectral clustering method that refines the affinity matrix to effectively exclude noise. Additionally, we integrate node-level and cluster-level consistency constraints that concurrently capture invariant and clustering information to facilitate learning in downstream tasks. We theoretically demonstrate that the learned representations are divided into distinct partitions according to the number of classes and exhibit enhanced generalization ability across tasks. Experimental results affirm the superiority of our method, showing notable improvements over existing methods in several downstream tasks.
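The claim that representations divide into partitions matching the number of classes is consistent with a related textbook property: when the affinity has exactly $c$ connected components, the indicator vector of each component lies in the null space of its Laplacian, so the rows of any basis of that null space are constant within a component and separate the $c$ groups. The toy check below, assuming a hand-built two-cluster affinity and hypothetical helper names (`laplacian`, `matvec`), only verifies that $L\mathbf{f}_k = \mathbf{0}$ for each indicator $\mathbf{f}_k$; it is not the paper's theorem or proof.

```python
def laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric nonnegative affinity W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical toy affinity: clusters {0, 1, 2} and {3, 4} with arbitrary positive weights
# and no cross-cluster edges.
W = [
    [0, 2, 1, 0, 0],
    [2, 0, 3, 0, 0],
    [1, 3, 0, 0, 0],
    [0, 0, 0, 0, 5],
    [0, 0, 0, 5, 0],
]
L = laplacian(W)
clusters = [[0, 1, 2], [3, 4]]

# n x c indicator matrix F: F[i][k] = 1 if node i belongs to cluster k.
F = [[1 if i in members else 0 for members in clusters] for i in range(len(W))]

for k in range(len(clusters)):
    indicator = [row[k] for row in F]
    print(matvec(L, indicator))  # [0, 0, 0, 0, 0]: the indicator is in the null space of L
```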
