Optimization-Free Graph Embedding via Distributional Kernel for Community Detection
Shuaibin Song, Kai Ming Ting, Kaifeng Zhang, Tianrun Liang
TL;DR
This work tackles over-smoothing in NAS-based graph embeddings for community detection by identifying two distributional factors that traditional methods ignore: node-vector density and node-degree distribution. It introduces an optimization-free Weighted Distributional Kernel (WDK) and its multi-level extension (mWDK), which use Isolation Kernel to model both distributions jointly and so preserve node distinguishability and cluster structure. Empirical results show that mWDK yields equal-density, well-separated embeddings and outperforms a range of deep learning methods on 14 real and synthetic datasets, including a large-scale graph, while remaining scalable. The findings argue for distribution-aware, optimization-free embeddings as a robust basis for unsupervised community detection via spectral clustering.
Abstract
Neighborhood Aggregation Strategy (NAS) is a widely used approach in graph embedding, underpinning both Graph Neural Networks (GNNs) and Weisfeiler-Lehman (WL) methods. However, NAS-based methods are prone to over-smoothing, i.e., the loss of node distinguishability as the number of iterations increases, which limits their effectiveness. This paper identifies two characteristics of a network, namely the distribution of nodes and the distribution of node degrees, that are critical for expressive representation but have been overlooked by existing methods. We show that these overlooked characteristics contribute significantly to the over-smoothing of NAS-based methods. To address this, we propose a novel weighted distribution-aware kernel that embeds nodes while taking their distributional characteristics into account. Our method has three distinguishing features: (1) it is the first method to explicitly incorporate both distributional characteristics; (2) it requires no optimization; and (3) it effectively mitigates the adverse effects of over-smoothing, allowing WL to preserve node distinguishability and expressiveness even after many iterations of embedding. Experiments demonstrate that our method achieves superior community detection performance via spectral clustering, outperforming existing graph embedding methods, including deep learning methods, on standard benchmarks.
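The over-smoothing effect described in the abstract can be seen in a toy setting. The sketch below is not the proposed kernel or any method from the paper. It assumes a plain mean aggregation over each node and its neighbors, a hypothetical six-node graph made of two triangles joined by one edge, and a scalar feature per node. The names `mean_aggregate` and `spread` are illustrative only.

```python
# Toy illustration of over-smoothing under repeated neighborhood aggregation.
# Assumptions (not from the paper): mean aggregation with self-inclusion,
# a hand-made 6-node graph, and one scalar feature per node.

def mean_aggregate(adj, x):
    """One aggregation round: each node takes the mean of its own value
    and the values of its neighbors."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [i] + nbrs
        out.append(sum(x[j] for j in group) / len(group))
    return out


def spread(x):
    """Max minus min over nodes: a crude proxy for node distinguishability."""
    return max(x) - min(x)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3),
# i.e. two small communities. Adjacency is given as neighbor lists.
adj = [
    [1, 2],
    [0, 2],
    [0, 1, 3],
    [2, 4, 5],
    [3, 5],
    [3, 4],
]

# Initial features separate the two communities perfectly.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

spreads = [spread(x)]
for _ in range(50):
    x = mean_aggregate(adj, x)
    spreads.append(spread(x))

# The spread shrinks as iterations increase, so the two communities
# become harder to tell apart from the node values alone.
print("spread after 0, 5, 50 rounds:", spreads[0], spreads[5], spreads[50])
```

In this toy case the node values drift toward a common value as the number of rounds grows, which is the loss of distinguishability that the abstract attributes to NAS-based methods. How the distributions of nodes and node degrees enter this effect is the subject of the paper and is not modeled here.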
