Scalable and Adaptive Spectral Embedding for Attributed Graph Clustering

Yunhui Liu, Tieke He, Qing Wu, Tao Zheng, Jianhua Zhao

TL;DR

This work tackles the scalability bottleneck of attributed graph clustering, where traditional spectral clustering is hindered by quadratic memory and time requirements. It introduces Scalable and Adaptive Spectral Embedding (SASE), which combines $k$-order SGC-based feature smoothing, linear fusion with original features, Random Fourier Features for implicit kernel-based spectral embedding, and an adaptive order selection mechanism. The approach achieves linear time and space complexity, requires no trainable parameters, and delivers superior clustering accuracy and speed on large graphs (notably a $6.9\%$ ACC improvement and $5.87\times$ speedup on ArXiv over the strong S3GC baseline). These results demonstrate SASE’s practical impact for scalable clustering in real-world attributed graphs and its potential to replace heavier neural-network-based methods in large-scale settings.

Abstract

Attributed graph clustering, which aims to group the nodes of an attributed graph into disjoint clusters, has made promising advancements in recent years. However, most existing methods face challenges when applied to large graphs due to their expensive computational cost and high memory usage. In this paper, we introduce Scalable and Adaptive Spectral Embedding (SASE), a simple attributed graph clustering method devoid of parameter learning. SASE comprises three main components: node feature smoothing via $k$-order simple graph convolution, scalable spectral clustering using random Fourier features, and adaptive order selection. With these designs, SASE not only effectively captures global cluster structures but also exhibits linear time and space complexity relative to the graph size. Empirical results demonstrate the superiority of SASE. For example, on the ArXiv dataset with 169K nodes and 1.17M edges, SASE achieves a $6.9\%$ improvement in ACC and a $5.87\times$ speedup compared to the runner-up, S3GC.
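
The pipeline named in the abstract (feature smoothing, fusion with the original features, random Fourier features, spectral embedding) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the convex-combination form of the fusion with weight `alpha`, the Gaussian-kernel bandwidth `sigma`, and the degree normalization are all assumptions made for illustration, and the adaptive order selection step is omitted because this summary does not specify it. Dense matrices are used only to keep the toy example short; linear scaling on a real graph would need a sparse adjacency matrix.

```python
import numpy as np

def sgc_smooth(adj, feats, k):
    """k-order simple graph convolution: (D^-1/2 (A + I) D^-1/2)^k X."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees >= 1 due to self-loops
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    out = feats.copy()
    for _ in range(k):
        out = s @ out
    return out

def fuse(feats, smoothed, alpha):
    """ASSUMED fusion form: convex combination of raw and smoothed features."""
    return alpha * feats + (1.0 - alpha) * smoothed

def random_fourier_features(x, dim, sigma, rng):
    """Random Fourier features whose inner products approximate the
    Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    w = rng.normal(scale=1.0 / sigma, size=(x.shape[1], dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=dim)
    return np.sqrt(2.0 / dim) * np.cos(x @ w + b)

def spectral_embedding(z, n_components):
    """Embedding from the implicit kernel K ~ Z Z^T, never forming K.

    ASSUMED normalization: divide each row by the square root of its
    approximate degree, then keep the leading left singular vectors.
    """
    deg = z @ z.sum(axis=0)        # row sums of Z Z^T in O(n * dim)
    deg = np.maximum(deg, 1e-12)   # guard: the approximation can dip below zero
    z_norm = z / np.sqrt(deg)[:, None]
    u, _, _ = np.linalg.svd(z_norm, full_matrices=False)  # O(n * dim^2)
    return u[:, :n_components]

# Toy graph: two triangles joined by one edge, with random node features.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
feats = rng.normal(size=(6, 4))

smoothed = sgc_smooth(adj, feats, k=2)
fused = fuse(feats, smoothed, alpha=0.5)
z = random_fourier_features(fused, dim=16, sigma=1.0, rng=rng)
emb = spectral_embedding(z, n_components=2)
print(emb.shape)  # (6, 2)
```

Running k-means on the rows of `emb` would then give the cluster assignments. Every step touches only an $n \times d$ or $n \times D$ matrix (with $D$ the number of random features), which is where the linear dependence on graph size comes from in this sketch.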

Paper Structure (21 sections, 5 equations, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Impact of $\alpha$ and $k$ on CiteSeer and PubMed.