Masked AutoEncoder for Graph Clustering without Pre-defined Cluster Number k
Yuanchi Ma, Hui He, Zhongxiang Lei, Zhendong Niu
TL;DR
The paper tackles two problems in graph clustering: the cluster count $k$ is usually unknown in advance, and parametric graph autoencoder methods have limitations. It introduces GCMA, a graph masked autoencoder framework with a Masking Fusion Encoder and a Multi-Target Decoder that includes a density-based CFSFDP (clustering by fast search and find of density peaks) branch, so that $k$ and the cluster assignments are output end-to-end. Key contributions include the first application of graph masked autoencoders to clustering and an end-to-end nonparametric approach that determines $k$ automatically while learning robust embeddings; the method is reported to outperform baselines on multiple graph datasets. The approach improves generalization and interpretability for graph clustering and enables practical deployment without predefining $k$.
Abstract
Graph clustering algorithms built on autoencoder structures have recently gained popularity because of their efficiency and low training cost. However, existing graph autoencoder clustering algorithms based on GCN or GAT generalize poorly, and the number of clusters they produce is difficult to determine automatically. To address both problems, we propose a new framework called Graph Clustering with Masked Autoencoders (GCMA). GCMA uses a fusion autoencoder, which we design around the graph masking method, to produce a fused encoding of the graph. During decoding, it performs multi-target reconstruction and adds our improved density-based clustering algorithm as a second decoder. Decoding the masked embedding lets the model capture more general and comprehensive knowledge. As a result, GCMA outputs both the number of clusters and the clustering results end-to-end while improving generalization. Extensive experiments demonstrate the superiority of GCMA, a nonparametric method, over state-of-the-art baselines.
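
The density-based decoder is what allows the number of clusters to be read off the data rather than supplied by the user. As a rough illustration of that idea only, the sketch below applies plain density-peaks clustering (CFSFDP, which the TL;DR names as the basis of the density-based branch) to a matrix of embeddings. It is not the paper's improved algorithm: the function name `density_peaks`, the parameters `dc_percentile` and `delta_ratio`, the centre-selection threshold, and the synthetic embeddings are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(X, dc_percentile=2.0, delta_ratio=0.5):
    """CFSFDP-style clustering that returns (k, labels) without a preset k.

    `dc_percentile` and `delta_ratio` are illustrative assumptions, not
    values taken from the paper.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between embeddings.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Cutoff distance d_c: a low percentile of all pairwise distances.
    dc = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)

    # Local density rho_i with a Gaussian kernel (self-term removed).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    # delta_i: distance to the nearest point of higher density.
    order = np.argsort(-rho)  # indices sorted by descending density
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j

    # Heuristic (assumption): centres are points whose delta is large
    # relative to the largest delta. The original method instead reads
    # centres off a decision graph of (rho, delta).
    is_centre = delta > delta_ratio * delta.max()
    k = int(is_centre.sum())

    # Assign labels in density order: a centre opens a new cluster, every
    # other point inherits the label of its nearest denser neighbour.
    labels = np.full(n, -1)
    next_label = 0
    for i in order:
        if is_centre[i]:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return k, labels


# Synthetic stand-in for learned node embeddings: three tight, well-separated blobs.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.concatenate([c + 0.2 * rng.standard_normal((50, 2)) for c in centres])

k, labels = density_peaks(X)
print("estimated number of clusters:", k)
```

On data this cleanly separated, the heuristic should recover one centre per blob. Real embeddings are rarely so well behaved, and a simple threshold on `delta` would also promote isolated low-density outliers to centres, which is one reason a refined density-based procedure is needed in practice.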
