Masked AutoEncoder for Graph Clustering without Pre-defined Cluster Number k

Yuanchi Ma, Hui He, Zhongxiang Lei, Zhendong Niu

TL;DR

The paper tackles the problem of an unknown cluster count $k$ in graph clustering and the limitations of parametric graph autoencoder methods. It introduces GCMA, a Graph Masked Autoencoder framework with a Masking Fusion Encoder and a Multi-Target Decoder, including a density-based CFSFDP (clustering by fast search and find of density peaks) branch, to output both $k$ and the cluster assignments end-to-end. Key contributions include the first application of graph masking autoencoders to clustering and an end-to-end nonparametric approach that automatically determines $k$ while learning robust embeddings, demonstrated to outperform baselines on multiple graph datasets. This approach enhances generalization and interpretability for graph clustering and enables practical deployment without predefining the number of clusters $k$.

Abstract

Graph clustering algorithms with autoencoder structures have recently gained popularity due to their efficient performance and low training cost. However, existing graph autoencoder clustering algorithms based on GCN or GAT not only lack good generalization ability, but also make it difficult to determine the number of clusters automatically. To solve this problem, we propose a new framework called Graph Clustering with Masked Autoencoders (GCMA). It employs our designed fusion autoencoder, based on the graph masking method, for the fusion coding of the graph. It introduces our improved density-based clustering algorithm as a second decoder while decoding with multi-target reconstruction. By decoding the mask embedding, our model can capture more generalized and comprehensive knowledge. The number of clusters and the clustering results can be output end-to-end while improving the generalization ability. Extensive experiments demonstrate the superiority of GCMA, a nonparametric method, over state-of-the-art baselines.
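
To illustrate how a density-based decoder can output the number of clusters instead of taking it as input, the following is a minimal sketch of plain CFSFDP (density-peaks clustering) on a small embedding matrix. It is not the improved variant the abstract refers to, whose details are not given here. The function name `density_peaks`, the Gaussian density kernel, the cutoff `dc`, and the rule of selecting centers by thresholding `gamma = rho * delta` are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(X, dc, gamma_threshold):
    """Plain CFSFDP sketch (hypothetical helper): returns (k, labels)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between embeddings.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density rho with a Gaussian kernel; subtract the self-term exp(0) = 1.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # Process points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # Convention: the densest point gets its largest distance as delta.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to the nearest denser point
        parent[i] = j
    # Assumed selection rule: centers have both high density and high delta.
    gamma = rho * delta
    is_center = gamma > gamma_threshold
    is_center[order[0]] = True  # the densest point is always a center
    centers = np.flatnonzero(is_center)
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return len(centers), labels


# Toy embeddings: two 3x3 grids of points placed far apart.
grid = np.array([[x, y] for x in (0.0, 0.1, 0.2) for y in (0.0, 0.1, 0.2)])
X = np.vstack([grid, grid + 5.0])
k, labels = density_peaks(X, dc=0.3, gamma_threshold=5.0)
print(k, labels)
```

In this sketch the cluster count `k` is a by-product of the density analysis: it equals the number of points whose `gamma` exceeds the threshold, so no target number of clusters is supplied by the caller. The threshold and `dc` still have to be chosen, which is one place where an improved variant could differ.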

Paper Structure (21 sections, 8 equations, 5 figures, 9 tables, 1 algorithm)

Figures (5)

  • Figure 1: Trend of ACC as the number of clusters $k$ changes on the Cora dataset, for selected algorithms including the current SOTA method.
  • Figure 2: Flowchart of GCMA. The top half represents the overall model architecture and the bottom half represents our improved clustering algorithm process.
  • Figure 3: The variation of ACC and NMI on each dataset for different values of $\alpha$.
  • Figure 4: (a) and (b) show the effect of the presence or absence of a self-optimization step on the results. (c) and (d) give the effect of replacing the mask portion with a normal GAT layer.
  • Figure 5: 2D visualization of 3 datasets: Clustering Process and the Effect of Parameters