Modularity aided consistent attributed graph clustering via coarsening

Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar

TL;DR

This work proposes a loss function that combines log-determinant, smoothness, and modularity terms and optimizes it with a block majorization-minimization technique. The resulting method outperforms existing state-of-the-art methods on both attributed and non-attributed graphs.

Abstract

Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to capture true community structures and intra-cluster relations accurately, to remain computationally efficient, and to identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotically error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and improve clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs.
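
To make the modularity component concrete, the following is a minimal sketch of the standard (Newman) modularity of a cluster assignment matrix, written as $Q = \frac{1}{2m}\,\mathrm{tr}(C^T B C)$ with $B = A - \frac{d d^T}{2m}$. The function name, the toy graph, and the two assignment matrices are illustrative assumptions, not taken from the paper, and the paper's exact modularity term and weighting may differ.

```python
import numpy as np


def modularity(A, C):
    """Standard modularity Q = tr(C^T B C) / (2m), with B = A - d d^T / (2m).

    A: (n, n) symmetric adjacency matrix of an undirected graph.
    C: (n, k) cluster assignment matrix (hard one-hot rows or soft rows).
    Hypothetical helper for illustration only.
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    d = A.sum(axis=1)                # node degrees
    two_m = d.sum()                  # 2m: twice the number of edges
    B = A - np.outer(d, d) / two_m   # modularity matrix
    return float(np.trace(C.T @ B @ C) / two_m)


# Toy graph (assumed): two triangles, {0, 1, 2} and {3, 4, 5}, joined by edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

C_good = np.repeat(np.eye(2), 3, axis=0)  # nodes 0-2 -> cluster 0, nodes 3-5 -> cluster 1
C_bad = np.tile(np.eye(2), (3, 1))        # alternating assignment that ignores the triangles

q_good = modularity(A, C_good)  # 5/14, about 0.357
q_bad = modularity(A, C_bad)    # -3/14, about -0.214
print(q_good, q_bad)
```

The assignment that follows the two triangles scores higher than the alternating one, which is the behavior a modularity-maximizing term rewards.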

Paper Structure

This paper contains 32 sections, 3 theorems, 63 equations, 10 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

The sequence $\{C^{t+1}, X_C^{t+1}\}$ generated by the Q-MAGC algorithm converges to the set of Karush–Kuhn–Tucker (KKT) optimality points of the optimization objective.
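
As a reminder of what a KKT point is, the textbook conditions for a generic constrained problem are sketched below. The symbols $f$, $g_i$, $h_j$, $\mu_i$, and $\lambda_j$ are generic placeholders and are not the paper's notation; the paper's specific objective and constraints are not reproduced here.

```latex
% Generic problem (placeholder symbols, not the paper's objective):
%   minimize f(x)  subject to  g_i(x) <= 0,  h_j(x) = 0
\begin{aligned}
&\text{Stationarity:} && \nabla f(x^\ast) + \sum_i \mu_i \nabla g_i(x^\ast) + \sum_j \lambda_j \nabla h_j(x^\ast) = 0 \\
&\text{Primal feasibility:} && g_i(x^\ast) \le 0, \quad h_j(x^\ast) = 0 \\
&\text{Dual feasibility:} && \mu_i \ge 0 \\
&\text{Complementary slackness:} && \mu_i \, g_i(x^\ast) = 0
\end{aligned}
```

In the theorem, the role of $x$ is played by the pair $(C, X_C)$.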

Figures (10)

  • Figure 1: (a) Architecture of Q-GCN. The encoder is trained to learn the soft cluster assignment matrix $C$. The coarsened features $X_C$ are obtained using the relation $X_C^{t+1} = {C^{t+1}}^\dagger X$. The proposed MAGC loss is then computed from $C$ and $X_C$. (b) Architecture of Q-VGAE/Q-GMM-VGAE. The three-layer GCN encoder takes $X$ and $A$ as inputs to learn the latent representation $Z$ of the graph. $Z$ is then passed through an inner-product decoder to reconstruct the adjacency matrix $\hat{A}$. The reconstruction loss is calculated between $\hat{A}$ and $A$, and the KL-divergence is applied to $Z$. In Q-VGAE (or Q-GMM-VGAE), $Z$ is also passed through a GCN layer (or GMM) to output the soft cluster assignments $C$. The MAGC loss is then computed in the same manner as in Q-GCN.
  • Figure 2: Evolution of the latent space of a) Q-VGAE and b) Q-GMM-VGAE over time for Cora. Colors represent cluster assignments.
  • Figure 3: Plots of evolution of latent space for Q-VGAE and Q-GMM-VGAE methods for CiteSeer, PubMed, Brazil (Air Traffic) and Europe (Air Traffic) datasets.
  • Figure 4: Visualization of the generated adjacency and feature covariance matrices for the ADC-SBM.
  • Figure 5: Evolution of the different loss terms throughout training, denoted by their weight parameters. The term labeled X_tT theta_C X_t is the smoothness term $\mathrm{tr}(X_C^T C^T \Theta C X_C)$.
  • ...and 5 more figures
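
The captions for Figures 1 and 5 mention the relation $X_C = C^\dagger X$ and the smoothness term $\mathrm{tr}(X_C^T C^T \Theta C X_C)$. The following is a minimal NumPy sketch of both, assuming that $\Theta$ is the combinatorial graph Laplacian $D - A$ and that $C$ is a hard assignment; the helper names and toy data are hypothetical, and the paper's definition of $\Theta$ may differ.

```python
import numpy as np


def coarsen_features(C, X):
    """Coarsened features X_C = pinv(C) @ X (the relation quoted in Figure 1)."""
    return np.linalg.pinv(C) @ X


def smoothness(C, X_C, Theta):
    """Smoothness term tr(X_C^T C^T Theta C X_C) (the term named in Figure 5)."""
    return float(np.trace(X_C.T @ C.T @ Theta @ C @ X_C))


# Toy graph (assumed): two triangles, {0, 1, 2} and {3, 4, 5}, joined by edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Theta = np.diag(A.sum(axis=1)) - A   # ASSUMPTION: Theta is the Laplacian D - A

# Toy node features (assumed): the two triangles have clearly different features.
X = np.array([[1.0, 0.0], [0.8, 0.2], [0.9, 0.1],
              [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]])

C = np.repeat(np.eye(2), 3, axis=0)  # hard assignment: one cluster per triangle

X_C = coarsen_features(C, X)         # for a hard C, each row is a cluster mean
s = smoothness(C, X_C, Theta)
print(X_C)
print(s)
```

For a hard assignment, the pseudoinverse averages the features within each cluster, so $X_C$ holds the cluster means. With a Laplacian $\Theta$, the smoothness term then sums squared differences of those means over the edges that cross clusters, which is a single edge in this toy graph.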

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Definition 1
  • Lemma 1
  • Theorem 2
  • proof