Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, Qinghua Hu

TL;DR

Attributed graph clustering (AGC) suffers from weak supervision. The authors propose Dynamically Fusing Self-Supervised Learning (DyFSS), which uses a Mixture-of-Experts design to fuse features from multiple SSL tasks on a per-node basis, with weights produced by a gating network. A dual-level self-supervised strategy, combining pseudo-label guidance and graph-structure supervision, stabilizes training and improves the fused embeddings. Empirically, DyFSS outperforms state-of-the-art multi-task SSL methods on five datasets by up to $8.66\%$ in absolute accuracy (ACC). The learned task weights vary across nodes, and a sensitivity analysis indicates robustness to hyperparameters. The work advances node-wise SSL fusion for AGC and provides a practical framework for leveraging heterogeneous graph signals in clustering.

Abstract

Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: https://github.com/q086/DyFSS.

Paper Structure

This paper contains 27 sections, 10 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: (a) A subgraph of the Cora dataset. Nodes of the same class are shown in the same color, and the numbers denote node indices. (b)-(e) Heat maps of the similarity matrices in latent space, computed from node embeddings obtained by the graph partition (PAR) SSL task, the node attribute clustering (CLU) SSL task, AUTOSSL, and the dynamic fusion operation, respectively.
  • Figure 2: Architecture of the DyFSS model. A pre-trained model first produces the initial node embeddings, which are then fed into the dynamic fusion network to obtain the fusion embeddings. Specifically, each SSL task is assigned a task-specific GCN layer (i.e., an expert) that extracts features under the corresponding SSL loss, e.g., $L_{clu}$, $L_{par}$, $L_{dgi}$. Simultaneously, the gating network generates a set of weights for each node, which are then used in the feature fusion operation. Lastly, high-quality pseudo labels and the graph structure serve as supervision to guide the training of the dynamic fusion network.
  • Figure 3: Ablation studies of the dynamic fusion network. The baseline (B) is a pre-trained ARVGA model, and B+DF is our method. B+G denotes the baseline with a dynamic gating mechanism and fixed experts. Conversely, B+SF denotes the baseline with a fixed gating network and dynamic experts, i.e., a static fusion approach.
  • Figure 4: $t$-SNE visualization on the Citeseer dataset.
  • Figure 5: Sensitivity analysis of DyFSS with respect to the hyperparameter $m$ on five datasets.
  • ...and 1 more figures
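
The per-node fusion described in the Figure 2 caption can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`normalize_adjacency`, `dynamic_fusion`), the use of a single linear layer followed by a softmax as the gating network, and the ReLU activation are all assumptions made for the example. The SSL losses ($L_{clu}$, $L_{par}$, $L_{dgi}$) and the dual-level supervision are omitted, so only the forward fusion step is shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def normalize_adjacency(adj):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dynamic_fusion(h, adj, expert_weights, gate_weight):
    """Fuse K expert outputs with node-specific weights (hypothetical sketch).

    h:              (N, D_in) initial node embeddings
    adj:            (N, N) symmetric adjacency matrix
    expert_weights: list of K arrays, each (D_in, D_out), one GCN layer per SSL task
    gate_weight:    (D_in, K) weights of an assumed linear gating layer
    """
    a_norm = normalize_adjacency(adj)
    # Each expert is one GCN layer: ReLU(A_norm @ H @ W_k). Stacked shape: (K, N, D_out).
    expert_out = np.stack([np.maximum(a_norm @ h @ w, 0.0) for w in expert_weights])
    # One weight vector per node over the K tasks; each row sums to 1.
    gate = softmax(h @ gate_weight, axis=1)
    # Node n receives sum_k gate[n, k] * expert_out[k, n, :].
    fused = np.einsum("nk,knd->nd", gate, expert_out)
    return fused, gate

# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
n, d_in, d_out, k = 6, 8, 4, 3
upper = np.triu(rng.random((n, n)) < 0.4, 1)
adj = (upper + upper.T).astype(float)
h = rng.normal(size=(n, d_in))
experts = [rng.normal(size=(d_in, d_out)) for _ in range(k)]
gate_w = rng.normal(size=(d_in, k))
fused, gate = dynamic_fusion(h, adj, experts, gate_w)
print(fused.shape, gate.shape)
```

Because `gate` has one row per node, two nodes can weight the same SSL task very differently. A static scheme that shares one set of task weights across all nodes corresponds to every row of `gate` being identical.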