DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning

Yifan Wang, Xiao Luo, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju

TL;DR

DisenSemi tackles semi-supervised graph classification by learning disentangled, factor-wise representations that align unsupervised and supervised tasks. It introduces a GNN-based encoder that decomposes graphs into $K$ factor graphs, with MI-based intra-factor maximization and inter-factor minimization to foster meaningful, independent factors. A disentangled consistency regularization, framed as a variational EM objective, transfers factor-specific knowledge between the unsupervised and supervised models, improving predictive performance and interpretability. Extensive experiments on ten public benchmarks demonstrate state-of-the-art accuracy and robust performance across labeling regimes, with clear evidence of the benefits of factor-wise transfer and MI-based regularization.

Abstract

Graph classification is a critical task in numerous multimedia applications, where graphs are employed to represent diverse types of multimedia data, including images, videos, and social networks. Nevertheless, in real-world scenarios, labeled graph data can be limited or scarce. To address this issue, we focus on the problem of semi-supervised graph classification, which involves both supervised and unsupervised models learning from labeled and unlabeled data. In contrast to recent approaches that transfer all knowledge from the unsupervised model to the supervised one, we argue that an effective transfer should retain only the relevant semantics that align well with the supervised task. In this paper, we propose a novel framework named DisenSemi, which learns disentangled representations for semi-supervised graph classification. Specifically, a disentangled graph encoder is proposed to generate factor-wise graph representations for both supervised and unsupervised models. We then train the two models via a supervised objective and mutual information (MI)-based constraints, respectively. To ensure the meaningful transfer of knowledge from the unsupervised encoder to the supervised one, we further define an MI-based disentangled consistency regularization between the two models and identify the corresponding rationale that aligns well with the current graph classification task. Experimental results on a range of publicly accessible datasets demonstrate the effectiveness of DisenSemi.
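
To make the idea of factor-wise consistency concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which MI estimator is used, so the sketch assumes an InfoNCE-style contrastive loss (a common MI lower-bound surrogate) applied separately to each of the $K$ factors. The function names `cosine`, `info_nce` and `factor_wise_consistency`, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss over a batch (assumed estimator, not from the paper).

    anchors[i] is expected to agree with positives[i]; every other
    positives[j] in the batch acts as a negative sample.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def factor_wise_consistency(unsup, sup, temperature=0.5):
    """Average the per-factor loss over K factors.

    unsup[k][i] and sup[k][i] hold the factor-k representation of graph i
    produced by the unsupervised and supervised model, respectively.
    """
    num_factors = len(unsup)
    per_factor = [info_nce(sup[k], unsup[k], temperature) for k in range(num_factors)]
    return sum(per_factor) / num_factors


# Toy data: K = 2 factors, 3 graphs, 3-dimensional representations.
unsup_repr = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
]
# Rotate each factor's batch by one position to simulate disagreement.
misaligned_repr = [factor[1:] + factor[:1] for factor in unsup_repr]

print("aligned loss:   ", factor_wise_consistency(unsup_repr, unsup_repr))
print("misaligned loss:", factor_wise_consistency(unsup_repr, misaligned_repr))
```

Under this assumed loss, representations that agree factor by factor score lower than misaligned ones, so minimizing it pushes the two models toward agreement on each factor separately rather than on one entangled graph vector.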

Paper Structure (38 sections, 21 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: A schematic view of DisenSemi, which consists of a supervised and an unsupervised model. We extract factor-wise graph representations from the two models and maximize their agreement in a factor-wise manner.
  • Figure 2: Illustration of the supervised and unsupervised training modules. We assume that the input graphs have three aspects and factorize them into different factor graphs. In the unsupervised module, we maximize the intra-factor MI and minimize the inter-factor MI to disentangle the graph representation effectively. In the supervised module, instead of applying MI-based constraints, we merge all extracted factor-wise graph representations to predict graph labels.
  • Figure 3: Performance comparison with different labeling ratios (i.e., $10\%$, $30\%$, $50\%$ and $70\%$) on four datasets (i.e., MUTAG, PROTEINS, IMDB-BINARY and REDDIT-BINARY).
  • Figure 4: Performance w.r.t. different numbers of factor graphs on four datasets (i.e., MUTAG, PROTEINS, IMDB-BINARY and REDDIT-BINARY).
  • Figure 5: Performance w.r.t. different numbers of message passing layers on four datasets (i.e., MUTAG, PROTEINS, IMDB-BINARY and REDDIT-BINARY).
  • ...and 3 more figures