Conditional Distribution Learning for Graph Classification

Jie Chen, Hua Mao, Chuanbin Liu, Zhu Wang, Xi Peng

TL;DR

Addresses the challenge of utilizing diverse graph augmentations while preserving intrinsic semantics in semisupervised graph classification. Proposes Conditional Distribution Learning (CDL), which aligns the conditional distributions of weakly and strongly augmented node embeddings relative to original embeddings, and uses positive pairs with the original features to measure similarity and avoid intraview conflicts. Employs a two-stage training scheme (pretraining with a shared GNN encoder and projection head, followed by fine-tuning with labeled data) and a distribution-divergence loss L_d alongside a positive-pair loss L_s and cross-entropy L_c. On eight benchmark graph datasets, CDL consistently outperforms state-of-the-art graph-contrastive and related methods, demonstrating its effectiveness in leveraging augmentations while preserving semantics.

Abstract

Leveraging the diversity and quantity of data provided by various graph-structured data augmentations while preserving intrinsic semantic information is challenging. Additionally, successive layers in a graph neural network (GNN) tend to produce increasingly similar node embeddings, while graph contrastive learning aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism (MPM) of GNNs and the contrastive learning (CL) of negative pairs via intraviews. In this paper, we propose a conditional distribution learning (CDL) method that learns graph representations from graph-structured data for semisupervised graph classification. Specifically, we present an end-to-end graph representation learning model to align the conditional distributions of weakly and strongly augmented features over the original features. This alignment enables the CDL model to effectively preserve intrinsic semantic information when both weak and strong augmentations are applied to graph-structured data. To avoid the conflict between the MPM and the CL of negative pairs, only positive pairs of node representations are retained for measuring the similarity between the original features and the corresponding weakly augmented features. Extensive experiments with several benchmark graph datasets demonstrate the effectiveness of the proposed CDL method.
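
The two ideas in the abstract, aligning the conditional distributions of the weak and strong views over the original features and keeping only positive pairs for the similarity term, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the exact forms of L_d and L_s are not given in this excerpt, so the softmax-over-cosine-similarity definition of the conditional distribution, the `temperature` value, the KL divergence, and the cosine-based positive-pair loss below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def conditional_distribution(h_aug, h_orig, temperature=0.5):
    """ASSUMED form of p(h_aug_i | h_orig): a row-wise softmax over the
    cosine similarities between each augmented embedding and all
    original embeddings. Each row of the result sums to 1."""
    sim = l2_normalize(h_aug) @ l2_normalize(h_orig).T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(sim)
    return exp / exp.sum(axis=1, keepdims=True)


def distribution_divergence(h, h_weak, h_strong, temperature=0.5, eps=1e-12):
    """ASSUMED stand-in for L_d: mean KL divergence between the weak-view
    and strong-view conditional distributions over the original embeddings."""
    p_w = conditional_distribution(h_weak, h, temperature)
    p_s = conditional_distribution(h_strong, h, temperature)
    kl = np.sum(p_w * (np.log(p_w + eps) - np.log(p_s + eps)), axis=1)
    return float(kl.mean())


def positive_pair_loss(p, p_weak):
    """ASSUMED stand-in for L_s: one minus the cosine similarity of each
    positive pair (original, weak view). No negative pairs are used."""
    cos = np.sum(l2_normalize(p) * l2_normalize(p_weak), axis=1)
    return float(np.mean(1.0 - cos))


# Toy usage: random embeddings stand in for the outputs of a shared encoder.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                    # original embeddings
h_weak = h + 0.05 * rng.normal(size=h.shape)    # mild perturbation
h_strong = h + 0.50 * rng.normal(size=h.shape)  # heavier perturbation

l_d = distribution_divergence(h, h_weak, h_strong)
l_s = positive_pair_loss(h, h_weak)
print(f"L_d (assumed KL form): {l_d:.4f}")
print(f"L_s (assumed cosine form): {l_s:.4f}")
```

Because the similarity term compares only matched pairs, nothing in this sketch pushes embeddings of different nodes apart, which is the property the abstract relies on to avoid the conflict with message passing.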

Paper Structure

This paper contains 21 sections, 15 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Framework of the CDL model. The graph-level representations, $\mathbf{H}$, $\mathbf{H}_s$ and $\mathbf{H}_w$, are produced by a shared GNN encoder module using the graph-structured data, a strongly augmented view and a weakly augmented view, respectively. $p(\mathbf{h}_i^s \mid \mathbf{h}_i)$ and $p(\mathbf{h}_i^w \mid \mathbf{h}_i)$ represent the conditional distributions given the raw graph-level representation of the $i$th node. $\mathbf{P}$ and $\mathbf{P}_w$ denote the projected representations of the raw graph-structured data and a weakly augmented view, respectively.