Conditional Distribution Learning for Graph Classification
Jie Chen, Hua Mao, Chuanbin Liu, Zhu Wang, Xi Peng
TL;DR
Addresses the challenge of exploiting diverse graph augmentations while preserving intrinsic semantics in semisupervised graph classification. Proposes Conditional Distribution Learning (CDL), which aligns the conditional distributions of weakly and strongly augmented node embeddings given the original embeddings. To avoid the conflict between GNN message passing and contrastive learning on intraview negative pairs, CDL retains only positive pairs, which measure the similarity between the original features and their weakly augmented counterparts. Training has two stages: pretraining with a shared GNN encoder and projection head, followed by fine-tuning with labeled data. The objective combines a distribution-divergence loss L_d, a positive-pair loss L_s, and a cross-entropy loss L_c. On eight benchmark graph datasets, CDL consistently outperforms state-of-the-art graph contrastive learning and related methods.
Abstract
Leveraging the diversity and quantity of data provided by various graph-structured data augmentations while preserving intrinsic semantic information is challenging. Additionally, successive layers in graph neural networks (GNNs) tend to produce increasingly similar node embeddings, whereas graph contrastive learning aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism (MPM) of GNNs and contrastive learning (CL) on intraview negative pairs. In this paper, we propose a conditional distribution learning (CDL) method that learns graph representations from graph-structured data for semisupervised graph classification. Specifically, we present an end-to-end graph representation learning model that aligns the conditional distributions of weakly and strongly augmented features over the original features. This alignment enables the CDL model to preserve intrinsic semantic information when both weak and strong augmentations are applied to graph-structured data. To avoid the conflict between the MPM and CL on negative pairs, only positive pairs of node representations are retained; they measure the similarity between the original features and the corresponding weakly augmented features. Extensive experiments on several benchmark graph datasets demonstrate the effectiveness of the proposed CDL method.
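
The two ideas above, aligning conditional distributions and keeping only positive pairs, can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the exact losses, so the cosine similarity, the softmax temperature `tau`, the direction of the KL divergence, and all function names below are assumptions chosen for illustration. Embeddings are plain lists of floats, one per node.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_dist(aug, orig, tau=0.5):
    """For each augmented embedding, a distribution over the original embeddings.

    Assumption: the conditional distribution is modeled as a softmax over
    temperature-scaled cosine similarities.
    """
    return [softmax([cosine(a, o) / tau for o in orig]) for a in aug]

def divergence_loss(weak, strong, orig, tau=0.5):
    """Mean KL(p_weak || p_strong) over nodes; an assumed stand-in for L_d."""
    p = conditional_dist(weak, orig, tau)
    q = conditional_dist(strong, orig, tau)
    kl = [
        sum(pi * math.log(pi / qi) for pi, qi in zip(p_row, q_row))
        for p_row, q_row in zip(p, q)
    ]
    return sum(kl) / len(kl)

def positive_pair_loss(orig, weak):
    """Mean (1 - cosine) over matching node pairs; an assumed stand-in for L_s.

    Only positive pairs (the same node in the original and weakly augmented
    views) contribute, so no negative pair is pushed apart.
    """
    return sum(1.0 - cosine(o, w) for o, w in zip(orig, weak)) / len(orig)

# Toy embeddings for two nodes (hypothetical values).
orig = [[1.0, 0.0], [0.0, 1.0]]
weak = [[0.9, 0.1], [0.1, 0.9]]
strong = [[0.5, 0.5], [0.2, 0.8]]

l_d = divergence_loss(weak, strong, orig)
l_s = positive_pair_loss(orig, weak)
```

In this sketch, `l_d` shrinks as the strongly augmented view induces the same distribution over the original embeddings as the weakly augmented view, and `l_s` shrinks as each weakly augmented embedding moves toward its own original embedding. How these terms are weighted against the cross-entropy loss L_c is not stated in the abstract and is left out here.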
