Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

Jiaren Xiao, Quanyu Dai, Xiao Shen, Xiaochen Xie, Jing Dai, James Lam, Ka-Wai Kwok

TL;DR

SemiGCL tackles semi-supervised domain adaptation for graph node classification under limited target labels by jointly learning representations from two structural views (local and diffusion-based global) using graph contrastive learning, while mitigating source-target bias via minimax entropy with a cosine-similarity classifier. The method unifies two GNN encoders, a contrastive objective, cross-entropy supervision on labeled nodes, and an adversarial entropy game to align distributions across graphs. Theoretical grounding links minimax entropy to bounded domain divergence, and empirical results on eight transfer tasks across five real-world networks show state-of-the-art performance in SSDA settings, with notable gains over cross-graph and UDA baselines. The work highlights the value of combining multi-view graph representations with entropy-based domain alignment for transferable graph representations in label-scarce scenarios, and provides public code for reproduction.

Abstract

Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To address this, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid node classification on a target graph with limited labels. SSDA tasks must overcome the domain gap between the source and target graphs. However, existing approaches designed for cross-graph node classification have yet to formally consider this challenging problem. This paper proposes a novel method called SemiGCL to tackle graph Semi-supervised domain adaptation with Graph Contrastive Learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks. The source code of SemiGCL is publicly available at https://github.com/JiarenX/SemiGCL.
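
The entropy term that drives the adversarial step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperature value, and the random embeddings are assumptions made for illustration only. It shows a cosine-similarity classifier and the mean prediction entropy over unlabeled target nodes; in minimax entropy training the classifier is updated to increase this quantity and the encoder to decrease it.

```python
import numpy as np

def cosine_classifier(embeddings, class_weights, temperature=0.05):
    """Class probabilities from cosine similarity (temperature is an assumed value)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    logits = z @ w.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)

def mean_entropy(probs, eps=1e-12):
    """Average Shannon entropy of the per-node predictions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

# Hypothetical data: 6 unlabeled target nodes, 4-dim embeddings, 3 classes.
rng = np.random.default_rng(0)
target_embeddings = rng.normal(size=(6, 4))
class_weights = rng.normal(size=(3, 4))

probs = cosine_classifier(target_embeddings, class_weights)
entropy = mean_entropy(probs)

# Minimax game on the same quantity (signs only, no optimizer shown):
#   classifier step: gradient ASCENT on `entropy`
#   encoder step:    gradient DESCENT on `entropy`
print(f"mean target entropy: {entropy:.4f}")
```

In a deep learning framework the two opposing updates are commonly realized with a gradient reversal layer between encoder and classifier; whether SemiGCL does so is not stated in this summary.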

Paper Structure (32 sections, 21 equations, 8 figures, 9 tables, 1 algorithm)

Figures (8)

  • Figure 1: Semi-supervised domain adaptation on graphs. The source and target graphs are two independent domains with distinct data distributions. The source graph is fully labeled, while the target graph has a limited number of labeled nodes per class.
  • Figure 2: Architecture of the proposed model. Two GNN encoders extract node representations from two structural views of a graph, i.e., the original graph (local view) and the diffusion-augmented graph (global view). Node representations from the local and global views are then contrasted and concatenated to obtain an informative node embedding vector. The cosine similarity-based node classifier computes the label prediction by taking the node embedding vector as input. The domain divergence between the source and target graphs is reduced by adversarially optimizing the model with the entropy loss. More details can be found in the method section of the paper.
  • Figure 3: Contrastive learning between the local view (original graph) and the global view (diffusion-augmented graph).
  • Figure 4: Node class distribution in the neighborhood of a node that belongs to Class 3. On every graph, the reported percentage of each class is obtained by averaging the statistics of all nodes that are of Class 3.
  • Figure 5: Loss over training epoch.
  • ...and 3 more figures
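
The two structural views described in the captions for Figures 2 and 3 can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: the Personalized PageRank kernel, the teleport probability `alpha`, the single-layer encoders, and the toy path graph are all choices made here for the example. The summary only states that the global view is a diffusion-augmented graph.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalized adjacency with self-loops (local view operator)."""
    adj_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def ppr_diffusion(adj, alpha=0.15):
    """Closed-form Personalized PageRank diffusion (assumed kernel for the global view)."""
    n = adj.shape[0]
    transition = normalize_adj(adj)
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * transition)

def one_layer_encoder(propagation, features, weights):
    """Single propagation step followed by ReLU, standing in for a GNN encoder."""
    return np.maximum(propagation @ features @ weights, 0.0)

# Toy path graph 0-1-2-3 with random node features (hypothetical data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5))
w_local = rng.normal(size=(5, 3))   # separate weights: one encoder per view
w_global = rng.normal(size=(5, 3))

local_op = normalize_adj(adj)
global_op = ppr_diffusion(adj)

h_local = one_layer_encoder(local_op, features, w_local)
h_global = one_layer_encoder(global_op, features, w_global)
node_embedding = np.concatenate([h_local, h_global], axis=1)
print(node_embedding.shape)
```

On this toy graph the local operator has no entry linking nodes 0 and 3, while the diffusion matrix assigns them a small positive weight, which is why the diffusion view is described as global.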