Can Modifying Data Address Graph Domain Adaptation?

Renhong Huang; Jiarong Xu; Xin Jiang; Ruichuan An; Yang Yang

TL;DR

This work reframes Unsupervised Graph Domain Adaptation (UGDA) from a data-centric perspective, arguing that modifying the source graph can outperform purely model-centric approaches under distribution shifts. It derives a generalization bound that motivates two principles, alignment and rescaling, and introduces GraphAlign, which generates a small yet transferable graph $G^{\prime}$ and trains a GNN on it with empirical risk minimization. GraphAlign combines a gradient-mimicking loss, an MMD-based alignment objective, and a propagation-based regularizer with a structured graph generator that uses a feature-driven, differentiable construction. Across diverse transfer scenarios, GraphAlign outperforms the best baselines by an average of $2.16\%$ while training on a generated graph as small as $0.25\%$ to $1\%$ of the original training graph, highlighting both the efficacy and the efficiency of data-centric UGDA.
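To make the MMD-based alignment objective concrete, here is a minimal sketch of a squared maximum mean discrepancy estimate between two sets of node features. It is an illustration only: the RBF kernel, the bandwidth `gamma`, the biased estimator, and the synthetic Gaussian features are all assumptions, and the names `rbf_kernel` and `mmd2` are hypothetical rather than taken from GraphAlign.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise squared Euclidean distances, then a Gaussian (RBF) kernel.
    sq = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between samples x and y.
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Synthetic stand-ins (assumption): target features plus two small candidate sets.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 8))
aligned = rng.normal(0.0, 1.0, size=(50, 8))   # same distribution as target
shifted = rng.normal(1.5, 1.0, size=(50, 8))   # mean-shifted distribution

mmd_aligned = mmd2(aligned, target, gamma=0.1)
mmd_shifted = mmd2(shifted, target, gamma=0.1)
print(f"aligned: {mmd_aligned:.4f}  shifted: {mmd_shifted:.4f}")
```

A generated graph whose features follow the target distribution should give a smaller value than one whose features are shifted, which is the behavior an alignment term rewards.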

Abstract

Graph neural networks (GNNs) have demonstrated remarkable success in numerous graph analytical tasks. Yet their effectiveness is often compromised in real-world scenarios by distribution shifts, which limit their capacity for knowledge transfer across changing environments or domains. Recently, Unsupervised Graph Domain Adaptation (UGDA) has been introduced to resolve this issue. UGDA aims to facilitate knowledge transfer from a labeled source graph to an unlabeled target graph. Current UGDA efforts primarily focus on model-centric methods, such as employing domain-invariant learning strategies and designing model architectures. However, our critical examination reveals the limitations inherent to these model-centric methods, while a data-centric method that is allowed to modify the source graph provably demonstrates considerable potential. This insight motivates us to explore UGDA from a data-centric perspective. By revisiting the theoretical generalization bound for UGDA, we identify two data-centric principles for UGDA: the alignment principle and the rescaling principle. Guided by these principles, we propose GraphAlign, a novel UGDA method that generates a small yet transferable graph. By exclusively training a GNN on this new graph with classic Empirical Risk Minimization (ERM), GraphAlign attains exceptional performance on the target graph. Extensive experiments under various transfer scenarios demonstrate that GraphAlign outperforms the best baselines by an average of 2.16% while training on a generated graph as small as 0.25% to 1% of the original training graph.
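The last step described above, training a GNN on a small graph with plain ERM, can be sketched as follows. This is a toy illustration under stated assumptions, not GraphAlign itself: the six-node graph is written by hand rather than generated, the model is a single linear propagation layer with the standard GCN symmetric normalization, and the function names (`normalize_adj`, `train_erm`, `predict`) are hypothetical.

```python
import numpy as np

def normalize_adj(adj):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(1, keepdims=True)

def train_erm(adj, x, y, num_classes, lr=0.5, epochs=200):
    # ERM: minimize the average cross-entropy over all labeled nodes.
    h = normalize_adj(adj) @ x            # one propagation step
    w = np.zeros((x.shape[1], num_classes))
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        p = softmax(h @ w)
        w -= lr * h.T @ (p - onehot) / len(y)  # gradient of the mean loss
    return w

def predict(adj, x, w):
    return (normalize_adj(adj) @ x @ w).argmax(1)

# Hand-written stand-in for a small generated graph (assumption): two triangles.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
              [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
y = np.array([0, 0, 0, 1, 1, 1])

w = train_erm(adj, x, y, num_classes=2)
pred = predict(adj, x, w)
```

In the setting the abstract describes, the learned weights would then be applied to the unlabeled target graph; here the sketch only checks that training fits the toy graph.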

Paper Structure (21 sections, 5 theorems, 37 equations, 6 figures, 10 tables, 2 algorithms)

Introduction
Preliminaries
Data-Centric Principles
Motivating Example
Data-Centric Principles for UGDA
Proposed Method: GraphAlign
Optimization Problem
Modeling the Generated Graph
Complexity Analysis
Experiments
Experimental Setup
Experimental Results
Related Work
Conclusion
Appendix
...and 6 more sections

Key Result

Proposition 1

Assume the feature extractor $f$ is a single-layer GNN that is trained with the domain-invariant constraint $\mathbb{P}(f(G^{\mathcal{S}})) = \mathbb{P}(f(G^{\mathcal{T}}))$ and then used for inference on the target graph. When such a GNN $f$ is applied to Example 1, the classificatio...

Figures (6)

Figure 1: Comparison between existing UGDA methods (which are all model-centric) and our data-centric method GraphAlign. Guided by the rescaling and alignment principles, GraphAlign generates a small yet transferable graph, on which a simple GNN is trained with classic ERM. GraphAlign deviates from conventional approaches that employ sophisticated model design, and achieves outstanding practical performance.
Figure 2: How the rescaling term varies with the scale of the source graph. We set $\delta=0.01$ so that the generalization bound holds with probability at least 99%. The pseudo-dimension $d$ is set to $1000$, which is a reasonable assumption based on [devroye1996vapnik] (the trend of the rescaling term is consistent regardless of the value of $d$). The horizontal axis uses a logarithmic scale.
Figure 3: Ablation studies on the A$\rightarrow$D and C$\rightarrow$A tasks.
Figure 4: Results on the D$\rightarrow$A task w.r.t. varying $r$, $\alpha_1$, and $\alpha_2$. The dashed line represents the performance of the best baseline.
Figure 5: Loss curves comparing GraphAlign's initialization with random initialization on D$\rightarrow$A.
...and 1 more figure

Theorems & Definitions (8)

Definition 1: Contextual Stochastic Block Model
Example 1
Proposition 1
Proposition 2
Theorem 1: Generalization bound for UGDA [shen2018wasserstein]
Definition 2: Data-Centric UGDA
Theorem 2: GNN transferability
Theorem 3
