
Can Modifying Data Address Graph Domain Adaptation?

Renhong Huang, Jiarong Xu, Xin Jiang, Ruichuan An, Yang Yang

TL;DR

This work reframes Unsupervised Graph Domain Adaptation (UGDA) from a data-centric perspective, arguing that modifying the source graph can outperform purely model-centric approaches under distribution shifts. It derives a generalization bound that motivates two principles—Alignment and Rescaling—and introduces GraphAlign, which generates a small yet transferable graph $G^{\prime}$ and trains a GNN on it with empirical risk minimization. GraphAlign combines a gradient-mimicking loss, an MMD-based alignment objective, and a propagation-based regularizer, with a structured graph generator that uses a feature-driven, differentiable construction. Across diverse transfer scenarios, GraphAlign consistently surpasses baselines by an average of $+2.16\%$, while using a generated graph as small as $0.25\%$–$1\%$ of the original training data, highlighting both efficacy and efficiency of data-centric UGDA.
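To make the MMD-based alignment objective mentioned above more concrete, here is a minimal sketch of a generic (biased) squared maximum mean discrepancy estimate with an RBF kernel. It is an illustration only, not the objective used by GraphAlign: the function names `rbf_kernel` and `mmd2_biased`, the bandwidth `gamma`, and the synthetic Gaussian "source" and "target" embeddings are all assumptions introduced for this example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_biased(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    kxx = rbf_kernel(x, x, gamma).mean()
    kyy = rbf_kernel(y, y, gamma).mean()
    kxy = rbf_kernel(x, y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

# Hypothetical node embeddings: one target sample matches the source
# distribution, the other is shifted in mean.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 8))
target_same = rng.normal(0.0, 1.0, size=(200, 8))
target_shifted = rng.normal(1.5, 1.0, size=(200, 8))

mmd_same = mmd2_biased(source, target_same, gamma=0.1)
mmd_shifted = mmd2_biased(source, target_shifted, gamma=0.1)
print(f"MMD^2 (same distribution):    {mmd_same:.4f}")
print(f"MMD^2 (shifted distribution): {mmd_shifted:.4f}")
```

The shifted sample should yield a clearly larger value than the matched one, which is the property an alignment objective of this kind exploits: minimizing the discrepancy pulls the two embedding distributions together.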

Abstract

Graph neural networks (GNNs) have demonstrated remarkable success in numerous graph analytical tasks. Yet their effectiveness is often compromised in real-world scenarios by distribution shifts, which limit their capacity for knowledge transfer across changing environments or domains. Recently, Unsupervised Graph Domain Adaptation (UGDA) has been introduced to resolve this issue. UGDA aims to facilitate knowledge transfer from a labeled source graph to an unlabeled target graph. Current UGDA efforts primarily focus on model-centric methods, such as employing domain-invariant learning strategies and designing model architectures. However, our critical examination reveals limitations inherent to these model-centric methods, whereas a data-centric method that is allowed to modify the source graph provably demonstrates considerable potential. This insight motivates us to explore UGDA from a data-centric perspective. By revisiting the theoretical generalization bound for UGDA, we identify two data-centric principles: the alignment principle and the rescaling principle. Guided by these principles, we propose GraphAlign, a novel UGDA method that generates a small yet transferable graph. By training a GNN exclusively on this new graph with classic Empirical Risk Minimization (ERM), GraphAlign attains exceptional performance on the target graph. Extensive experiments under various transfer scenarios demonstrate that GraphAlign outperforms the best baselines by an average of 2.16%, while training on a generated graph as small as 0.25% to 1% of the original training graph.

Paper Structure

This paper contains 21 sections, 5 theorems, 37 equations, 6 figures, 10 tables, 2 algorithms.

Key Result

Proposition 1

Assume the feature extractor $f$ is a single-layer GNN that is trained with the domain-invariant constraint $\mathbb{P}(f(G^{\mathcal{S}})) = \mathbb{P}(f(G^{\mathcal{T}}))$ and then used for inference on the target graph. When such a GNN $f$ is applied to Example 1, the classification ...

Figures (6)

  • Figure 1: Comparison between existing UGDA methods (which are all model-centric) and our data-centric method GraphAlign. Guided by the rescaling and alignment principles, GraphAlign generates a small yet transferable graph, on which a simple GNN is trained with classic ERM. GraphAlign deviates from conventional approaches that employ sophisticated model design, and achieves outstanding practical performance.
  • Figure 2: How the rescaling term varies with the scale of the source graph. We set $\delta=0.01$ so that the generalization bound holds with probability at least 99%. The pseudo-dimension $d$ is set to $1000$, a reasonable assumption based on [devroye1996vapnik] (the trend of the rescaling term is consistent regardless of the value of $d$). The horizontal axis uses a logarithmic scale.
  • Figure 3: Ablation studies on A$\rightarrow$D and C$\rightarrow$A tasks.
  • Figure 4: Our results on the D$\rightarrow$A task w.r.t. varying $r$, $\alpha_1$, and $\alpha_2$. The dashed line represents the performance of the best baseline.
  • Figure 5: Loss curves comparing the initialization of GraphAlign with random initialization on D$\rightarrow$A.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Definition 1: Contextual Stochastic Block Model
  • Example 1
  • Proposition 1
  • Proposition 2
  • Theorem 1: Generalization bound for UGDA [shen2018wasserstein]
  • Definition 2: Data-Centric UGDA
  • Theorem 2: GNN transferability
  • Theorem 3
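Definition 1 in the list above refers to the contextual stochastic block model (CSBM). Since the definition itself is not reproduced here, the following is only a sketch of one common two-class CSBM parameterization, not necessarily the one used in the paper: nodes in the same class are connected with probability `p`, nodes in different classes with probability `q`, and node features are Gaussian with class-dependent means `-mu` and `+mu`. The function name `sample_csbm` and all parameter names and values are assumptions made for illustration.

```python
import numpy as np

def sample_csbm(n, p, q, mu, sigma=1.0, dim=2, seed=0):
    """Sample a two-class CSBM graph (assumed, commonly used form).

    Returns a symmetric 0/1 adjacency matrix without self-loops,
    a feature matrix of shape (n, dim), and labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)            # class label per node
    same = y[:, None] == y[None, :]           # True where labels match
    probs = np.where(same, p, q)              # edge probability per pair
    # Sample each unordered pair once (strict upper triangle), then mirror.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(np.int64)
    # Features: class mean is -mu or +mu in every dimension, plus noise.
    signs = 2 * y - 1
    x = signs[:, None] * mu + sigma * rng.standard_normal((n, dim))
    return adj, x, y

adj, x, y = sample_csbm(n=300, p=0.10, q=0.02, mu=1.0)

# Compare empirical intra-class and inter-class edge densities.
same = y[:, None] == y[None, :]
off_diag = ~np.eye(len(y), dtype=bool)
intra = adj[same & off_diag].mean()
inter = adj[~same].mean()
print(f"intra-class density: {intra:.3f}, inter-class density: {inter:.3f}")
```

Changing `p`, `q`, or `mu` between two sampled graphs is one simple way to simulate a structural or feature shift between a source and a target graph in toy experiments.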