Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph

Weihuang Zheng, Jiashuo Liu, Jiaxing Li, Jiayun Wu, Peng Cui, Youyong Kong

TL;DR

This work tackles distribution shifts in graph-based node classification by proposing Topology-Aware Dynamic Reweighting (TAR), which reweights samples through gradient flow in the discrete geometric Wasserstein space to incorporate graph topology into robustness. The method casts training as a minimax problem over model parameters and sample densities, using entropy and topology-based penalties to regularize the reweighting process. The authors prove that the inner gradient flow approximates a local worst-case distribution, yielding distributional robustness with an error bound that decays exponentially with the number of gradient steps $T_{\text{in}}$. Empirically, TAR improves over strong baselines on four OOD datasets and three class-imbalanced datasets, without requiring domain labels, and demonstrates resilience to both covariate and concept shifts as well as label imbalance. The results suggest that leveraging graph structure via geometric Wasserstein gradient flow provides a principled and effective avenue for graph OOD generalization and robust node classification.
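
The reweighting idea can be illustrated with a small sketch. This is not the authors' implementation. It assumes an explicit-Euler, upwind discretization of a gradient flow on the graph, in which probability mass moves only along edges, from nodes with lower potential to nodes with higher potential. The function name `reweight_on_graph`, the step size `step`, the uniform starting weights, and the exact form of the potential (per-node loss minus `beta` times the log-weight) are all assumptions made for illustration; `t_in` and `beta` play the roles of $T_{\text{in}}$ and the entropy coefficient.

```python
import math


def reweight_on_graph(losses, edges, t_in=10, step=0.05, beta=0.1):
    """Hypothetical helper: run t_in Euler steps of an upwind flow on a graph.

    losses: per-node loss values (list of floats)
    edges:  undirected edges as (i, j) index pairs
    Returns node weights that still sum to 1.
    """
    n = len(losses)
    q = [1.0 / n] * n  # assumption: start from uniform sample weights
    for _ in range(t_in):
        # Potential = loss plus the entropy term's contribution.
        # The -beta*log(q) term pushes mass back toward low-weight nodes.
        phi = [losses[i] - beta * math.log(max(q[i], 1e-12)) for i in range(n)]
        dq = [0.0] * n
        for i, j in edges:
            # Upwind rule: the flux is scaled by the weight of the node
            # that the mass is leaving.
            if phi[i] > phi[j]:
                flux = (phi[i] - phi[j]) * q[j]   # mass moves j -> i
            else:
                flux = -(phi[j] - phi[i]) * q[i]  # mass moves i -> j
            dq[i] += flux
            dq[j] -= flux  # equal and opposite, so total mass is conserved
        q = [q[i] + step * dq[i] for i in range(n)]
    return q


# Usage: a path graph 0-1-2-3 where the last nodes have the highest loss.
weights = reweight_on_graph([0.1, 0.2, 1.5, 2.0], [(0, 1), (1, 2), (2, 3)], t_in=20)
```

Because mass can only travel along edges, a node with no neighbors keeps its initial weight no matter how large its loss is, which is the sense in which the reweighting in this sketch is topology-aware. The explicit step needs `step` small enough that no node sends out more mass than it holds.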

Abstract

Graph Neural Networks (GNNs) are widely used for node classification tasks but often fail to generalize when training and test nodes come from different distributions, limiting their practicality. To overcome this, recent approaches adopt invariant learning techniques from the out-of-distribution (OOD) generalization field, which seek to establish stable prediction methods across environments. However, the applicability of these invariant assumptions to graph data remains unverified, and such methods often lack solid theoretical support. In this work, we introduce the Topology-Aware Dynamic Reweighting (TAR) framework, which dynamically adjusts sample weights through gradient flow in the geometric Wasserstein space during training. Instead of relying on strict invariance assumptions, we prove that our method is able to provide distributional robustness, thereby enhancing the out-of-distribution generalization performance on graph data. By leveraging the inherent graph structure, TAR effectively addresses distribution shifts. Our framework's superiority is demonstrated through standard testing on four graph OOD datasets and three class-imbalanced node classification datasets, exhibiting marked improvements over existing methods.

Paper Structure (29 sections, 2 theorems, 23 equations, 2 figures, 8 tables, 1 algorithm)


Key Result

Theorem 1

For any $\gamma > 0$, $t > 0$, and given $\theta$, denote the solution of the discrete gradient flow equation as $q^\star=\arg\max_{q\in\mathscr{P}_o(G_0)}\left\{\mathcal{L}(\theta,q)-\gamma\,\mathcal{GW}_{G_0}^2(p,q)\right\}$, and let $\epsilon=\mathcal{GW}^2_{G_0}(p,q^\star)$. Then we have … The proof can be found in the Appendix.
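
Read together with the TL;DR, the penalized inner problem in the theorem sits inside a minimax training objective. A hedged way to write it out is given below. The decomposition of $\mathcal{L}(\theta,q)$ into a weighted sum of per-node losses $\ell_i(\theta)$ plus an entropy term with coefficient $\beta$ is an assumption based on the description of the entropy penalty, not a formula quoted from the paper.

```latex
\min_{\theta}\; \max_{q \in \mathscr{P}_o(G_0)} \;
  \mathcal{L}(\theta, q) - \gamma\, \mathcal{GW}_{G_0}^2(p, q),
\qquad
\mathcal{L}(\theta, q) \;=\; \sum_{i} q_i\, \ell_i(\theta) \;-\; \beta \sum_{i} q_i \log q_i
\quad \text{(assumed form)}.
```

Under this reading, $\gamma$ controls how far $q$ may move from the training distribution $p$ in geometric Wasserstein distance, and $\beta$ keeps the weights from collapsing onto a few high-loss nodes.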

Figures (2)

  • Figure 1: Illustration of the gradient flow in the geometric Wasserstein space $(\mathcal{P}_o(G_0), \mathcal{GW}_{G_0})$, where each point denotes a probability distribution in $\mathcal{P}_o(G_0)$ and distance is measured by the discrete geometric Wasserstein distance. The black circle denotes the local distribution set around a distribution, and the blue arrow represents the one-step gradient flow. $q^\tau(T)$ denotes the approximate inner maximizer obtained by our algorithm, and $q^\star$ denotes the ground-truth inner maximizer (defined in Theorem 2). In Theorem 1, we demonstrate that the one-step gradient flow is equivalent to distributionally robust optimization around a local uncertainty set, and in Theorem 2, we characterize the approximation error rate between $q^\tau(T)$ and $q^\star$.
  • Figure 2: The effects of the hyper-parameters $T_{\text{in}}$ (the number of gradient flow steps) and $\beta$ (the coefficient of the entropy penalty) in our proposed TAR algorithm.

Theorems & Definitions (3)

  • Definition 1: Discrete Geometric Wasserstein Distance (Chow et al., 2017)
  • Theorem 1: Distributional robustness
  • Theorem 2: Approximation error rate