Score-based Conditional Out-of-Distribution Augmentation for Graph Covariate Shift
Bohan Wang, Yurui Chang, Wei Jin, Lu Lin
TL;DR
The paper addresses covariate shift in graph learning by introducing CODA, a score-based conditional diffusion framework that generates OOD graphs conditioned on a target label and an exploration parameter, probing low-density regions while preserving stable, label-determining patterns. CODA avoids explicit environmental decomposition and environment labels, instead coupling an unconditional score with a time-dependent classifier to steer generation under a tractable OOD surrogate. In extensive experiments on GOOD benchmarks, CODA consistently outperforms invariant-learning and augmentation baselines across motif, feature, scaffold, and length shifts, and provides tunable control over the extent of distributional exploration. The approach offers a practical, principled pathway for robust, web-scale graph analysis under pervasive distribution shifts.
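As a rough illustration of what coupling an unconditional score with a time-dependent classifier means, the standard classifier-guidance identity from score-based generative modeling can be written as below. This is a sketch under assumed notation, not the paper's exact formulation: G_t denotes the noised graph at diffusion time t (treated as a continuous variable), y the target label, and p_t the noised data distribution. How the exploration parameter enters is not specified in this summary and is deliberately left out.

```latex
% Bayes' rule: p_t(G_t | y) = p_t(G_t) p_t(y | G_t) / p(y), and p(y) does not depend on G_t,
% so taking the gradient of the log gives
\nabla_{G_t} \log p_t(G_t \mid y)
  = \underbrace{\nabla_{G_t} \log p_t(G_t)}_{\text{unconditional score}}
  + \underbrace{\nabla_{G_t} \log p_t(y \mid G_t)}_{\text{time-dependent classifier gradient}}
```

In practice the first term is approximated by a learned score network and the second by the gradient of a classifier trained on noised inputs, so label conditioning can be added without retraining the score model.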
Abstract
Distribution shifts between training and test data significantly impair model performance in graph learning. A commonly adopted causal view in graph invariant learning holds that stable predictive features of graphs are causally associated with labels, whereas varying environmental features lead to distribution shifts. In particular, covariate shifts caused by unseen environments in test graphs underscore the critical need for out-of-distribution (OOD) generalization. Existing graph augmentation methods designed to address covariate shift often disentangle the stable and environmental features in the input space and selectively perturb or mix up the environmental features. However, such perturbation-based methods rely heavily on an accurate separation of stable and environmental features, and their exploration ability is confined to environmental features already present in the training distribution. To overcome these limitations, we introduce a novel distributional augmentation approach, enabled by a tailored score-based conditional graph generation strategy, that explores and synthesizes unseen environments while preserving the validity and stable features of overall graph patterns. Our comprehensive empirical evaluations demonstrate the effectiveness of our method in improving graph OOD generalization.
