Learning Divergence Fields for Shift-Robust Graph Representations
Qitian Wu, Fan Nie, Chenxiao Yang, Junchi Yan
TL;DR
Problem: robust generalization under distribution shifts for interdependent data on graphs/manifolds. Approach: a geometric diffusion model with learnable divergence fields. At each step $l$, a divergence field $\mathbf d^{(l)}$ with entries $d_{uv}^{(l)}$ is sampled from $p(\mathbf d^{(l)}|\mathbf z^{(l)})$, and each node state on the diffusion trajectory is updated via $\mathbf z_u^{(l+1)} = \mathbf z_u^{(l)} + \alpha \sum_{v \in \mathcal N(u)} d_{uv}^{(l)} (\mathbf z_v^{(l)} - \mathbf z_u^{(l)})$. Training adds a causal regularization that optimizes a variational lower bound on $\log p_\theta(\mathbf y|\mathbf x, \mathcal G)$ under interventions. Contributions: (i) diffusion on graphs with stochastic divergences; (ii) a step-wise re-weighting regularization using $p_0(\mathbf d^{(l)})$ to approximate $p_\theta(\mathbf y|do(\mathbf x), \mathcal G)$, with a data-driven prior given by a mixture of pseudo posteriors, $p_0(\mathbf d^{(l)}) = \frac{1}{T} \sum_{t=1}^T q(\mathbf d^{(l)}|\mathbf z^{(l)}=\tilde{\mathbf z}^{(l)}_t)$; (iii) three practical backbones, Glind-GCN, Glind-GAT, and Glind-Trans; (iv) extensive experiments demonstrating improved out-of-distribution generalization across datasets with observed and latent geometries. Significance: enables shift-robust graph representations applicable to diverse domains.
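The diffusion update above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `diffusion_step` transcribes the update rule for a dense adjacency matrix, while `sample_divergence` is a hypothetical stand-in for $p(\mathbf d^{(l)}|\mathbf z^{(l)})$ (a Gumbel-perturbed softmax over neighbours' dot-product scores), chosen only to make the example runnable. The function names, the dense-matrix layout, and the default `alpha` are assumptions.

```python
import numpy as np


def diffusion_step(z, adj, d, alpha=0.1):
    """One update z_u <- z_u + alpha * sum_{v in N(u)} d_uv * (z_v - z_u).

    z:   (n, h) node states at step l
    adj: (n, n) 0/1 adjacency matrix; adj[u, v] = 1 iff v is in N(u)
    d:   (n, n) sampled divergence weights d_uv for step l
    """
    w = adj * d  # keep weights only on existing edges
    # sum_v w_uv * z_v  minus  (sum_v w_uv) * z_u, for every node u at once
    return z + alpha * (w @ z - w.sum(axis=1, keepdims=True) * z)


def sample_divergence(z, adj, rng):
    """Hypothetical sampler standing in for p(d | z) (an assumption).

    Adds Gumbel noise to dot-product scores and normalizes over each
    node's neighbours, so every non-isolated row sums to one.
    """
    scores = z @ z.T + rng.gumbel(size=adj.shape)
    masked = np.where(adj > 0, scores, -1e9)  # ignore non-edges
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(masked) * adj
    denom = w.sum(axis=1, keepdims=True)
    # Isolated nodes have denom == 0; give them an all-zero row.
    return np.divide(w, denom, out=np.zeros_like(w), where=denom > 0)


def run_diffusion(z0, adj, num_steps=3, alpha=0.1, seed=0):
    """Roll out the trajectory, resampling the divergence field at each step."""
    rng = np.random.default_rng(seed)
    z = z0
    for _ in range(num_steps):
        d = sample_divergence(z, adj, rng)
        z = diffusion_step(z, adj, d, alpha)
    return z
```

On a 3-node path graph with unit weights, states `[0, 1, 2]`, and `alpha = 0.5`, one call to `diffusion_step` pulls the endpoints to 0.5 and 1.5 and leaves the middle node at 1. Fixing `d` to a deterministic, normalized weight matrix recovers a GCN- or attention-style propagation step, which is the sense in which the abstract describes the instantiations as generalized versions of existing models.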
Abstract
Real-world data generation often involves certain geometries (e.g., graphs) that induce instance-level interdependence. This characteristic makes the generalization of learning models more difficult due to the intricate interdependent patterns that impact data-generative distributions and can vary from training to testing. In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. We generalize the diffusion equation with stochastic diffusivity at each time step, which aims to capture the multi-faceted information flows among interdependent data. Furthermore, we derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains. Regarding practical implementation, we introduce three model instantiations that can be considered as the generalized versions of GCN, GAT, and Transformers, respectively, which possess advanced robustness against distribution shifts. We demonstrate their promising efficacy for out-of-distribution generalization on diverse real-world datasets.
