GSINA: Improving Subgraph Extraction for Graph Invariant Learning via Graph Sinkhorn Attention
Junchi Yan, Fangyu Ding, Jiawei Sun, Zhaoping Hu, Yunyi Zhou, Lei Zhu
TL;DR
This work tackles graph out-of-distribution generalization by learning invariant subgraphs whose relation to the labels holds across environments. It introduces Graph Sinkhorn Attention (GSINA), a differentiable, sparsity-controllable subgraph extractor based on Sinkhorn iterations for optimal transport (OT), with a Gumbel-based stabilization technique. The approach simultaneously enforces separability, softness, and differentiability, and is supported by a theoretical analysis showing exponential convergence of the OT solver. Empirical results across graph-level and node-level tasks show that GSINA consistently improves OOD generalization over IB-based and top-$k$ baselines, while offering interpretability via visualizable invariant subgraphs. The method offers a practical, end-to-end framework for robust graph representations under distribution shifts with scalable training dynamics.
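To make the idea of a sparsity-controllable, soft subgraph extractor concrete, the following is a minimal NumPy sketch of soft top-$r$ selection via entropic optimal transport and Sinkhorn iterations. It is an illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_soft_topk`, the two-anchor squared-distance cost, the score normalization, and the parameter names `r` (fraction of edges to keep), `tau` (temperature) and `noise` (Gumbel noise scale) are all hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_soft_topk(scores, r=0.5, tau=0.1, n_iters=50, noise=0.0, rng=None):
    """Soft top-r attention over edge scores (illustrative sketch only).

    Each edge carries mass 1/n, which is transported to two anchors
    ("drop" and "keep") holding total mass (1 - r) and r. The mass an
    edge sends to "keep", rescaled by n, is used as its attention weight.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]

    # Assumption: standardize the scores so that tau has a comparable scale.
    s = (scores - scores.mean()) / (scores.std() + 1e-8)

    # Assumption: optional Gumbel perturbation of the scores during training.
    if noise > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        s = s + noise * rng.gumbel(size=n)

    # Squared distance from every score to the lowest and highest score.
    anchors = np.array([s.min(), s.max()])
    cost = (s[:, None] - anchors[None, :]) ** 2      # shape (n, 2)
    log_kernel = -cost / tau

    log_a = np.full(n, -np.log(n))                   # row marginals: 1/n each
    log_b = np.log(np.array([1.0 - r, r]))           # column marginals

    f = np.zeros(n)
    g = np.zeros(2)
    for _ in range(n_iters):
        # Alternate row and column normalization in the log domain.
        f = log_a - _logsumexp(log_kernel + g[None, :], axis=1)
        g = log_b - _logsumexp(log_kernel + f[:, None], axis=0)

    plan = np.exp(log_kernel + f[:, None] + g[None, :])
    return n * plan[:, 1]                            # weights sum to r * n


edge_scores = np.array([0.1, 2.0, -1.0, 3.0])
attention = sinkhorn_soft_topk(edge_scores, r=0.5, tau=0.1)
print(attention, attention.sum())
```

In this sketch `r` fixes the total attention mass at `r * n`, which plays the role of the sparsity control, while `tau` governs softness: smaller values push the weights toward a hard top-$k$ mask, and larger values spread the mass more evenly. Every step is a smooth function of the scores, so the same computation written in an automatic differentiation framework would be differentiable end to end.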
Abstract
Graph invariant learning (GIL) seeks invariant relations between graphs and labels under distribution shifts. Recent works try to extract an invariant subgraph to improve out-of-distribution (OOD) generalization, yet existing approaches either lack explicit control over compactness or rely on hard top-$k$ selection that shrinks the solution space and is only partially differentiable. In this paper, we provide an in-depth analysis of the drawbacks of some existing works and propose a few general principles for invariant subgraph extraction: 1) separability, as encouraged by our sparsity-driven mechanism, to filter out the irrelevant common features; 2) softness, for a broader solution space; and 3) differentiability, for a sound end-to-end optimization pipeline. Specifically, building on optimal transport, we propose Graph Sinkhorn Attention (GSINA), a fully differentiable, cardinality-constrained attention mechanism that assigns sparse-yet-soft edge weights via Sinkhorn iterations and induces node attention. GSINA provides explicit controls for separability and softness, and uses a Gumbel reparameterization to stabilize training. Its convergence behavior is also studied theoretically. Extensive experimental results on both synthetic and real-world
