GSINA: Improving Subgraph Extraction for Graph Invariant Learning via Graph Sinkhorn Attention

Junchi Yan; Fangyu Ding; Jiawei Sun; Zhaoping Hu; Yunyi Zhou; Lei Zhu

TL;DR

This work tackles graph out-of-distribution (OOD) generalization by learning invariant subgraphs whose relation to the labels holds across environments. It introduces Graph Sinkhorn Attention (GSINA), a differentiable, sparsity-controllable subgraph extractor based on Sinkhorn iterations for optimal transport (OT), with a Gumbel-based stabilization technique. The approach simultaneously enforces separability, softness, and differentiability, and comes with a theoretical result on the exponential convergence of the OT solver. Empirical results on graph-level and node-level tasks show that GSINA consistently improves OOD generalization over IB-based and top-$k$ baselines, while offering interpretability through visualizable invariant subgraphs. The method is a practical, end-to-end framework for learning robust graph representations under distribution shifts.
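To make the idea of a sparsity-controllable, soft edge selection more concrete, the following is a minimal NumPy sketch of a Sinkhorn-based soft top-$k$ over edge scores. It is an illustration under stated assumptions, not the authors' implementation: the function name `soft_topk_attention`, the two-bin cost matrix, the min-max score normalization, and the parameter names `ratio`, `tau`, and `gumbel_scale` are all hypothetical choices made here.

```python
import numpy as np


def soft_topk_attention(scores, ratio, tau=0.1, n_iters=200,
                        gumbel_scale=0.0, rng=None):
    """Soft top-k edge attention via Sinkhorn iterations (illustrative sketch).

    scores: 1-D array of raw edge scores (assumed to come from some scorer).
    ratio:  fraction of edges to keep, strictly between 0 and 1.
    tau:    temperature; smaller values give a sharper (less soft) selection.
    """
    s = np.asarray(scores, dtype=np.float64)
    n = s.shape[0]
    k = ratio * n  # target total attention mass ("cardinality")

    # Optional Gumbel perturbation of the scores (assumed form of the noise).
    if gumbel_scale > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        u = rng.uniform(1e-12, 1.0, size=n)
        s = s + gumbel_scale * (-np.log(-np.log(u)))

    # Min-max normalize so the cost entries lie in [0, 1].
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)

    # Two bins per edge: column 0 = "selected", column 1 = "not selected".
    cost = np.stack([1.0 - s, s], axis=1)
    T = np.exp(-cost / tau)  # positive kernel of the entropic OT problem

    col_target = np.array([k, n - k])
    for _ in range(n_iters):
        T *= col_target / T.sum(axis=0)      # columns sum to [k, n - k]
        T /= T.sum(axis=1, keepdims=True)    # each edge carries unit mass

    return T[:, 0]  # soft attention in [0, 1], summing to about k


attention = soft_topk_attention(np.arange(10.0), ratio=0.3, tau=0.1)
```

In this sketch, `ratio` plays the role of the separability control and `tau` the role of the softness control: the returned weights stay in $[0, 1]$, their sum is pinned to roughly `ratio * n`, and every step is a differentiable elementwise or sum operation.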

Abstract

Graph invariant learning (GIL) seeks invariant relations between graphs and labels under distribution shifts. Recent works try to extract an invariant subgraph to improve out-of-distribution (OOD) generalization, yet existing approaches either lack explicit control over compactness or rely on hard top-$k$ selection that shrinks the solution space and is only partially differentiable. In this paper, we provide an in-depth analysis of the drawbacks of some existing works and propose a few general principles for invariant subgraph extraction: 1) separability, as encouraged by our sparsity-driven mechanism, to filter out the irrelevant common features; 2) softness, for a broader solution space; and 3) differentiability, for a sound end-to-end optimization pipeline. Specifically, building on optimal transport, we propose Graph Sinkhorn Attention (GSINA), a fully differentiable, cardinality-constrained attention mechanism that assigns sparse-yet-soft edge weights via Sinkhorn iterations and induces node attention. GSINA provides explicit controls for separability and softness, and uses a Gumbel reparameterization to stabilize training. Its convergence behavior is also studied theoretically. Extensive empirical experimental results on both synthetic and real-world

Paper Structure (28 sections, 1 theorem, 18 equations, 8 figures, 13 tables, 1 algorithm)

Introduction
Preliminaries and Related Works
Graph OOD Generalization
Graph-level Invariant Learning
Node-level Invariant Learning
Subgraph-based Invariant Learning
Cardinality-based Combinatorial Optimization
Approach
Graph Sinkhorn Attention: Optimization
Graph Sinkhorn Attention: Implementation
Graph Sinkhorn Attention: Theoretical Study on its Convergence Property
Experiments
Graph-Level Tasks: Compare with IB-Based GIL
Datasets, Metrics and Baselines
Setup Details
...and 13 more sections

Key Result

Proposition 1

(Exponential convergence of Algorithm 1) Given an initial matrix $\mathbf{T}_0$ and target row and column sums $\mathbf{R}$ and $\mathbf{C}$, respectively, the Sinkhorn algorithm alternates between row and column normalization steps, and the matrix $\mathbf{T}_k$ converges to the optimal sol where $\| \cdot \|_2$ denotes the spectral norm of a matrix.
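The alternating normalization described in the proposition can be sketched as follows. This is a generic Sinkhorn scaling loop written for illustration, assuming a strictly positive $\mathbf{T}_0$ and targets with equal total mass; the function name `sinkhorn` and the per-iteration error trace are hypothetical additions, and the sketch does not reproduce the paper's bound.

```python
import numpy as np


def sinkhorn(T0, R, C, n_iters=100):
    """Alternate row and column normalization toward target sums R and C.

    T0: strictly positive initial matrix of shape (m, n).
    R:  target row sums, shape (m,).
    C:  target column sums, shape (n,), with sum(C) == sum(R).
    Returns the scaled matrix and the max row-sum error after each iteration.
    """
    T = np.array(T0, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    C = np.asarray(C, dtype=np.float64)
    errors = []
    for _ in range(n_iters):
        T *= (R / T.sum(axis=1))[:, None]  # row normalization step
        T *= (C / T.sum(axis=0))[None, :]  # column normalization step
        # Columns are exact after the last step; track the row violation.
        errors.append(np.abs(T.sum(axis=1) - R).max())
    return T, errors


rng = np.random.default_rng(0)
T0 = rng.uniform(0.5, 1.5, size=(4, 5))
T, errors = sinkhorn(T0, R=[1.0, 2.0, 3.0, 4.0], C=[2.0] * 5)
```

Plotting `errors` on a logarithmic axis is one way to inspect the convergence behavior empirically: a roughly straight line would be consistent with a geometric (exponential) rate.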

Figures (8)

Figure 1: Subgraph separability of GSAT (Miao et al., 2022) and our proposed GSINA (with subgraph extraction ratio $r=0.3$) on a batch of samples from the SPMotif dataset (with graph generation hyperparameter $b=0.5$; from DIR, Wu et al., 2022). Panels (a) and (b) show the learned edge importance (i.e., attention values) assigned by GSAT and GSINA to each edge of the graphs in the batch. For each graph and each method independently, the edge-attention scores are sorted in descending order and placed left to right within that row. The x-axis is therefore edge rank (sorted within graph; 1 = largest), the y-axis is the graph ID, and the color encodes the (unnormalized) attention value, reflecting the relative magnitude of attention across edges rather than final normalized values. Because graph sizes vary, each row is padded to the batch maximum edge count; the black cells at the right/bottom are padding (non-edges). Because sorting is per method, the per-row order in (a) and (b) need not match; small apparent inversions can arise from ties and color quantization after padding. The remaining two panels show the PDF (probability density function) of the edge attention generated by GSAT and GSINA for edges in the background (label-independent) part and the explanation (label-related) part.
Figure 2: Overview of our approach. A separable invariant subgraph is extracted from a given input graph using Sinkhorn-based optimal transport with controllable separability and softness. Edge and node attention scores are computed to form the subgraph, which is then fed into a predictor to generate the output label $Y$ for nodes and graphs.
Figure 3: Example of a GSINA invariant subgraph $G_S$ from the SPMotif dataset. The ground-truth nodes of the invariant subgraph are colored red, and the remaining nodes are yellow. The edge widths and node sizes are given by the original GSINA outputs (we do not apply the sharpening tricks used for visualization in GSAT (Miao et al., 2022) and CIGA (Chen et al., 2022)). GSINA assigns sparse and soft attention $\{\alpha^V, \alpha^E \}$ to the nodes and edges of the input graph $G$. More visualization results can be found at https://github.com/dingfangyu/GSINA/vis.
Figure 4: Hyperparameter sensitivity analysis of the separability ratio $r$ in GSINA (y-axis: classification accuracy).
Figure 5: Hyperparameter sensitivity analysis of the temperature $\tau$ (for softness) in GSINA (y-axis: classification accuracy).
...and 3 more figures

Theorems & Definitions (2)

Proposition 1
Proof 1
