Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim

TL;DR

This work establishes new representation-based generalization bounds in a distribution-free transductive setting, where learned representations are dependent and test features are accessible during training. It derives global and class-wise bounds via optimal transport, expressed in terms of Wasserstein distances between encoded feature distributions.
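Both bounds are built on a Wasserstein distance between the encoded features of training and test nodes. The sketch below computes an exact W1 distance by solving the optimal transport linear program with SciPy. It assumes that the encoder outputs $\phi(x_i)$ are the rows of two NumPy arrays, that the ground cost is Euclidean, and that every node gets uniform weight. These are all assumptions, and the paper's choice of norm and weighting may differ. The function name `wasserstein_1` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def wasserstein_1(z_train: np.ndarray, z_test: np.ndarray) -> float:
    """Exact W1 between uniform empirical measures on the rows of z_train and z_test.

    Solves the optimal transport linear program: minimise sum_ij C_ij P_ij over
    couplings P >= 0 whose rows sum to 1/m and whose columns sum to 1/u.
    """
    m, u = len(z_train), len(z_test)
    cost = cdist(z_train, z_test)  # (m, u) Euclidean ground cost
    a_eq = np.zeros((m + u, m * u))  # P is flattened row-major: P[i, j] -> i * u + j
    for i in range(m):
        a_eq[i, i * u:(i + 1) * u] = 1.0  # total mass leaving train point i
    for j in range(u):
        a_eq[m + j, j::u] = 1.0  # total mass arriving at test point j
    b_eq = np.concatenate([np.full(m, 1.0 / m), np.full(u, 1.0 / u)])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


# Two training embeddings and two test embeddings, each shifted by (0, 1):
z_tr = np.array([[0.0, 0.0], [1.0, 0.0]])
z_te = np.array([[0.0, 1.0], [1.0, 1.0]])
print(wasserstein_1(z_tr, z_te))  # ≈ 1.0
```

The linear program has $m \cdot u$ variables, so this form only suits small node sets. It does, however, handle $m \neq u$, which is the usual case for train and test splits. For full graphs, a dedicated exact OT solver such as POT's `ot.emd2` is the more practical choice.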

Abstract

Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned representations are dependent, and test features are accessible during training. We derive global and class-wise bounds via optimal transport, expressed in terms of Wasserstein distances between encoded feature distributions. We demonstrate that our bounds are efficiently computable and strongly correlate with empirical generalization in graph node classification, improving upon classical complexity measures. Additionally, our analysis reveals how the GNN aggregation process transforms the representation distributions, inducing a trade-off between intra-class concentration and inter-class separation. This yields depth-dependent characterizations that capture the non-monotonic relationship between depth and generalization error observed in practice. The code is available at https://github.com/ml-postech/Transductive-OT-Gen-Bound.
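The concentration-versus-separation trade-off described above can be observed on a toy graph. The sketch below is an illustration built on assumptions, not the paper's analysis. It builds a hypothetical two-class stochastic block model and applies SGC-style, parameter-free propagation $Z_k = \hat A^k X$ with $\hat A = D^{-1/2}(A+I)D^{-1/2}$. At each depth $k$ it tracks two simple proxies:

- the mean distance of nodes to their class centroid, which measures intra-class concentration;
- the distance between the two class centroids, which measures inter-class separation.

The function names and every parameter (block sizes, edge probabilities, noise level) are hypothetical.

```python
import numpy as np


def toy_sbm(n_per_class=40, p_in=0.2, p_out=0.05, noise=1.0, seed=0):
    """Hypothetical two-class stochastic block model with 2-D Gaussian node features."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    probs = np.where(y[:, None] == y[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)  # sample each pair once
    adj = (upper | upper.T).astype(float)  # symmetric adjacency, no self-loops yet
    means = np.array([[1.0, 0.0], [-1.0, 0.0]])  # class-dependent feature means
    x = means[y] + noise * rng.standard_normal((len(y), 2))
    return adj, x, y


def depth_profile(adj, x, y, max_depth=16):
    """Intra-class spread and inter-class centroid gap of A_hat^k X for k = 0..max_depth."""
    a = adj + np.eye(len(adj))  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]  # D^-1/2 (A + I) D^-1/2
    spread, gap = [], []
    z = x.copy()
    for _ in range(max_depth + 1):
        centroids = np.stack([z[y == c].mean(axis=0) for c in (0, 1)])
        spread.append(float(np.linalg.norm(z - centroids[y], axis=1).mean()))
        gap.append(float(np.linalg.norm(centroids[0] - centroids[1])))
        z = a_hat @ z  # one aggregation step
    return spread, gap


adj, x, y = toy_sbm()
spread, gap = depth_profile(adj, x, y)
for k in (0, 1, 2, 4, 8, 16):
    print(f"k={k:2d}  intra-class spread={spread[k]:.3f}  centroid gap={gap[k]:.3f}")
```

Comparing how fast the two lists shrink gives a rough feel for the trade-off. The paper's characterization works with Wasserstein distances between representation distributions rather than these centroid proxies.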

Paper Structure (39 sections, 9 theorems, 81 equations, 4 figures, 3 tables)

Key Result

Theorem 4.1

Let $\gamma>0$. For any random split $\pi$ and all $f\circ\phi\in F\circ\Phi$, …, where $i \in \mathcal{I}_{\mathrm{train}}^{(\pi)}$, $j \in \mathcal{I}_{\mathrm{test}}^{(\pi)}$, and $y \in \mathcal{Y}$.
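The random split $\pi$ follows the transductive setting of El-Yaniv and Pechyony. The full node set is fixed, and a uniformly random permutation decides which $m$ nodes form $\mathcal{I}_{\mathrm{train}}^{(\pi)}$ and which $u$ nodes form $\mathcal{I}_{\mathrm{test}}^{(\pi)}$. The sketch below assumes nodes are indexed $0,\dots,n-1$ with $n = m + u$. The function name `random_split` is hypothetical.

```python
from typing import Optional

import numpy as np


def random_split(n: int, m: int, seed: Optional[int] = None):
    """Split node indices 0..n-1 with a uniformly random permutation pi.

    The first m entries of pi form I_train; the remaining u = n - m form I_test.
    """
    if not 0 < m < n:
        raise ValueError("need 0 < m < n so that both index sets are non-empty")
    pi = np.random.default_rng(seed).permutation(n)
    idx_train = np.sort(pi[:m])  # I_train^(pi), size m
    idx_test = np.sort(pi[m:])   # I_test^(pi), size u = n - m
    return idx_train, idx_test


idx_train, idx_test = random_split(n=10, m=3, seed=0)
print(idx_train, idx_test)  # 3 train and 7 test indices that together cover 0..9
```

Every node's features are available regardless of $\pi$, so the encoder sees test features during training. Only the labels indexed by $\mathcal{I}_{\mathrm{test}}^{(\pi)}$ are withheld.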

Figures (4)

  • Figure 1: Rank scatter plots of the empirical generalization error against (a) the PAC bound and (b) our proposed bound for SGC on the Squirrel dataset. The PAC bound shows weak rank correlation with the empirical generalization error, whereas our bound exhibits a stronger positive rank correlation.
  • Figure 2: Rank correlation between generalization bounds and empirical error gap across nine datasets and four GNN architectures. Global reports our bound from Theorem 4.1. Class-wise and Class-wise approx correspond to Theorem 4.2 with and without test labels, respectively. Darker blue indicates a stronger positive correlation. Our bounds consistently achieve high correlations, while PAC and RC bounds show weak or negative correlations in most cases. N/A indicates the bound cannot be computed.
  • Figure 3: Depth analysis of SGC (top) and GCN (bottom) on the Cora dataset.
  • Figure 4: Rank correlation between generalization bounds and empirical error gap across nine datasets for GraphSAGE. Darker blue indicates a stronger positive correlation. N/A indicates the bound cannot be computed.
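The figures score each bound by how well it ranks trained models according to their empirical generalization gap. The sketch below implements one such score in plain Python: Spearman's rank correlation, with tied values given average ranks. The paper may use a different coefficient, such as Kendall's τ. The bound and gap values in the usage lines are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1  # extend the block of tied values
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (sx * sy)


bounds = [0.31, 0.45, 0.12, 0.60]  # made-up bound values for four trained models
gaps = [0.05, 0.09, 0.02, 0.08]    # made-up empirical generalization gaps
print(round(spearman(bounds, gaps), 3))  # 0.8
```

A rank coefficient is used rather than a linear one because what matters is whether the bound orders models correctly. Its absolute scale does not matter.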

Theorems & Definitions (15)

  • Theorem 4.1: Global bound in the transductive setting
  • Theorem 4.2: Class-wise bound in the transductive setting
  • Remark: cf. Lemma 10 in chuang2021measuring; Proposition 5.2 in li2025towards
  • Proposition 6.1
  • Proposition 6.2
  • Theorem 1.1: Global bound in the transductive setting
  • Proof
  • Theorem 1.1: Class-wise bound in the transductive setting
  • Proof
  • Definition 1.1: $(m,u)$-permutation symmetry el2009transductive
  • ...and 5 more
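Theorem 4.2 is stated class-wise, so one natural ingredient is a separate train-versus-test Wasserstein distance for each class $y \in \mathcal{Y}$. The sketch below computes that distance for every class present in both sets. It makes the same assumptions as a plain exact-OT computation: Euclidean ground cost and uniform weights within each class. Which class-conditional distances enter the bound, and how they are combined, is not reproduced here. For the "Class-wise approx" variant without test labels, `y_test` would need a surrogate. Passing model predictions is one hypothetical choice, not necessarily the paper's. Both function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def w1_uniform(a: np.ndarray, b: np.ndarray) -> float:
    """Exact W1 between uniform empirical measures on the rows of a and b (LP form)."""
    m, u = len(a), len(b)
    a_eq = np.zeros((m + u, m * u))  # coupling flattened row-major
    for i in range(m):
        a_eq[i, i * u:(i + 1) * u] = 1.0  # row marginal: 1/m per point of a
    for j in range(u):
        a_eq[m + j, j::u] = 1.0  # column marginal: 1/u per point of b
    b_eq = np.concatenate([np.full(m, 1.0 / m), np.full(u, 1.0 / u)])
    res = linprog(cdist(a, b).ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


def classwise_w1(z_train, y_train, z_test, y_test):
    """Per-class W1 between train and test embeddings that share the same label."""
    return {
        int(c): w1_uniform(z_train[y_train == c], z_test[y_test == c])
        for c in np.intersect1d(y_train, y_test)  # classes present on both sides
    }


z_train = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
z_test = np.array([[0.0, 1.0], [5.0, 6.0]])
y_test = np.array([0, 1])
print(classwise_w1(z_train, y_train, z_test, y_test))  # both classes ≈ 1.0
```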