Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

Huayi Tang, Yong Liu

TL;DR

The paper develops an information-theoretic and PAC-Bayesian framework for transductive learning that covers both the random splitting and the random sampling settings, and introduces transductive supersamples to carry inductive results over to transduction. It derives mutual information and conditional mutual information bounds, along with transductive PAC-Bayes bounds that hold under weaker assumptions on the loss function and on the numbers of training and test points, and extends them to adaptive optimization algorithms. The results are applied to semi-supervised learning and transductive graph learning, with empirical validation showing non-vacuous bounds that track the generalization gap as the amount of labeled data increases. Together, these contributions provide data- and algorithm-dependent generalization guarantees for transductive models, including GNNs, in practical setups. The work has implications for understanding generalization in label-efficient learning settings and for the design of transductive strategies in real-world systems.

Abstract

In this paper, we establish generalization bounds for transductive learning algorithms in the context of information theory and PAC-Bayes, covering both the random sampling and the random splitting settings. First, we show that the transductive generalization gap can be controlled by the mutual information between the training label selection and the hypothesis. Next, we propose the concept of a transductive supersample and use it to derive transductive information-theoretic bounds involving conditional mutual information and different information measures. We further establish transductive PAC-Bayesian bounds with weaker assumptions on the type of loss function and the number of training and test data points. Lastly, we use the theoretical results to derive upper bounds for adaptive optimization algorithms under the transductive learning setting. We also apply them to semi-supervised learning and transductive graph learning scenarios, and validate the derived bounds with experiments on synthetic and real-world datasets.
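
The random splitting setting and the transductive generalization gap mentioned above can be illustrated with a small sketch. It is not taken from the paper: the function names are hypothetical, the gap is assumed to be the average test loss minus the average training loss, and the per-point losses are toy values for a hypothesis that is fixed before the split is drawn. In that case the selection and the hypothesis are independent, so their mutual information is zero and the gap averages to zero over all splits.

```python
import itertools
import random


def random_split(n, m, seed=0):
    """Random splitting: choose m of n indices uniformly without replacement
    as the training set; the remaining n - m indices form the test set."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return sorted(indices[:m]), sorted(indices[m:])


def transductive_gap(losses, train_idx, test_idx):
    """Average test loss minus average training loss (assumed sign convention)."""
    train_risk = sum(losses[i] for i in train_idx) / len(train_idx)
    test_risk = sum(losses[i] for i in test_idx) / len(test_idx)
    return test_risk - train_risk


def mean_gap_over_all_splits(losses, m):
    """Average the gap over every possible size-m training set, for losses
    that do not depend on which points were selected."""
    n = len(losses)
    gaps = []
    for train_idx in itertools.combinations(range(n), m):
        chosen = set(train_idx)
        test_idx = [i for i in range(n) if i not in chosen]
        gaps.append(transductive_gap(losses, train_idx, test_idx))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy 0-1 losses of a fixed hypothesis on n = 6 points (illustrative only).
    losses = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
    train_idx, test_idx = random_split(n=len(losses), m=3, seed=0)
    print("gap for one split:", transductive_gap(losses, train_idx, test_idx))
    print("mean gap over all splits:", mean_gap_over_all_splits(losses, m=3))
```

A single split can show a large gap, but the average over splits vanishes (up to floating-point error) when the hypothesis ignores the selection. A learning algorithm that adapts to the selected training labels breaks this independence, which is the dependence the mutual information term measures.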

Paper Structure

This paper contains 37 sections, 18 theorems, 188 equations, 3 figures, and 1 table.

Key Result

Theorem 1

Suppose that $\ell(w,s) \in [0,B]$ holds for all $w \in \mathcal{W}, s \in \{s_i\}_{i=1}^n$, where $B > 0$ is a constant. Also, suppose that $P_{W,Z} \ll P_W P_Z$. Then we have

Figures (3)

  • Figure 1: Estimations of the transductive generalization gap and the derived bounds on MNIST and CIFAR-10 with different values of $m$ and $k$.
  • Figure 2: Estimations of the transductive generalization gap and the derived bounds on cSBMs with GAT and GPR-GNN. The first (second) and third (fourth) rows correspond to $\phi=-0.5$ ($\phi=0.5$). The left, middle, and right figures in each row correspond to $k=1$, $k=2$ and $k=3$.
  • Figure 3: Estimations of the transductive generalization gap and the derived bounds on real-world datasets with GAT and GPR-GNN.

Theorems & Definitions (22)

  • Theorem 1
  • Theorem 2
  • Theorem 3: Yaniv2007, Theorem 1
  • Proposition 4
  • Definition 5: Transductive Supersample
  • Proposition 6
  • Theorem 7
  • Corollary 8
  • Definition 9: $k$-Transductive Supersample
  • Theorem 10
  • ...and 12 more