Information-Theoretic Generalization Bounds for Transductive Learning and its Applications
Huayi Tang, Yong Liu
TL;DR
The paper develops an information-theoretic and PAC-Bayesian framework for transductive learning that covers both the random splitting and random sampling settings, and introduces transductive supersamples to carry inductive results over to the transductive setting. It derives mutual information and conditional mutual information bounds, along with transductive PAC-Bayes bounds under weaker assumptions on the loss function and on the numbers of training and test points, and extends the results to adaptive optimization algorithms. The results are applied to semi-supervised learning and transductive graph learning, with experiments showing non-vacuous bounds that track the generalization gap as the amount of labeled data increases. Together, these contributions provide data- and algorithm-dependent generalization guarantees for transductive models, including GNNs. The work also informs the understanding of generalization in label-efficient learning and the design of transductive strategies in real-world systems.
Abstract
In this paper, we establish generalization bounds for transductive learning algorithms from the perspectives of information theory and PAC-Bayes, covering both the random sampling and the random splitting settings. First, we show that the transductive generalization gap can be controlled by the mutual information between the training label selection and the hypothesis. Next, we propose the concept of transductive supersamples and use it to derive transductive information-theoretic bounds involving conditional mutual information and different information measures. We further establish transductive PAC-Bayesian bounds with weaker assumptions on the type of loss function and the number of training and test data points. Lastly, we use the theoretical results to derive upper bounds for adaptive optimization algorithms under the transductive learning setting. We also apply them to semi-supervised learning and transductive graph learning scenarios, and validate the derived bounds through experiments on synthetic and real-world datasets.
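To make the random splitting setting concrete, the following is a minimal illustrative sketch and is not taken from the paper. It assumes a fixed finite set of n points, of which m are selected uniformly at random without replacement to reveal their labels, and it measures the transductive generalization gap as the average loss on the unselected points minus the average loss on the selected ones. The threshold classifier, the noisy synthetic labels, and all function names (`transductive_gap`, `fit_threshold`, `average_gap`) are hypothetical choices made for this example only. The random index set `train_idx` plays the role of the training label selection whose dependence on the learned hypothesis the mutual information bounds quantify.

```python
import random


def transductive_gap(losses, train_idx):
    """Average loss on the test points minus average loss on the training points.

    `losses` holds one loss value per point of the fixed finite sample;
    `train_idx` lists the indices whose labels were revealed for training.
    """
    train = set(train_idx)
    train_losses = [losses[i] for i in train]
    test_losses = [loss for i, loss in enumerate(losses) if i not in train]
    return sum(test_losses) / len(test_losses) - sum(train_losses) / len(train_losses)


def fit_threshold(xs, ys, train_idx):
    """Toy learner (an assumption of this sketch): choose the threshold t that
    minimizes the 0-1 training error of the rule 'predict True iff x >= t'."""
    candidates = sorted(xs[i] for i in train_idx)

    def train_errors(t):
        return sum((xs[i] >= t) != ys[i] for i in train_idx)

    return min(candidates, key=train_errors)


def average_gap(n=40, m=10, trials=200, seed=0):
    """Average the transductive gap over repeated random splits of one fixed sample."""
    rng = random.Random(seed)
    # Fixed finite sample: 1-D inputs with labels flipped with probability 0.2.
    xs = [rng.random() for _ in range(n)]
    ys = [(x >= 0.5) != (rng.random() < 0.2) for x in xs]
    gaps = []
    for _ in range(trials):
        # Random splitting: pick m of the n points uniformly without replacement.
        train_idx = rng.sample(range(n), m)
        t = fit_threshold(xs, ys, train_idx)
        losses = [float((x >= t) != y) for x, y in zip(xs, ys)]
        gaps.append(transductive_gap(losses, train_idx))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(average_gap())
```

Only the selection of training indices is random here, while the sample itself stays fixed, which is what distinguishes this setup from the usual inductive setting in which the training data are drawn from a distribution.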
