Transductive Active Learning: Theory and Applications

Jonas Hübotter; Bhavya Sukhija; Lenart Treven; Yarden As; Andreas Krause

TL;DR

This work extends active learning to transductive settings, where sampling is restricted to a sample space $\mathcal{S}$ while predictions target a space $\mathcal{A}$ that may extend beyond $\mathcal{S}$. Using GP/RKHS models, it proves that the uncertainty about $\boldsymbol{f}_{\mathcal{A}}$ converges to its irreducible limit, that is, the smallest uncertainty obtainable from data in $\mathcal{S}$, and shows that sampling to minimize posterior uncertainty about the targets is more sample-efficient than traditional uncertainty sampling. The authors instantiate the framework in two domains: active fine-tuning of large neural networks and safe Bayesian optimization, where it achieves state-of-the-art or competitive performance while respecting safety constraints. They also propose batch selection via conditional embeddings to keep the selected data diverse and relevant, and report experiments on MNIST, CIFAR-100, and a quadcopter control problem. Overall, transductive active learning (TAL) offers a flexible, information-theoretic approach to directed learning with provable convergence guarantees and strong empirical results in both applications.

Abstract

We study a generalization of classical active learning to real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We demonstrate their strong sample efficiency in two key applications: active fine-tuning of large neural networks and safe Bayesian optimization, where they achieve state-of-the-art performance.
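To make the idea of "sampling to minimize uncertainty about prediction targets" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a finite one-dimensional domain, a zero-mean GP prior with a Gaussian kernel, and a greedy rule that picks the point in $\mathcal{S}$ whose observation most reduces the summed posterior variance over $\mathcal{A}$ (a variance-based rule in the spirit of what the paper calls VTL, as far as this summary suggests). The grid, index sets, noise level, and all function names are hypothetical choices for illustration.

```python
import math

def rbf(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel on scalars; the lengthscale is an assumed value."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def condition_on(cov, j, noise_var):
    """Posterior covariance after one noisy observation at index j.

    Uses the rank-one Gaussian conditioning update
    cov'[a][b] = cov[a][b] - cov[a][j] * cov[j][b] / (cov[j][j] + noise_var).
    """
    n = len(cov)
    denom = cov[j][j] + noise_var
    return [[cov[a][b] - cov[a][j] * cov[j][b] / denom for b in range(n)]
            for a in range(n)]

def target_variance(cov, target_idx):
    """Sum of marginal posterior variances over the target indices."""
    return sum(cov[a][a] for a in target_idx)

def select_next(cov, sample_idx, target_idx, noise_var):
    """Greedy step: the index in S that minimizes total target variance."""
    def variance_after(j):
        denom = cov[j][j] + noise_var
        return sum(cov[a][a] - cov[a][j] ** 2 / denom for a in target_idx)
    return min(sample_idx, key=variance_after)

# Hypothetical setup: a grid on [0, 4]; only [0, 2] is accessible.
points = [i * 0.5 for i in range(9)]   # 0.0, 0.5, ..., 4.0
sample_idx = [0, 1, 2, 3, 4]           # sample space S = {0.0, ..., 2.0}
target_idx = [4, 5, 6]                 # targets A = {2.0, 2.5, 3.0}
noise_var = 0.01                       # assumed known noise variance

cov = [[rbf(p, q) for q in points] for p in points]  # prior covariance
history = [target_variance(cov, target_idx)]
chosen = []
for _ in range(5):
    j = select_next(cov, sample_idx, target_idx, noise_var)
    cov = condition_on(cov, j, noise_var)
    chosen.append(points[j])
    history.append(target_variance(cov, target_idx))

print("queried points:", chosen)
print("target variance per step:", [round(v, 4) for v in history])
```

In this toy setup the targets at 2.5 and 3.0 lie outside the accessible region, so their variance cannot be driven to zero by sampling in $\mathcal{S}$. The recorded variance decreases monotonically but levels off at a positive value, which is the kind of irreducible uncertainty the abstract refers to.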

Paper Structure (121 sections, 33 theorems, 144 equations, 25 figures, 3 tables, 2 algorithms)

Introduction
"Directed" transductive active learning
Problem Setting
Background on information theory
Algorithms for Transductive Active Learning
Main Results
Convergence Guarantees
Experiments in the Gaussian Process Setting
How does the smoothness of $f$ affect ITL?
Does TAL outperform uncertainty sampling?
Active Fine-Tuning of Neural Networks
How can we leverage the latent structure learned by the pre-trained model?
Batch selection: Diversity via conditional embeddings
Experiments
Testbeds & architectures
...and 106 more sections

Key Result

Theorem 3.3

Let the submodularity assumption hold and let the data be selected by either VTL or ITL. Assume that $f \sim \mathcal{GP}(\mu, k)$ with known mean function $\mu$ and kernel $k$, that the noise $\varepsilon_{\boldsymbol{x}}$ is mutually independent and zero-mean Gaussian with known variance, and that $\gamma_n$ is sublinear [...] Moreover, if $\boldsymbol{x} \in \mathcal{A} \cap \mathcal{S}$, there exists a constant $C'$ such [...]

Figures (25)

Figure 1: Instances of transductive active learning with target space $\mathcal{A}$ shown in blue and sample space $\mathcal{S}$ shown in gray. The points denote plausible observations within $\mathcal{S}$ to "learn" $\mathcal{A}$. In (A), the target space contains "everything" within $\mathcal{S}$ as well as points outside $\mathcal{S}$. In (B, C, D), one makes observations directed towards learning about a particular target. Prior work on inductive active learning has focused on the instance $\mathcal{A} = \mathcal{S}$.
Figure 2: Initial $25$ samples of ITL under a Gaussian kernel with lengthscale $1$ (left) and a Laplace kernel with lengthscale $10$ (right). Shown in gray is the sample space $\mathcal{S}$ and shown in blue is the target space $\mathcal{A}$. In three of the four examples, points outside the target space provide useful information.
Figure 3: Entropy of $\boldsymbol{f}_{\!\!\mathcal{A}}$ ranging from $-3850$ to $-3725$ and the mean marginal standard deviations of $\boldsymbol{f}_{\!\!\mathcal{A}}$ ranging from $0$ to $0.15$. The experiment uses the Gaussian kernel of the left instance ($\mathcal{A} \subset \mathcal{S}$) from \ref{fig:gps:directed_advantage}. ITL and VTL outperform UnSa and Random. Uncertainty bands correspond to one standard error over $10$ random seeds.
Figure 4: Active fine-tuning on MNIST (left) and CIFAR-100 (right). Random selects each observation uniformly at random from $\mathcal{S}$. The batch size is $1$ for MNIST and $10$ for CIFAR-100. Uncertainty bands correspond to one standard error over $10$ random seeds. We see that transductive active learning with ITL and VTL significantly outperforms competing methods, and in particular, retrieves substantially more samples from the support of $\mathcal{P}_{\!\!\mathcal{A}}$. See \ref{sec:nns_appendix} for details and ablations.
Figure 5: We compare ITL and VTL to Oracle SafeOpt, which has oracle knowledge of the Lipschitz constants, SafeOpt, where the Lipschitz constants are estimated from the GP, as well as Heuristic SafeOpt and ISE, and observe that ITL and VTL systematically perform well. We compare against additional baselines in \ref{sec:safe_bo_appendix:thompson_sampling}. The regret is evaluated with respect to the ground truth objective $f^\star$ and constraint $g^\star$, and averaged over 10 (in synthetic experiments) and 25 (in the quadcopter experiment) random seeds. Additional details can be found in \ref{sec:safe_bo_appendix:details}.
...and 20 more figures

Theorems & Definitions (73)

Theorem 3.3: Convergence rates
Theorem 5.1: Convergence to safe optimum
Lemma C.1
proof
Lemma C.2
proof
Definition C.3: Submodularity
Theorem C.4: Nemhauser et al. (1978)
proof
Example C.5
...and 63 more