Experimental Design for Active Transductive Inference in Large Language Models

Subhojyoti Mukherjee; Anusha Lalitha; Aniket Deshmukh; Ge Liu; Yifei Ma; Branislav Kveton

TL;DR

This work addresses adaptive prompt design for large language models by framing it as active learning: the goal is to select informative in-context demonstrations. It introduces two principled methods, GO (G-Optimal design) and SAL (Simulation-Based Active Learning), which aim to minimize test-set uncertainty by labeling only the most informative training examples. GO uses the posterior covariance in a linear-model surrogate, while SAL uses LLM-driven simulations and converges to GO in the linear regime as the simulation budget grows. The authors provide monotonicity and equivalence analyses (GO vs. SAL) and report empirical gains across classification, regression, ARC, PCFG, and natural-language tasks using multiple LLMs, showing that principled selection goes beyond heuristic prompt selection. The approach offers a practical, model-agnostic avenue for efficient prompt design, with potential extensions to multi-modal and diffusion-model contexts, and supports more reliable and sample-efficient in-context learning.
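The GO idea summarized above, labeling the examples that most shrink the posterior covariance at the test points of a linear surrogate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_g_optimal`, the Gaussian prior variance `prior_var`, the noise variance `sigma2`, and the greedy "minimize the largest test variance" criterion are all assumptions made for illustration.

```python
import numpy as np


def greedy_g_optimal(X_train, X_test, budget, sigma2=1.0, prior_var=1.0):
    """Greedily pick `budget` training rows to label (hypothetical helper).

    Assumes a Bayesian linear model y = x^T theta + noise with prior
    theta ~ N(0, prior_var * I) and noise variance sigma2. Each step picks
    the row that minimizes the largest posterior variance x_*^T Sigma x_*
    over the test rows.
    """
    d = X_train.shape[1]
    cov = prior_var * np.eye(d)          # current posterior covariance
    remaining = list(range(len(X_train)))
    chosen = []
    for _ in range(min(budget, len(remaining))):
        best_i, best_score, best_cov = None, np.inf, None
        for i in remaining:
            x = X_train[i]
            cx = cov @ x
            # Rank-one (Sherman-Morrison) update after observing a label at x.
            new_cov = cov - np.outer(cx, cx) / (sigma2 + x @ cx)
            # Posterior variance of the prediction at every test point.
            test_var = np.einsum("kd,de,ke->k", X_test, new_cov, X_test)
            score = test_var.max()
            if score < best_score:
                best_i, best_score, best_cov = i, score, new_cov
        chosen.append(best_i)
        remaining.remove(best_i)
        cov = best_cov
    return chosen, cov


# Toy usage: the only test point lies along the second axis, so the
# training row aligned with that axis is the informative one.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
X_test = np.array([[0.0, 1.0]])
chosen, cov = greedy_g_optimal(X_train, X_test, budget=1)
```

Note that the selection depends only on the features, never on the labels, which is why the choice can be made before any label is requested in this linear setting.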

Abstract

One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD). We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set. The training examples are initially unlabeled, and we obtain labels only for the most informative ones, namely those that maximally reduce uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models: we first analyze GO and then use its equivalence with SAL. We experiment with many different tasks in small, medium-sized, and large language models, and show that GO and SAL outperform other methods for choosing few-shot examples in the LLM prompt at inference time.
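To make the stated equivalence between simulation-based selection and covariance-based selection in linear models concrete, here is a toy sketch in which the LLM is replaced by a Bayesian linear model. It is an illustration under assumptions, not the paper's SAL procedure: the helper names `simulated_uncertainty` and `pick_by_simulation`, the Gaussian posterior `(mu, cov)`, the noise variance `sigma2`, and the sample counts are all hypothetical.

```python
import numpy as np


def simulated_uncertainty(x, X_test, mu, cov, sigma2, n_labels, n_draws, rng):
    """Monte Carlo estimate of test-prediction variance after labeling x.

    A label for x is simulated from the current posterior predictive, the
    posterior is updated with it, and predictions at the test points are
    sampled from the updated model. The variance of those samples is
    averaged over the simulated labels.
    """
    cx = cov @ x
    denom = sigma2 + x @ cx
    new_cov = cov - np.outer(cx, cx) / denom       # does not depend on the label
    total = np.zeros(len(X_test))
    for _ in range(n_labels):
        y = rng.normal(x @ mu, np.sqrt(denom))     # simulated label
        new_mu = mu + cx * (y - x @ mu) / denom    # posterior mean update
        thetas = rng.multivariate_normal(new_mu, new_cov, size=n_draws)
        preds = thetas @ X_test.T                  # shape (n_draws, n_test)
        total += preds.var(axis=0)
    return total / n_labels


def pick_by_simulation(X_train, X_test, mu, cov, sigma2, n_labels, n_draws, rng):
    """Return the index of the candidate with the smallest worst-case
    simulated test uncertainty."""
    scores = [
        simulated_uncertainty(x, X_test, mu, cov, sigma2, n_labels, n_draws, rng).max()
        for x in X_train
    ]
    return int(np.argmin(scores))
```

In this linear stand-in, the sampled variance at a test point approaches the exact posterior variance $\mathbf{x}_*^\top \mathbf{\Sigma} \mathbf{x}_*$ as the number of draws grows, so the simulated choice tends to agree with the covariance-based choice when the simulation budget is large.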

Paper Structure (23 sections, 9 theorems, 31 equations, 5 figures, 5 tables, 2 algorithms)

Introduction
Setting
Algorithms
Optimal Design Algorithm
Simulation-Based Algorithm
Analysis
Analysis of GO
Analysis of SAL
Experiments
Experimental Setup
Conclusions
Related Work
Proofs
Properties of our objective
Proof of the monotonicity lemma (lem:monotonicity)
...and 8 more sections

Key Result

Theorem 1

Let $\alpha_{\min}, \beta_{\max} \geq 0$ be set such that $\beta_{\max} \geq 1 - \alpha_{\min}^2$ and $T \leq \tfrac{\alpha_{\min}^2}{(\beta_{\max} + \sqrt{2}) \beta_{\max} d}$. Then for any $\mathbf{x}_{*,k}$ we can show that $\mathbf{x}_{*,k}^\top \widehat{\mathbf{\Sigma}}_{T + 1} \mathbf{x}_{*,k}$ ...

Figures (5)

Figure 1: Explanation of ARC tasks
Figure 2: Explanation of PCFG task
Figure 3: Prompt examples for Classification, Regression, Movie, and Prompt
Figure 4: Prompt examples for Theme and PCFG tasks
Figure 5: Prompt examples for ARC task

Theorems & Definitions (16)

Theorem 1
Theorem 2
Lemma 3
Definition 4: Supermodular function
Lemma 5
proof
Lemma 6
proof
Lemma 7
proof
...and 6 more
