Valid Stopping for LLM Generation via Empirical Dynamic Formal Lift

Sanjeda Akter; Ibne Farabi Shihab; Anuj Sharma

Abstract

We introduce Sequential-EDFL (Empirical Dynamic Formal Lift), which applies anytime-valid sequential testing to the problem of deciding when to stop language model generation. Our approach tracks information lift, the log-likelihood ratio between full models and deliberately weakened "skeleton" baselines, using self-normalized empirical-Bernstein e-processes that provide formal delta-level error control regardless of stopping time. We handle unknown centering through online mean estimation, combine multiple parameters via mixture e-processes, and support adaptive resets under distributional drift. On six benchmarks, Sequential-EDFL reduces generation by 22-28% relative to sequential baselines while maintaining delta-level control with 12% computational overhead. We introduce automated skeletons (distilled submodels, randomized logits) and show robustness across skeleton families. Composing EDFL with a lightweight correctness gate (sentence boundaries + verifier) improves end-task correctness while preserving anytime-valid guarantees, because the gate can only delay stopping. Our certificates control information sufficiency, not factual correctness: 10.9% of stopped sequences remain incorrect even with the gate (13.2-22.7% without it). EDFL therefore serves as a first-stage filter that reduces verification burden by 83%, not as a standalone solution for safety-critical domains.
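To make the stopping rule concrete, the following Python sketch illustrates the general idea of an anytime-valid stop based on a mixture e-process. It is a simplified stand-in, not the paper's construction: it uses a product (betting-style) e-process rather than the self-normalized empirical-Bernstein form, tests against a fixed null mean instead of an online mean estimate, and has no drift resets. The function name `stopping_time`, the null mean `mu0`, the grid `lambdas`, and the assumption that per-token lifts have been rescaled to [0, 1] are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Optional, Sequence


def stopping_time(
    lifts: Iterable[float],
    delta: float = 0.1,
    mu0: float = 0.5,
    lambdas: Sequence[float] = (0.25, 0.5, 1.0),
) -> Optional[int]:
    """Return the first 1-indexed step at which the mixture wealth reaches
    1/delta, or None if it never does.

    Assumptions (not from the paper): each lift is rescaled to [0, 1], and
    the null hypothesis is that the conditional mean lift is at most mu0.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if not 0.0 < mu0 < 1.0:
        raise ValueError("mu0 must lie in (0, 1)")
    if any(lam < 0.0 or lam * mu0 >= 1.0 for lam in lambdas):
        raise ValueError("each lambda must satisfy 0 <= lambda < 1/mu0")

    # One log-wealth per betting fraction; each starts at log(1) = 0.
    log_wealth = [0.0] * len(lambdas)
    log_threshold = math.log(1.0 / delta)

    for t, x in enumerate(lifts, start=1):
        x = min(1.0, max(0.0, x))  # enforce the assumed [0, 1] range
        for i, lam in enumerate(lambdas):
            # Under the null, 1 + lam * (x - mu0) has conditional mean <= 1,
            # so each product is a nonnegative supermartingale.
            log_wealth[i] += math.log1p(lam * (x - mu0))

        # Uniform mixture over the grid, computed stably in log space.
        peak = max(log_wealth)
        log_mix = peak + math.log(
            sum(math.exp(w - peak) for w in log_wealth) / len(lambdas)
        )

        # By Ville's inequality, crossing 1/delta under the null has
        # probability at most delta, whatever the stopping time.
        if log_mix >= log_threshold:
            return t
    return None
```

Averaging over several values of lambda avoids committing to a single betting fraction in advance, and the mixture of supermartingales is itself a supermartingale, so the delta-level guarantee carries over to the combined process.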
