The Role of Generator Access in Autoregressive Post-Training

Amit Kiran Rege

Abstract

We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
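The root-start barrier can be illustrated with a toy calculation. The sketch below is not the paper's construction: the binary vocabulary, the `secret` prefix, and the helper names `next_token_dist` and `reach_probability` are all hypothetical, chosen only to show how the on-policy probability of reaching an informative prefix can shrink exponentially with depth, while a learner with prefix control could query that prefix directly.

```python
from typing import Dict, Tuple

Prefix = Tuple[int, ...]


def next_token_dist(prefix: Prefix, secret: Prefix) -> Dict[int, float]:
    """Hypothetical toy next-token rule over tokens {0, 1}.

    The rule is uniform everywhere except at one 'informative' prefix.
    """
    if prefix == secret:
        return {0: 0.9, 1: 0.1}  # the only place the rule differs from uniform
    return {0: 0.5, 1: 0.5}


def reach_probability(target: Prefix, secret: Prefix) -> float:
    """On-policy probability that a root-start rollout passes through `target`."""
    p = 1.0
    for t in range(len(target)):
        # Probability of emitting target[t] after the first t tokens of target.
        p *= next_token_dist(target[:t], secret)[target[t]]
    return p


# Assumed toy instance: an informative prefix at depth 10.
secret: Prefix = (1, 0, 1, 1, 0, 1, 0, 0, 1, 1)

p_reach = reach_probability(secret, secret)
# Expected number of fresh root-start rollouts before one visits `secret`.
expected_rollouts = 1.0 / p_reach
# With prefix control, one direct query at `secret` reads the rule there.
prefix_queries = 1

print(p_reach, expected_rollouts, prefix_queries)
```

In this toy instance the number of root-start rollouts needed grows as $2^d$ in the depth $d$ of the informative prefix, whereas the prefix-control learner's cost does not depend on $d$.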

Paper Structure

This paper contains 33 sections, 38 theorems, 222 equations, 1 table, and 3 algorithms.

Key Result

Theorem 1

Fix a prompt $x$. An experiment is no-reset if and only if it is a randomized post-processing of $\mathsf{PathFull}_x$. In the usual comparison-of-experiments sense, $\mathsf{PathFull}_x$ is therefore the maximal no-reset experiment. $\blacktriangleleft$
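One direction of this statement can be made concrete with a small sketch. It assumes, following the abstract's description, that $\mathsf{PathFull}_x$ returns a sampled trajectory together with the full next-token distribution at each visited prefix; the paper's formal definition is not reproduced here. The names `path_full`, `as_output`, `as_logprobs`, `as_top_k`, and `toy_rule` are hypothetical. The point is only that output sampling, generated-token log probabilities, and top-$k$ reports can each be computed from one such record without further access to the generator.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Dist = Dict[str, float]
Record = List[Tuple[str, Dist]]  # (sampled token, full next-token distribution)


def path_full(rule: Callable[[Tuple[str, ...]], Dist],
              horizon: int, rng: random.Random) -> Record:
    """Assumed shape of a PathFull record: one root-start rollout that logs
    the full next-token distribution at every prefix it visits."""
    prefix: List[str] = []
    record: Record = []
    for _ in range(horizon):
        dist = rule(tuple(prefix))
        tokens = list(dist)
        token = rng.choices(tokens, weights=[dist[t] for t in tokens])[0]
        record.append((token, dist))
        prefix.append(token)
    return record


def as_output(record: Record) -> List[str]:
    """Output sampling: keep only the generated tokens."""
    return [tok for tok, _ in record]


def as_logprobs(record: Record) -> List[float]:
    """Generated-token log probabilities along the sampled path."""
    return [math.log(dist[tok]) for tok, dist in record]


def as_top_k(record: Record, k: int) -> List[List[str]]:
    """Top-k report at each visited prefix."""
    return [sorted(dist, key=dist.get, reverse=True)[:k] for _, dist in record]


def toy_rule(prefix: Tuple[str, ...]) -> Dist:
    """Hypothetical next-token rule; ignores the prefix for simplicity."""
    return {"a": 0.7, "b": 0.2, "c": 0.1}


record = path_full(toy_rule, horizon=4, rng=random.Random(0))
print(as_output(record))
print(as_logprobs(record))
print(as_top_k(record, k=2))
```

All three weaker interfaces here are deterministic functions of the record, which is a special case of randomized post-processing. The converse direction, that every no-reset experiment arises this way, is the substantive part of the theorem and is not illustrated by this sketch.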

Theorems & Definitions (73)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Definition 3
  • Theorem 2
  • Corollary 1
  • Definition 4
  • Theorem 3
  • Theorem 4
  • Definition 5
  • ...and 63 more