
Approximately Aligned Decoding

Daniel Melcer, Sujan Gonugondla, Pramuditha Perera, Haifeng Qian, Wen-Hao Chiang, Yanjun Wang, Nihal Jain, Pranav Garg, Xiaofei Ma, Anoop Deoras

TL;DR

The paper tackles the problem of enforcing task-specific constraints on autoregressive text generation without retraining large models. It introduces Approximately Aligned Decoding (AprAD), a backtracking mechanism inspired by speculative decoding that, after a constraint violation, reuses most of the previously generated prefix instead of restarting from scratch. Positioned between constrained generation and posterior-estimation approaches, AprAD aims to limit distortion of the output distribution while keeping inference costs manageable. Empirical results across simulated, lipogram, and code-generation tasks show AprAD achieving task performance comparable to distribution-preserving methods, with substantially lower overhead than ASAp and less distribution distortion than plain constrained generation.
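The TL;DR describes AprAD's backtracking as inspired by speculative decoding. As a hedged illustration of that idea only (not the paper's algorithm), the sketch below applies the standard speculative-sampling acceptance test to a previously generated draft: each old token is kept with probability min(1, q/p), where p is the distribution the draft was sampled from and q is an updated distribution after a violation, and at the first rejection a replacement is drawn from the residual max(0, q - p). The names `reuse_prefix`, `p_toy`, and `q_toy`, and the prefix-to-dict interface, are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a "distribution" maps a prefix (tuple of tokens)
# to a dict {token: probability}. A simplification for illustration only.
Dist = Callable[[Tuple[str, ...]], Dict[str, float]]


def reuse_prefix(prev: Sequence[str], p: Dist, q: Dist,
                 rng: random.Random) -> List[str]:
    """Check a draft `prev` (sampled from the old distribution `p`) against
    the updated distribution `q` using the speculative-sampling test.

    Tokens are kept with probability min(1, q/p). At the first rejection,
    one replacement token is drawn from the residual max(0, q - p) and the
    shortened prefix is returned; ordinary sampling from `q` would resume
    after it.
    """
    out: List[str] = []
    for tok in prev:
        prefix = tuple(out)
        p_dist, q_dist = p(prefix), q(prefix)
        # p_dist[tok] > 0 because tok was drawn from p.
        ratio = q_dist.get(tok, 0.0) / p_dist[tok]
        if rng.random() < min(1.0, ratio):
            out.append(tok)  # accept: reuse the old token
            continue
        # Reject: draw a replacement from the residual distribution
        # (random.choices normalizes the weights itself).
        tokens = sorted(set(p_dist) | set(q_dist))
        residual = [max(0.0, q_dist.get(t, 0.0) - p_dist.get(t, 0.0))
                    for t in tokens]
        out.append(rng.choices(tokens, weights=residual)[0])
        return out
    return out


def p_toy(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Hypothetical old model: uniform over {A, B} at every step.
    return {"A": 0.5, "B": 0.5}


def q_toy(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # p_toy with the violating output ("A", "A") removed and renormalized,
    # computed by hand: P(first token is A) = 0.25 / 0.75 = 1/3.
    if prefix == ():
        return {"A": 1 / 3, "B": 2 / 3}
    if prefix == ("A",):
        return {"A": 0.0, "B": 1.0}
    return {"A": 0.5, "B": 0.5}


rng = random.Random(0)
print(reuse_prefix(["A", "A"], p_toy, q_toy, rng))  # either ['A', 'B'] or ['B']
```

With the violating draft AA, the first A survives with probability (1/3)/(1/2) = 2/3 and the second A is always rejected because q gives it zero mass, so the draft is cut back rather than discarded. Since the draft here is the fixed violating sequence rather than a fresh sample from p, the usual speculative-sampling guarantee (output distributed as q when the draft is distributed as p) does not directly apply; the sketch only shows the prefix-reuse mechanics.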

Abstract

It is common to reject undesired outputs of Large Language Models (LLMs); however, current methods to do so require an excessive amount of computation to re-sample after a rejection, or distort the distribution of outputs by constraining the output to highly improbable tokens. We present a method, Approximately Aligned Decoding (AprAD), to balance the distortion of the output distribution with computational efficiency, inspired by algorithms from the speculative decoding literature. AprAD allows for the generation of long sequences of text with difficult-to-satisfy constraints, while amplifying low-probability outputs much less than existing methods do. We show through a series of experiments that the task-specific performance of AprAD is comparable to methods that do not distort the output distribution, while being much more computationally efficient.
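To give a rough sense of why re-sampling after a rejection becomes expensive for long sequences with difficult-to-satisfy constraints, the sketch below computes the expected number of complete generations needed by plain rejection sampling. It rests on a loud simplifying assumption: every token independently violates the constraint with the same probability, which real LLM outputs do not satisfy. The function name `expected_full_attempts` is hypothetical.

```python
def expected_full_attempts(per_token_violation: float, length: int) -> float:
    """Mean number of complete generations before one satisfies the constraint.

    Assumes (hypothetically) that each of `length` tokens independently
    violates the constraint with probability `per_token_violation`, so an
    attempt succeeds with probability (1 - v) ** length and the number of
    attempts is geometric with mean 1 / success_probability.
    """
    success = (1.0 - per_token_violation) ** length
    return 1.0 / success


# Same per-token violation rate, increasingly long outputs.
for n in (10, 50, 200):
    print(n, round(expected_full_attempts(0.05, n)))
```

Under this toy model, a 5% per-token violation rate costs about 13 attempts for 50 tokens but roughly 28,500 attempts for 200 tokens, since the success probability decays exponentially with length.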

Paper Structure

This paper contains 35 sections, 5 equations, 6 figures, 5 tables, and 7 algorithms.

Figures (6)

  • Figure 1: The entire probability mass of AA is shifted to AB.
  • Figure 2: The probability mass of AA is distributed evenly.
  • Figure 3: An accurate posterior estimator corrects the probabilities before sampling.
  • Figure 4: AprAD acts as a midpoint between constrained decoding and ASAp.
  • Figure 5: Representative generation samples for all four methods, using Mistral-7B-Instruct-v0.2. Appearances of the banned letter are bolded, and non-ASCII characters (all Cyrillic in this example) are colored red and underlined. Full samples are provided in the Appendix.
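To make Figures 1 and 2 concrete, here is an exact calculation on a toy setup assumed for illustration: a two-token vocabulary {A, B}, length-2 outputs, a uniform hypothetical model, and the single banned output AA (the figures' actual setup may differ). It compares per-step masking, as in constrained decoding, with the model distribution conditioned on the constraint, which is what an accurate posterior estimator would target. The names `model`, `constrained_decoding`, and `ideal_posterior` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Seq = Tuple[str, ...]
VOCAB = ("A", "B")
BANNED = {("A", "A")}  # hypothetical constraint: the output "AA" is disallowed


def model(prefix: Seq) -> Dict[str, Fraction]:
    # Hypothetical toy LM: uniform over VOCAB regardless of prefix.
    return {t: Fraction(1, len(VOCAB)) for t in VOCAB}


def constrained_decoding(length: int = 2) -> Dict[Seq, Fraction]:
    """Exact output distribution when, at each step, tokens that would
    complete a banned sequence are masked and the rest renormalized."""
    dist: Dict[Seq, Fraction] = {}
    for seq in product(VOCAB, repeat=length):
        prob = Fraction(1)
        for i, tok in enumerate(seq):
            allowed = {t: pr for t, pr in model(seq[:i]).items()
                       if seq[:i] + (t,) not in BANNED}
            if tok not in allowed:
                prob = Fraction(0)
                break
            prob *= allowed[tok] / sum(allowed.values())
        if prob:
            dist[seq] = prob
    return dist


def ideal_posterior(length: int = 2) -> Dict[Seq, Fraction]:
    """Model distribution conditioned on satisfying the constraint:
    drop banned sequences and renormalize the remaining mass."""
    joint: Dict[Seq, Fraction] = {}
    for seq in product(VOCAB, repeat=length):
        prob = Fraction(1)
        for i, tok in enumerate(seq):
            prob *= model(seq[:i])[tok]
        if seq not in BANNED:
            joint[seq] = prob
    total = sum(joint.values())
    return {seq: pr / total for seq, pr in joint.items()}


print(constrained_decoding())  # AB: 1/2, BA: 1/4, BB: 1/4
print(ideal_posterior())       # AB, BA, BB: 1/3 each
```

Masking hands the entire 1/4 mass of AA to AB (Figure 1), raising it to 1/2, whereas conditioning spreads that mass over all three remaining outputs (Figure 2), giving 1/3 each. In this toy case constrained decoding amplifies AB by a factor of 1.5 relative to the conditioned distribution.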