Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian Q. Weinberger

TL;DR

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Its architecture allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.

Abstract

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves $>70\%$ win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
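
To make the idea of steering a latent plan with a lightweight classifier more concrete, the following is a toy sketch only, not the paper's implementation. It assumes a standard classifier-guidance-style update in which the gradient of a classifier's log-probability, weighted by a guidance scale, is added to each refinement step. The names `guided_refine`, `attribute_prob`, `plan_target`, and the logistic classifier weights `w` are all hypothetical, and the "denoising" step is a simple deterministic pull toward a fixed target rather than a learned diffusion model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attribute_prob(z, w):
    """Probability that a toy logistic classifier assigns the attribute to latent z."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)))


def guided_refine(z, plan_target, w, scale, steps=50, step_size=0.1):
    """Toy refinement loop (assumption: stands in for diffusion-based plan refinement).

    Each step moves z toward plan_target and, when scale > 0, also along the
    gradient of the classifier's log-probability, weighted by the guidance scale.
    """
    z = list(z)
    for _ in range(steps):
        # Stand-in for a denoising update: move toward a hypothetical clean plan.
        denoise = [t - zi for t, zi in zip(plan_target, z)]
        # Gradient of log sigmoid(w . z) with respect to z is (1 - sigmoid(w . z)) * w.
        p = attribute_prob(z, w)
        guide = [(1.0 - p) * wi for wi in w]
        z = [zi + step_size * (d + scale * g) for zi, d, g in zip(z, denoise, guide)]
    return z


# Hypothetical 2-D latent, plan target, and classifier weights.
z0 = [0.0, 0.0]
target = [1.0, -1.0]
w = [0.5, 2.0]

unguided = guided_refine(z0, target, w, scale=0.0)
guided = guided_refine(z0, target, w, scale=3.0)
```

In this sketch, a larger `scale` moves the refined latent further toward the classifier's preferred region and further from the unguided plan, which is one simplified way to picture a trade-off between control and fidelity to the original plan. The classifier is applied only during refinement, so nothing in the toy "generator" is retrained.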

Paper Structure (28 sections, 18 equations, 16 figures, 4 tables)

Figures (16)

  • Figure 1: Unified diffusion-guided language model architecture illustrating the training (left) and generation (right) processes. See text for details.
  • Figure 2: Visualization of the change in log likelihood due to conditioning on the clean continuation embedding versus pure noise. Tokens highlighted in red represent the information provided by the semantic plan.
  • Figure 3: Diagram comparing zero-shot NLU evaluation for standard LMs (left) and STAR-LDM (right). Both models score candidate answers ($A_i$) given a question ($Q$). STAR-LDM utilizes both the answer text $A_i$ and its latent embedding $\mathbf{z}_{i,0}$ from Sentence-T5 for scoring.
  • Figure 4: LLM-Judge evaluation. Results presented with 95% confidence intervals.
  • Figure 5: Relationship between perplexity and content attributes across guidance scales ($s$).
  • ...and 11 more figures