Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

Justin Lovelace; Christian Belardi; Sofian Zalouk; Adhitya Polavaram; Srivatsa Kundurthy; Kilian Q. Weinberger

TL;DR

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) combines latent diffusion planning with autoregressive generation. Lightweight classifiers can steer its outputs at a fine-grained level without retraining the model, while maintaining better fluency-control trade-offs than specialized approaches.

Abstract

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves $>70\%$ win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
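
To make the idea of steering a latent plan with a lightweight classifier more concrete, here is a minimal toy sketch. It is not the paper's implementation: the "denoiser" is a hypothetical stand-in that simply pulls the latent toward a fixed vector, the classifier is a hand-set logistic regression, and every name (`refine_plan`, `toy_denoise_direction`, `guidance_scale`, and so on) is an assumption made for illustration. The only point is that adding the scaled gradient of the classifier's log-probability to each refinement step moves the plan toward the desired attribute without touching the denoiser's parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def classifier_log_prob_grad(z, w, b):
    """Gradient of log p(attribute | z) for a logistic-regression classifier."""
    p = sigmoid(w @ z + b)
    return (1.0 - p) * w


def toy_denoise_direction(z, plan_mean):
    """Hypothetical stand-in for a learned denoiser: pull z toward a fixed plan."""
    return plan_mean - z


def refine_plan(z_init, plan_mean, w, b, guidance_scale, num_steps=50, step_size=0.1):
    """Iteratively refine a latent plan, optionally nudged by classifier guidance."""
    z = z_init.copy()
    for _ in range(num_steps):
        direction = toy_denoise_direction(z, plan_mean)
        # Guidance term: scaled gradient of the classifier's log-probability.
        direction = direction + guidance_scale * classifier_log_prob_grad(z, w, b)
        z = z + step_size * direction
    return z


rng = np.random.default_rng(0)
dim = 8
z_init = rng.standard_normal(dim)   # noisy starting latent
plan_mean = np.zeros(dim)           # assumed target of the toy denoiser
w = np.ones(dim) / np.sqrt(dim)     # assumed classifier weights (unit norm)
b = 0.0

unguided = refine_plan(z_init, plan_mean, w, b, guidance_scale=0.0)
guided = refine_plan(z_init, plan_mean, w, b, guidance_scale=2.0)

p_unguided = sigmoid(w @ unguided + b)
p_guided = sigmoid(w @ guided + b)
print(f"p(attribute) unguided: {p_unguided:.3f}, guided: {p_guided:.3f}")
```

In this toy setting, raising `guidance_scale` increases the classifier's probability for the refined latent at the cost of pulling it away from where the denoiser alone would place it, which is a loose analogue of the fluency-control trade-off the abstract mentions.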

Paper Structure (28 sections, 18 equations, 16 figures, 4 tables)

Introduction
Background
Stop-Think-AutoRegress Language Diffusion Model
Training Procedure
Generation Process
Impact of Diffusion Process on Language Modeling
Natural Language Understanding
StoryCloze Generation
Language Generation Evaluation
Plug-and-Play Control
Related Work
Conclusion
Additional Visualizations of the Impact of Diffusion Process on Language Modeling
Inference Latency
NLU Derivation
...and 13 more sections

Figures (16)

Figure 1: Unified diffusion-guided language model architecture illustrating the training (left) and generation (right) processes. See text for details.
Figure 2: Visualization of the change in log likelihood due to conditioning on the clean continuation embedding versus pure noise. Tokens highlighted in red represent the information provided by the semantic plan.
Figure 3: Diagram comparing zero-shot NLU evaluation for standard LMs (left) and STAR-LDM (right). Both models score candidate answers ($A_i$) given a question ($Q$). STAR-LDM utilizes both the answer text $A_i$ and its latent embedding $\mathbf{z}_{i,0}$ from Sentence-T5 for scoring.
Figure 4: LLM-Judge evaluation. Results presented with 95% confidence intervals.
Figure 5: Relationship between perplexity and content attributes across guidance scales ($s$).
...and 11 more figures
