
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Pengxiang Li, Shilin Yan, Joey Tsai, Renrui Zhang, Ruichuan An, Ziyu Guo, Xiaowei Gao

TL;DR

This work tackles the suboptimality of static unconditional inputs in classifier-free guidance for iterative masked diffusion language models. It introduces Adaptive Classifier-Free Guidance (A-CFG), which constructs a dynamic unconditional input by remasking low-confidence tokens at each denoising step based on the model's own predictive confidence. Integrating A-CFG into LLaDA and Dream-7B yields substantial gains on complex reasoning and planning benchmarks, such as GPQA and Sudoku, and narrows the gap with autoregressive models. Overall, the results demonstrate that guiding diffusion-based text generation by adapting to real-time uncertainty can significantly improve controllability and output quality in discrete diffusion settings.

Abstract

Classifier-Free Guidance (CFG) significantly enhances controllability in generative models by interpolating conditional and unconditional predictions. However, standard CFG often employs a static unconditional input, which can be suboptimal for iterative generation processes where model uncertainty varies dynamically. We introduce Adaptive Classifier-Free Guidance (A-CFG), a novel method that tailors the unconditional input by leveraging the model's instantaneous predictive confidence. At each step of an iterative (masked) diffusion language model, A-CFG identifies tokens in the currently generated sequence for which the model exhibits low confidence. These tokens are temporarily re-masked to create a dynamic, localized unconditional input. This focuses CFG's corrective influence precisely on areas of ambiguity, leading to more effective guidance. We integrate A-CFG into a state-of-the-art masked diffusion language model and demonstrate its efficacy. Experiments on diverse language generation benchmarks show that A-CFG yields substantial improvements over standard CFG, achieving, for instance, a 3.9 point gain on GPQA. Our work highlights the benefit of dynamically adapting guidance mechanisms to model uncertainty in iterative generation.
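The guidance step the abstract refers to, which combines conditional and unconditional predictions, can be sketched as follows. This is a minimal sketch that assumes the common logit-space form of CFG, uncond + w * (cond - uncond). The function names `cfg_combine` and `guided_distribution` and the scale `w` are illustrative and do not come from the paper, whose exact parameterization may differ. In A-CFG, the unconditional logits would come from running the model on the dynamically re-masked sequence instead of a null prompt.

```python
import math
from typing import List, Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], w: float) -> List[float]:
    """Logit-space CFG for one position: uncond + w * (cond - uncond).

    w = 0 gives the unconditional logits, w = 1 the conditional ones,
    and w > 1 pushes further away from the unconditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must cover the same vocabulary")
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_distribution(
    cond_logits: Sequence[Sequence[float]],
    uncond_logits: Sequence[Sequence[float]],
    w: float,
) -> List[List[float]]:
    """Per-position guided token distributions for a whole sequence."""
    return [softmax(cfg_combine(c, u, w)) for c, u in zip(cond_logits, uncond_logits)]


# Toy example: one position, vocabulary of three tokens.
cond = [2.0, 0.0, -1.0]   # conditional prediction favours token 0
uncond = [1.0, 1.0, 1.0]  # unconditional prediction is flat
print(cfg_combine(cond, uncond, 2.0))  # [3.0, -1.0, -3.0]
```

Some implementations write the same rule as (1 + w) * cond - w * uncond. This is identical to the form above, with the scale shifted by one. Strictly speaking, for w > 1 the rule extrapolates beyond the conditional prediction rather than interpolating between the two.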

Paper Structure

This paper contains 25 sections, 6 equations, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of model confidence dynamics during iterative generation. (a) Token-level confidence heatmap across token positions and generation steps (darker shades indicate higher confidence). (b) Average and minimum confidence scores per generation step. This visualization highlights the dynamic and non-uniform nature of model confidence that A-CFG aims to leverage.
  • Figure 2: Overview of (left) standard null-prompt Classifier-Free Guidance and (right) our proposed Adaptive Classifier-Free Guidance (A-CFG) at a single generation step $k$. In standard CFG, the unconditional input typically masks the entire prompt or replaces it with a null prompt. In A-CFG, the conditional logits are first computed from $\mathbf{x}^{(k)}$, and token-level confidences are then assessed for all non-[MASK] tokens in $\mathbf{x}^{(k)}$. Tokens with low confidence (orange/red in the illustration) are temporarily re-masked to [MASK] to form the dynamic unconditional input $\mathbf{x}_{\text{uncond}}^{(k)}$. This lets the CFG mechanism focus guidance on the regions of the current sequence where the model is uncertain.
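The construction of the dynamic unconditional input described in Figure 2 can be sketched as below. This is a minimal sketch under several assumptions that this summary does not fix:

- Confidence is taken as the conditional probability of the token currently at each position.
- A hypothetical ratio `rho` sets the fraction of eligible non-[MASK] tokens to re-mask, rounded down.
- A hypothetical index `gen_start` restricts re-masking to positions from that index onward. Setting it to 0 makes every position eligible.
- `MASK = -1` stands in for the tokenizer's real mask id.

All function names are illustrative.

```python
import math
from typing import List, Sequence

MASK = -1  # hypothetical stand-in for the tokenizer's [MASK] id


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(tokens: Sequence[int], cond_logits: Sequence[Sequence[float]]) -> List[float]:
    """Assumed confidence measure: conditional probability of the token that
    currently occupies each position. [MASK] positions get +inf so they are
    never chosen for re-masking."""
    return [
        math.inf if tok == MASK else softmax(logits)[tok]
        for tok, logits in zip(tokens, cond_logits)
    ]


def build_uncond_input(
    tokens: Sequence[int],
    cond_logits: Sequence[Sequence[float]],
    gen_start: int,
    rho: float,
) -> List[int]:
    """Return a copy of `tokens` in which the floor(rho * n) least confident
    non-[MASK] tokens at positions >= gen_start are re-masked, where n is the
    number of such tokens. `tokens` itself is not modified."""
    confs = token_confidences(tokens, cond_logits)
    eligible = [i for i in range(gen_start, len(tokens)) if tokens[i] != MASK]
    n_remask = int(rho * len(eligible))
    lowest = sorted(eligible, key=lambda i: confs[i])[:n_remask]
    x_uncond = list(tokens)
    for i in lowest:
        x_uncond[i] = MASK
    return x_uncond


# Toy step: position 0 is the prompt, position 3 is still masked.
tokens = [0, 1, 2, MASK, 1]
cond_logits = [
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],  # model doubts token 1 here (low confidence)
    [0.0, 0.0, 5.0],  # model is confident in token 2
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],  # fairly confident in token 1
]
print(build_uncond_input(tokens, cond_logits, gen_start=1, rho=0.5))
# [0, -1, 2, -1, 1]
```

The model is then run a second time on the re-masked sequence to obtain the unconditional logits. These are combined with the conditional logits by the usual CFG rule. Because the selection depends on the current confidences, the unconditional input changes from step to step, which is what distinguishes it from a fixed null prompt.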