Unveiling the Potential of Diffusion Large Language Model in Controllable Generation

Zhen Xiong, Yujun Cai, Zhecheng Li, Yiwei Wang

TL;DR

The paper tackles the difficulty of generating reliable structured outputs with autoregressive LLMs by leveraging diffusion-based LLMs. It introduces Self-Adaptive Schema Scaffolding (S^3), a training-free approach that injects a structural scaffold with adaptive null tokens into the output context to guide denoising and enforce structure. In experiments on WikiBio, S^3 achieves superior structural adherence, content fidelity, and faithfulness while remaining efficient, requiring fewer denoising steps. The work provides a new perspective on deploying diffusion models for controllable generation tasks and offers practical paths toward robust, structured outputs.

Abstract

Controllable generation is a fundamental task in NLP with many applications, providing a basis for uses ranging from function calling to agentic communication. However, even today's state-of-the-art autoregressive Large Language Models (LLMs) are unreliable when required to generate structured output. Inspired by recent diffusion-based large language models (dLLMs), we recognize that their architectural difference, especially the global information-sharing mechanism for language modeling, may be the key to unlocking next-level controllable generation. To explore this possibility, we propose Self-adaptive Schema Scaffolding ($S^3$), a novel framework that enables dLLMs to stably generate reliable structured outputs (e.g., JSON) by utilizing their innate reverse reasoning capability and global context awareness. $S^3$ initializes a schematic template directly in the output context as the starting state for the dLLM, offering a more robust and general method than intricate prompt optimization. Experiments demonstrate that our method substantially unlocks the dLLM's potential in controllable generation in terms of structure adherence, content fidelity, and faithfulness. These results establish new perspectives and practical pathways for deploying language models in controllable generation tasks.

Paper Structure

This paper contains 29 sections, 1 theorem, 11 equations, 6 figures, 2 tables.

Key Result

Theorem 4.1

Let $\mathbf{x}_0$ be a target structured sequence and $\mathbf{x}_t$ be a partially masked sequence at timestep $t$, with scaffold $\mathcal{S}$ defining the fixed structural positions. For a diffusion language model trained with the objective (Eq. eq:dllm-learning), initializing the denoising process with … where $\hat{\mathbf{x}}_0$ is generated with scaffolding, $\tilde{\mathbf{x}}_0$ is generated without …

Figures (6)

  • Figure 1: Illustrative comparison between autoregressive and diffusion-based language modeling on tasks that require specific global structure control and advance token-space planning.
  • Figure 2: The overview of our method's pipeline. We begin by decomposing the original task instruction into two components: a problem description and a set of structural constraints. These constraints are compiled into a schema, which is then used to initialize a noisy scaffold where mask tokens serve as placeholders for missing content. The dLLM completes this scaffold by predicting the masked tokens, using the problem description as context to generate structured outputs. Additionally, we apply a selective remasking strategy that allows the model to iteratively refine its predictions and further improve generation quality.
  • Figure 3: Structural adherence comparison across denoising steps and methods. Results show consistent improvements across all metrics using our schema scaffolding approaches, with near-perfect performance achieved in fewer steps.
  • Figure 4: Content fidelity comparison across denoising steps and methods. Our self-adaptive schema scaffolding consistently achieves the highest precision, recall, and F1 score across all settings.
  • Figure 5: The complete prompt we use for the baseline method.
  • ...and 1 more figure
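
The pipeline described in the Figure 2 caption (schema, masked scaffold, denoising, remasking) can be illustrated with a small toy sketch. This is not the authors' implementation: the token strings `<mask>` and `<null>`, the helper names `build_scaffold`, `toy_predict`, `denoise`, and `render`, the fixed number of slots per field, and the confidence-ranked commit rule are all assumptions made for illustration. In particular, `toy_predict` is a lookup table standing in for a real dLLM forward pass, and the remasking rule shown here (commit only the most confident proposals each step, leave the rest masked) is one plausible reading of "selective remasking", not the paper's exact procedure.

```python
import json

MASK = "<mask>"  # hypothetical mask-token string
NULL = "<null>"  # hypothetical null token marking an unused slot


def build_scaffold(fields, slots_per_field=4):
    """Build a JSON-shaped token list whose value positions are masked.

    Returns the token list and a map from field name to its slot positions.
    Structural tokens (braces, keys, quotes, commas) are fixed up front.
    """
    tokens, slot_map = ["{"], {}
    for i, name in enumerate(fields):
        tokens += [json.dumps(name), ":", '"']
        start = len(tokens)
        tokens += [MASK] * slots_per_field
        slot_map[name] = range(start, start + slots_per_field)
        tokens.append('"')
        if i < len(fields) - 1:
            tokens.append(",")
    tokens.append("}")
    return tokens, slot_map


def toy_predict(tokens, slot_map, answers):
    """Stand-in for a dLLM forward pass (assumption, not a real model).

    Proposes a (token, confidence) pair for every position still masked.
    Slots beyond the length of the answer receive the null token.
    """
    proposals = {}
    for name, slots in slot_map.items():
        words = answers.get(name, "").split()
        for offset, pos in enumerate(slots):
            if tokens[pos] != MASK:
                continue
            if offset < len(words):
                proposals[pos] = (words[offset], 1.0 - 0.1 * offset)
            else:
                proposals[pos] = (NULL, 0.5)
    return proposals


def denoise(tokens, slot_map, answers, steps=4):
    """Fill masked slots over several steps, committing confident ones first."""
    tokens = list(tokens)
    for step in range(steps):
        proposals = toy_predict(tokens, slot_map, answers)
        if not proposals:
            break
        # Commit roughly an equal share per remaining step (ceil division);
        # lower-confidence positions stay masked and are revisited later.
        k = -(-len(proposals) // (steps - step))
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:k]:
            tokens[pos] = tok
    return tokens


def render(tokens, slot_map):
    """Drop null and mask tokens and emit the record as a JSON string."""
    record = {}
    for name, slots in slot_map.items():
        words = [tokens[i] for i in slots if tokens[i] not in (MASK, NULL)]
        record[name] = " ".join(words)
    return json.dumps(record)


fields = ["name", "birth_place"]
scaffold, slot_map = build_scaffold(fields)
filled = denoise(scaffold, slot_map, {"name": "Ada Lovelace", "birth_place": "London"})
print(render(filled, slot_map))
```

Because the structural tokens are never masked, the output shape is fixed before any content is predicted, and the null token lets a fixed-width slot hold a shorter value. The field names and values in the usage lines are made-up examples, not data from the paper.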

Theorems & Definitions (2)

  • Theorem 4.1: Scaffold-Guided Denoising Convergence
  • proof