DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models

Sidi Lu; Wenbo Zhao; Chenyang Tao; Arpit Gupta; Shanchan Wu; Tagyoung Chung; Nanyun Peng

TL;DR

DiNADO addresses limitations of NeurAlly-Decomposed Oracles (NADO) for controllable generation by disentangling the global norm from the step-wise $R$-value, which stabilizes training and increases model capacity. It introduces a family of variants (DiNADO-Hard, DiNADO-Soft, DiNADO-Merge), uses LoRA for scalable fine-tuning, and applies likelihood-based importance sampling to improve gradient efficiency. The approach yields state-of-the-art performance on lexically constrained generation (CommonGen) and formality control in machine translation (FormalMT), while reducing sample complexity and enabling effective post-hoc control without heavy parameter growth. This has practical implications for robust, scalable, and interpretable controllable generation in real-world LLM deployment with limited fine-tuning budgets.

Abstract

NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite its success, several challenges arise when applying NADO to a wide range of scenarios. Vanilla NADO suffers from vanishing gradients for low-probability control signals and relies heavily on a regularization term to satisfy the stochastic version of the Bellman equation. In addition, the vanilla implementation of NADO introduces a few additional transformer layers, which limits its capacity, especially compared to other fine-tuning-based model adaptation methods such as LoRA. In this paper, we propose an improved version of the NADO algorithm, namely DiNADO (norm-Disentangled NeurAlly-Decomposed Oracles), which improves performance by disentangling the step-wise global norm over the approximated oracle $R$-value for all potential next tokens, allowing DiNADO to be combined with fine-tuning methods such as LoRA. We discuss in depth how DiNADO achieves better capacity, stability, and flexibility, with both empirical and theoretical results. Experiments on formality control in machine translation and the lexically constrained generation task CommonGen demonstrate the significance of the improvements.
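
To make the role of the step-wise norm concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the standard NADO-style reweighting, in which the controlled next-token distribution is proportional to the base model probability times the approximated oracle value $R$ for each candidate token. The function name `reweight` and all numeric values are hypothetical. The sketch shows that the resulting distribution depends only on the relative $R$-values across candidate tokens, while their overall scale (the step-wise norm) is a separate quantity, which is one way to read the motivation for disentangling it.

```python
# Toy illustration only; names and numbers are hypothetical assumptions.

def reweight(base_probs, r_next):
    """Reweight base next-token probabilities by per-token R values.

    base_probs: p(y_i | x, y_<i) from the frozen base model.
    r_next:     approximated oracle values R(x, y_<=i), one per candidate token.
    Returns the normalized distribution and the step-wise norm
    sum_i p_i * R_i (which would equal R(x, y_<i) if the forward
    consistency condition held exactly).
    """
    unnormalized = [p * r for p, r in zip(base_probs, r_next)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized], norm


base_probs = [0.5, 0.3, 0.2]   # hypothetical base model probabilities
r_next = [0.1, 0.6, 0.9]       # hypothetical oracle estimates per token

q, norm = reweight(base_probs, r_next)

# Rescaling every R value by the same constant changes the norm
# but leaves the controlled distribution unchanged.
q_scaled, norm_scaled = reweight(base_probs, [10.0 * r for r in r_next])
```

In this toy setting the token-level distribution carries the control signal through ratios of $R$-values, and the norm can be tracked or constrained separately. How DiNADO parameterizes and trains these two parts is described in the paper's methodology and is not reproduced here.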

Paper Structure (34 sections, 20 equations, 2 figures, 4 tables)

Introduction
Background and Related Work
Controllable Generation for Autoregressive Models.
NeurAlly-Decomposed Oracle (NADO)
The GeLaTo Algorithm
Importance Sampling
Methodology
Notations
General formulation
Formulation in NADO Modules
Normalization Yields Uniqueness of Optima in SFT for NADO-altered likelihood
DiNADO: Disentangling the rescaler from the step-wise oracle factorization value $R$
DiNADO-Hard: Towards Regularization-Free Training of NADO
Equation \ref{eq:forward_consis} (the Forward Consistency Condition)
DiNADO-Soft: Balancing between Proper Regularization of $R$ and Better Approximation of $C(\mathbf{x}, \mathbf{y})$
...and 19 more sections

Figures (2)

Figure 1: Illustration of the original example distribution, the truncated distribution obtained with the likelihood re-weighting trick, and a direct empirical approximation of the distribution using the same number of random samples.
Figure 2: (a) The original NADO and DiNADO converge with similar dynamics at early stages in terms of the major part of the loss, but DiNADO converges to a better local optimum. (b) DiNADO's norm disentanglement significantly helps to stabilize the regularization term.
