Decoupling Task-Solving and Output Formatting in LLM Generation

Haikang Deng, Po-Nien Kung, Nanyun Peng

TL;DR

This work addresses the problem that complex prompts, which entangle task instructions with strict output formats, hinder LLM performance. It introduces Deco-G, a decoding framework that decouples task solving from format adherence by delegating formatting to a Format Estimation Module (FEM) while the LLM focuses on the task. The main contributions are threefold: (i) instruction-aware HMM distillation to capture task-oriented behavior, (ii) a flexible trie-based DFA to model sophisticated format constraints and complex templates, and (iii) HMM hidden-state pruning to enable scalable inference. Across GSM8k, SummEval, and ACE05, Deco-G yields 1.0%–6.0% relative improvements with guaranteed format compliance and often stronger alignment with human judgments, showing practical gains in both reasoning tasks and format-sensitive generation. The approach enables format-aware decoding that preserves task-solving capabilities while ensuring reliable integration with downstream systems.

Abstract

Large language models (LLMs) are increasingly adept at following instructions containing task descriptions to solve complex problems, such as mathematical reasoning and automatic evaluation (LLM-as-a-Judge). However, as prompts grow more complex, models often struggle to adhere to all instructions. This difficulty is especially common when prompts intertwine reasoning directives (specifying what the model should solve) with rigid formatting requirements that dictate how the solution must be presented. The entanglement creates competing goals for the model, suggesting that a more explicit separation of these two aspects could improve performance. To this end, we introduce Deco-G, a decoding framework that explicitly decouples format adherence from task solving. Deco-G handles format compliance with a separate tractable probabilistic model (TPM), while prompting the LLM with only task instructions. At each decoding step, Deco-G combines the next-token probabilities from the LLM with the TPM-calculated format compliance likelihood to form the output probability. To make this approach both practical and scalable for modern instruction-tuned LLMs, we introduce three key innovations: instruction-aware distillation, a flexible trie-building algorithm, and HMM state pruning for computational efficiency. We demonstrate the effectiveness of Deco-G across a wide range of tasks with diverse format requirements, including mathematical reasoning, LLM-as-a-Judge, and event argument extraction. Overall, our approach yields a 1.0% to 6.0% relative gain over regular prompting, with guaranteed format compliance.
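The per-step combination described in the abstract can be illustrated with a small sketch. It assumes the LLM and the format model each supply one number per vocabulary token, and that the two are multiplied and renormalized. The function name `combine_step` and the toy numbers are hypothetical and are not taken from the paper's implementation.

```python
def combine_step(llm_probs, format_likelihood):
    """Combine LLM next-token probabilities with a per-token
    format-compliance likelihood (illustrative sketch only).

    llm_probs[i]         -- LLM probability of token i at this step
    format_likelihood[i] -- assumed estimate that the format can still be
                            satisfied if token i is emitted next
    """
    if len(llm_probs) != len(format_likelihood):
        raise ValueError("both inputs must cover the same vocabulary")

    # Unnormalized posterior: elementwise product of the two signals.
    scores = [p * f for p, f in zip(llm_probs, format_likelihood)]

    total = sum(scores)
    if total == 0.0:
        raise ValueError("no token is compatible with the format")

    # Renormalize so the result is a probability distribution.
    return [s / total for s in scores]


# Toy vocabulary of three tokens. Token 0 is the LLM's favorite but
# would violate the format, so its likelihood is 0.
llm_probs = [0.5, 0.3, 0.2]
format_likelihood = [0.0, 0.9, 0.1]
posterior = combine_step(llm_probs, format_likelihood)
```

In this toy case the token that breaks the format receives zero probability, which is how a hard guarantee of format compliance can follow from the product form, while the remaining mass is shared according to both the LLM's preference and the format estimate.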

Paper Structure

This paper contains 40 sections, 11 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 2: Deco-G decouples task and format, prompting the LLM with task-only information and sending format constraints to the FEM. Deco-G decodes from the posterior constructed by multiplying LLM token probabilities by the FEM-estimated satisfaction rate.
  • Figure 3: Deco-G steers Llama to generate the predefined template "The final answer is ..." by boosting the probabilities of template tokens.
  • Figure 4: Token-level entropy of the LLM for different models and methods. Llama has a more flexible token distribution than Qwen.
  • Figure 5: Average retention rate (of total mass) over the top-$k$ HMM hidden states on the GSM8k dataset.
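
The retention rate in Figure 5 can be read as the share of probability mass that survives when only the k most probable hidden states are kept. The following is a minimal sketch under that reading, assuming a single distribution over hidden states given as a list of non-negative weights. The function name `topk_retention` and the example values are illustrative and do not come from the paper.

```python
def topk_retention(state_probs, k):
    """Fraction of total mass held by the k largest entries
    (illustrative; assumes non-negative weights with positive total)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(state_probs)
    if total <= 0:
        raise ValueError("total mass must be positive")

    # Keep the k largest weights and measure how much mass they carry.
    kept = sum(sorted(state_probs, reverse=True)[:k])
    return kept / total


# Toy distribution over four hidden states.
state_probs = [0.125, 0.5, 0.125, 0.25]
retention_top2 = topk_retention(state_probs, 2)
```

A retention rate close to 1 for small k would indicate that most hidden states contribute little mass at a given step, which is the kind of observation that motivates pruning them.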