Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

TL;DR

This paper introduces constraint back-translation, a technique that improves complex instruction following in large language models by extracting the implicit constraints already present in existing high-quality datasets. The result is Crab, a large-scale complex instruction-following corpus. The authors also propose a reverse training objective that teaches models to generate constraints from (instruction, response) pairs. The training pipeline combines the forward and reverse objectives on data from Crab and ShareGPT, followed by DPO fine-tuning. Empirically, Crab substantially improves complex instruction following on open-source backbones such as Mistral-7B and Llama3-8B, outperforming several baselines on IFEval and FollowBench and yielding notable gains in general instruction following on AlpacaEval. The paper also discusses design choices, ablations, and effects by constraint category, and notes a limitation in the diversity of style constraints, which leaves room for further improvement in automatic data generation for constrained instructions.
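The forward and reverse objectives mentioned above differ only in which part of a sample the model conditions on and which part it must produce. The following minimal Python sketch illustrates that difference. The `Example` class, both function names, and the prompt wording are hypothetical and chosen for illustration; this summary does not specify the paper's actual templates.

```python
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str   # text the model conditions on
    target: str   # text the model is trained to produce


def forward_example(instruction: str, constraints: list[str], response: str) -> Example:
    """Forward objective: (instruction + constraints) -> response."""
    prompt = instruction + "\n" + "\n".join(constraints)
    return Example(prompt=prompt, target=response)


def reverse_example(instruction: str, constraints: list[str], response: str) -> Example:
    """Reverse objective: (instruction, response) -> constraints.

    The prompt wording below is an assumption, not the paper's template.
    """
    prompt = (
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        "List the constraints that the response satisfies."
    )
    return Example(prompt=prompt, target="\n".join(constraints))
```

Both examples are built from the same (instruction, constraints, response) triple, so a single back-translated sample can supply training data for both objectives.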

Abstract

Large language models (LLMs) struggle to follow instructions with complex constraints on format, length, etc. Following conventional instruction-tuning practice, previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, which limits the quality of the generated data. In this work, we find that existing datasets inherently contain implicit complex constraints and propose a novel data generation technique, constraint back-translation. Specifically, we take the high-quality instruction-response pairs in existing datasets and use advanced LLMs only to add to the instructions complex constraints that the responses already meet, which naturally reduces costs and data noise. In the experiments, we adopt Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset, named CRAB. We show that post-training on CRAB improves multiple backbone LLMs' complex instruction-following ability, as evaluated on extensive instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective in post-training. Our code, data, and models will be released to facilitate future research.
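The core idea of constraint back-translation is that the response is never regenerated: constraints are derived from a response that already exists and then appended to the instruction. The sketch below illustrates that idea in Python. It substitutes two simple rule-based checks for the LLM step described in the abstract, so it is a stand-in rather than the authors' pipeline; the `Pair` class, the function names, and the constraint wording are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Pair:
    instruction: str
    response: str


def back_translate_constraints(pair: Pair) -> list[str]:
    """Derive constraints that the existing response already satisfies.

    Hypothetical rule-based stand-in for the LLM step in the abstract.
    """
    constraints = []

    # Length constraint: round the observed word count up to the next ten,
    # so the response is guaranteed to satisfy the stated limit.
    n_words = len(pair.response.split())
    limit = ((n_words + 9) // 10) * 10
    constraints.append(f"Answer in at most {limit} words.")

    # Format constraint: emitted only if every non-empty line is a bullet.
    lines = [ln for ln in pair.response.splitlines() if ln.strip()]
    if lines and all(re.match(r"\s*[-*•]\s+", ln) for ln in lines):
        constraints.append("Format the answer as a bulleted list.")

    return constraints


def add_constraints(pair: Pair) -> Pair:
    """Append the derived constraints to the instruction; keep the response as-is."""
    constraints = back_translate_constraints(pair)
    instruction = pair.instruction + "\n" + "\n".join(constraints)
    return Pair(instruction=instruction, response=pair.response)
```

Because every constraint is computed from the response itself, the resulting pair is consistent by construction, which is the source of the reduced data noise that the abstract describes.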

Paper Structure

This paper contains 33 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Existing datasets inherently include complex constraints that are implicitly satisfied by the responses.
  • Figure 2: The framework of constructing the proposed alignment training dataset.
  • Figure 3: An example of responses generated with and without constraints by Llama3-70B-Instruct. The evaluator is gpt-4o-0806. For better visualization, we present only a subset of the responses generated without constraints.
  • Figure 4: Full-mark rates (%) of the responses generated with and without constraints. The evaluator is gpt-4o-0806, focusing on four widely-used dimensions: Engagingness (Eng.), Understandability (Und.), Fluency (Flu.), and Coherence (Coh.).
  • Figure 5: Experimental results of MistralCrab and ConiferSFT on different categories of constraints in FollowBench.
  • ...and 2 more figures