From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models
Qianyu He, Jie Zeng, Qianxi He, Jiaqing Liang, Yanghua Xiao
TL;DR
The paper tackles the challenge of enabling LLMs to follow complex, multi-constraint instructions through a data-centric framework. It demonstrates that compositional training data (with 3–5 constraints) improves instruction following more than atomic data, especially for smaller models and lower-complexity tasks. A discrimination-based data collection method generates high-quality compositional data, and a contrastive training objective that combines Direct Preference Optimization (DPO) with a supervised loss uses both positive and negative samples to boost performance while preserving general ability. Extensive experiments across in-domain, out-of-domain, and adversarial settings show robust gains and better training efficiency, underscoring the practical impact of data quality and training strategy on complex instruction following.
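To make the contrastive objective above concrete, here is a minimal sketch of a DPO term combined with a supervised term on the positive sample. The DPO term follows the standard published formulation; the function names, the length normalization of the supervised term, the `sft_weight` parameter, and the default `beta` are illustrative assumptions rather than details confirmed by this summary. Inputs are summed sequence log-probabilities under the policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_pos: float, pol_neg: float,
             ref_pos: float, ref_neg: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one (positive, negative) response pair.

    Each argument is the summed log-probability of a response given the
    instruction, under the policy (pol_*) or the reference model (ref_*).
    """
    margin = (pol_pos - ref_pos) - (pol_neg - ref_neg)
    return -log_sigmoid(beta * margin)


def sft_loss(pol_pos: float, num_tokens: int) -> float:
    """Supervised loss on the positive sample.

    Assumption: negative log-likelihood averaged over response tokens.
    """
    return -pol_pos / num_tokens


def combined_loss(pol_pos: float, pol_neg: float,
                  ref_pos: float, ref_neg: float,
                  num_pos_tokens: int,
                  beta: float = 0.1, sft_weight: float = 1.0) -> float:
    """DPO term plus a weighted supervised term (the weighting is an assumption)."""
    return (dpo_loss(pol_pos, pol_neg, ref_pos, ref_neg, beta)
            + sft_weight * sft_loss(pol_pos, num_pos_tokens))


# Hypothetical log-probabilities for a single training pair.
loss = combined_loss(pol_pos=-20.0, pol_neg=-35.0,
                     ref_pos=-25.0, ref_neg=-30.0,
                     num_pos_tokens=10)
```

The supervised term keeps the likelihood of the positive response from drifting downward while the DPO term widens the gap between positive and negative samples, which is one plausible reading of how the combination preserves general ability.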
Abstract
It is imperative for large language models (LLMs) to follow instructions with elaborate requirements (i.e., complex instruction following). Yet it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge this gap, we first study what training data is effective in enhancing the ability to follow complex constraints. We find that training LLMs on instructions containing multiple constraints enhances their understanding of complex instructions, especially those with lower complexity levels. The improvement even generalizes to compositions of out-of-domain constraints. We further propose methods for obtaining and utilizing such effective training data. Finally, we conduct extensive experiments to demonstrate the effectiveness of our methods in terms of overall performance and training efficiency. We also show that our methods improve models' general instruction-following ability and generalize effectively across out-of-domain, in-domain, and adversarial settings, while maintaining general capabilities.
