Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models
Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang
TL;DR
This work introduces Chain-of-Instructions (CoI), a framework for compositional instruction tuning that trains LLMs to execute multi-subtask prompts step by step. By automatically generating CoI data from a large corpus of single-instruction tasks and validating composability with LLMs, the authors create CoI_2, CoI_3, and longer chains, which support evaluation of instruction following in multi-step scenarios. Fine-tuning Alpaca-7B and Mistral-7B-Instruct on CoI data yields improvements over baselines on in-domain composite tasks and transfers to unseen single tasks, longer chains, and a multilingual downstream task. The results highlight the value of training on compositional instructions for generalization and reliability on complex prompts; deeper instruction decomposition and broader task coverage are left to future work. Overall, simple CoI-tuning provides consistent gains on longer, unseen instruction chains and downstream language tasks, with practical implications for scalable instruction following in LLMs.
Abstract
Fine-tuning large language models (LLMs) with a large and diverse collection of instructions has improved their generalization to different tasks, even unseen ones. However, most existing instruction datasets include only single instructions, and models trained on them struggle to follow complex instructions composed of multiple subtasks. In this work, we propose a novel concept of compositional instructions called chain-of-instructions (CoI), where the output of one instruction becomes the input for the next, like a chain. Unlike the conventional practice of solving single-instruction tasks, our proposed method encourages a model to solve each subtask step by step until the final answer is reached. CoI-tuning (i.e., fine-tuning with CoI instructions) improves the model's ability to handle instructions composed of multiple subtasks as well as unseen composite tasks such as multilingual summarization. Overall, our study finds that simple CoI-tuning on existing instruction data can provide consistent generalization to more complex, unseen, and longer chains of instructions.
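To make the chaining idea concrete, the following is a minimal sketch of how a two-step composite example (in the spirit of CoI_2) could be assembled from two single-instruction tasks. It is an illustration under stated assumptions, not the authors' data pipeline: the `Subtask` class, the helper names, the "; then " joining template, the "Output i:" target format, and the toy solver functions are all hypothetical stand-ins for real tasks and model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """A single-instruction task (hypothetical structure, for illustration)."""
    instruction: str
    solve: Callable[[str], str]  # toy stand-in for a model or gold annotation


def compose_instruction(subtasks: List[Subtask]) -> str:
    """Join single instructions into one composite instruction.

    The '; then ' template is an assumption, not taken from the paper.
    """
    return "; then ".join(s.instruction for s in subtasks)


def run_chain(subtasks: List[Subtask], text: str) -> List[str]:
    """Feed each subtask's output into the next and keep every intermediate output."""
    outputs = []
    current = text
    for subtask in subtasks:
        current = subtask.solve(current)
        outputs.append(current)
    return outputs


def build_example(subtasks: List[Subtask], text: str) -> Dict[str, str]:
    """Build one training example whose target lists each step's output in order."""
    outputs = run_chain(subtasks, text)
    target = "\n".join(f"Output {i}: {o}" for i, o in enumerate(outputs, start=1))
    return {
        "instruction": compose_instruction(subtasks),
        "input": text,
        "output": target,
    }


def first_sentence(text: str) -> str:
    """Toy 'summarizer': keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def to_upper(text: str) -> str:
    """Toy second step: upper-case the previous output."""
    return text.upper()


chain = [
    Subtask("Keep only the first sentence", first_sentence),
    Subtask("convert the result to upper case", to_upper),
]
example = build_example(chain, "The cat sat on the mat. It then slept all day.")
print(example["instruction"])
print(example["output"])
```

Listing every intermediate output in the target reflects the step-by-step behavior described above: the model is supervised on each subtask's result, not only on the final answer. Longer chains follow by appending more `Subtask` entries to the list.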
