Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models

Shirley Anugrah Hayati; Taehee Jung; Tristan Bodding-Long; Sudipta Kar; Abhinav Sethy; Joo-Kyung Kim; Dongyeop Kang

TL;DR

This work introduces Chain-of-Instructions (CoI), a framework for compositional instruction tuning that trains LLMs to execute multi-subtask prompts step-by-step. By automatically generating CoI data from a large corpus of single-instruction tasks and validating composability with LLMs, the authors create CoI_2, CoI_3, and longer chains, enabling robust evaluation of instruction-following in multi-step scenarios. Fine-tuning Alpaca-7B and Mistral-7B-Instruct on CoI data yields improvements over baselines on in-domain composite tasks, and demonstrates transfer to unseen single tasks, longer chains, and a multilingual downstream task. The results highlight the value of training on compositional instructions to improve generalization and reliability in complex prompts, with future work on deeper instruction decomposition and broader task coverage. Overall, simple CoI-tuning provides consistent gains for handling longer, unseen instruction chains and downstream language tasks with practical implications for scalable instruction-following in LLMs.

Abstract

Fine-tuning large language models (LLMs) with a large and diverse collection of instructions has improved their generalization to different tasks, including unseen ones. However, most existing instruction datasets include only single instructions, and models tuned on them struggle to follow complex instructions composed of multiple subtasks. In this work, we propose a novel concept of compositional instructions called chain-of-instructions (CoI), where the output of one instruction becomes the input for the next, like a chain. Unlike the conventional practice of solving single-instruction tasks, our proposed method encourages a model to solve each subtask step by step until the final answer is reached. CoI-tuning (i.e., fine-tuning with CoI instructions) improves the model's ability to handle instructions composed of multiple subtasks as well as unseen composite tasks such as multilingual summarization. Overall, our study finds that simple CoI tuning of existing instruction data can provide consistent generalization to solve more complex, unseen, and longer chains of instructions.
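The chaining idea in the abstract, where each subtask's output feeds the next instruction, can be illustrated with a minimal sketch. This is not the authors' implementation: `run_chain`, `toy_solve`, and the two instruction names are hypothetical, and the rule-based solver only stands in for an LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call: (instruction, input) -> output.
Solver = Callable[[str, str], str]


def run_chain(instructions: List[str], text: str, solve: Solver) -> List[Tuple[str, str]]:
    """Apply instructions in order; each output becomes the next input."""
    steps = []
    current = text
    for instruction in instructions:
        current = solve(instruction, current)
        steps.append((instruction, current))
    return steps


def toy_solve(instruction: str, text: str) -> str:
    """Toy rules replacing an LLM, for illustration only (an assumption)."""
    if instruction == "first sentence":
        # Crude "summarization": keep only the first sentence.
        return text.split(". ")[0].rstrip(".") + "."
    if instruction == "uppercase":
        # Crude "style transfer": upper-case the text.
        return text.upper()
    raise ValueError(f"unknown instruction: {instruction}")


steps = run_chain(["first sentence", "uppercase"], "Cats sleep a lot. Dogs bark.", toy_solve)
final_output = steps[-1][1]  # the last output is the answer of the whole chain
```

Keeping every intermediate `(instruction, output)` pair, rather than only the final string, mirrors the step-by-step solving described above and makes it possible to inspect where a chain goes wrong.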

Paper Structure (45 sections, 7 figures, 11 tables)

Introduction
Chain-of-Instructions
Formulation
Automatic Dataset Creation Pipeline
Seed Datasets
Instruction Composition
Step 1: Single instruction summarization
Step 2: Composability check
CoI Dataset
Experiment Setup
CoI models
Baselines
Metrics
Test sets
Results
...and 30 more sections

Figures (7)

Figure 1: Chain-of-Instructions (CoI) example. The summarization output can be an input for a title generation subtask; the output of the title generation can be an input for style transfer or translation subtasks. Arrow thickness denotes the probability of instruction composability. X means that these subtasks cannot be composed due to format mismatch. $I_{k}$ is the $k^{th}$ instruction and $O_{k}$ is the $k^{th}$ output.
Figure 2: An example of the Chain-of-Instructions task. The last output is the expected output of the CoI.
Figure 3: Data creation for CoI$_{2}$. We use an LLM for both instruction summarization and composability check. The right column shows an example instance of our chain-of-instruction dataset. Output 1 in Step 2 comes from the original SupNatInst data.
Figure 4: t-SNE of sentence embeddings for the most frequent compositional instructions in CoI$_{2}$.
Figure 5: Human evaluation results. "Prefer CoI" refers to the percentage of CoI outputs preferred by humans; "none" refers to when humans think the outputs for both first and second subtasks are incorrect.
...and 2 more figures

Theorems & Definitions (1)

Definition 1: Chain of Instructions
