Learning Composable Chains-of-Thought

Fangcong Yin, Zeyu Leo Liu, Liu Leqi, Xi Ye, Greg Durrett

TL;DR

This paper tackles compositional generalization in reasoning with large language models by introducing Composable CoT, a data-augmentation scheme that makes atomic chain-of-thought (CoT) traces composable at inference time. It shows how to construct Composable CoT data and how to combine atomic CoT models through either multitask learning (MTL) or model merging, with rejection sampling fine-tuning (RFT) used to bootstrap performance when only limited compositional supervision is available. On string-operation tasks and Skill-Mix compositions, Composable CoT variants outperform standard CoT baselines in zero-shot settings and often exceed baselines that receive compositional labels, especially when the training data budget is constrained. The work offers a scalable route to compositional reasoning by reusing simple skills, and suggests directions for scaling to more complex multi-skill compositions.
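To make the model-merging option concrete, here is a minimal sketch that assumes merging means linear interpolation of two atomic models' parameters. The summary does not say which merging method the paper uses, so `merge_weights`, the `alpha` coefficient, and the toy parameter dictionaries are all illustrative assumptions.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(model_a: Weights, model_b: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two parameter sets with identical names and sizes.

    ASSUMPTION: simple weight interpolation is used for illustration only;
    it is not confirmed to be the paper's merging method.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share parameter names")
    merged: Weights = {}
    for name in model_a:
        a, b = model_a[name], model_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged


# Two hypothetical atomic CoT models, each fine-tuned on a different atomic task.
atomic_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
atomic_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_weights(atomic_a, atomic_b)
```

With real checkpoints the same loop would run over tensors rather than lists; the point of the sketch is only that merging combines separately trained atomic models without any further training.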

Abstract

A common approach for teaching large language models (LLMs) to reason is to train on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest. We want reasoning models to generalize beyond their training distribution, and ideally to generalize compositionally: combine atomic reasoning skills to solve harder, unseen reasoning tasks. We take a step towards compositional generalization of reasoning skills when addressing a target compositional task that has no labeled CoT data. We find that simply training models on CoT data of atomic tasks leads to limited generalization, but minimally modifying CoT formats of constituent atomic tasks to be composable can lead to improvements. We can train "atomic CoT" models on the atomic tasks with Composable CoT data and combine them with multitask learning or model merging for better zero-shot performance on the target compositional task. Such a combined model can be further bootstrapped on a small amount of compositional data using rejection sampling fine-tuning (RFT). Results on string operations and natural language skill compositions show that training LLMs on Composable CoT outperforms multitask learning and continued fine-tuning baselines within a given training data budget.
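The rejection sampling step behind RFT can be sketched as follows: sample responses from the combined model on compositional prompts, keep only those whose final answer matches the label, and fine-tune on the kept pairs. This is a sketch under stated assumptions, not the paper's implementation. The `Answer:` marker, the `sample_fn` callable, and keeping one accepted response per prompt are all hypothetical choices.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, gold final answer)


def extract_answer(response: str) -> str:
    """ASSUMPTION: the final answer follows the last 'Answer:' marker."""
    return response.rsplit("Answer:", 1)[-1].strip()


def rejection_sample(
    examples: List[Example],
    sample_fn: Callable[[str], str],
    num_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Collect (prompt, response) pairs whose final answer matches the label."""
    kept: List[Tuple[str, str]] = []
    for prompt, gold in examples:
        for _ in range(num_samples):
            response = sample_fn(prompt)
            if extract_answer(response) == gold:
                kept.append((prompt, response))
                break  # keep at most one accepted response per prompt
    return kept


def toy_sampler(prompt: str) -> str:
    # Stand-in for sampling from the combined model: always reverses the prompt.
    return f"Reverse the string. Answer: {prompt[::-1]}"


# The second label is deliberately unreachable, so that example is rejected.
data = [("abc", "cba"), ("xy", "zz")]
accepted = rejection_sample(data, toy_sampler)
```

The accepted pairs would then serve as the fine-tuning set. Only answer labels are needed for filtering, which is why this step suits a setting where the target compositional task has no labeled CoT data.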

Paper Structure

This paper contains 50 sections, 1 equation, 3 figures, 10 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) Composable chain-of-thought (left): A compositional task involves two separate atomic capabilities. We use a data augmentation scheme to teach LLMs CoT formats that can be combined at inference time to address compositional tasks. (b) Pipeline for learning Composable CoT (right): Models trained on Composable CoT data of atomic skills can be combined with multitask learning or model merging for zero-shot compositional generalization, and can be further improved by rejection sampling fine-tuning on limited compositional supervision.
  • Figure 2: Construction of Composable CoT data with $k$ chain-of-thought tags. We insert $k-1$ proxy prefixes at the end of the prompt, before the generation of $\mathbf{t}_k$.
  • Figure 3: Summary of settings for methods evaluated. Names in the results table reference configurations described in this figure; e.g., ComposableCoT-Merge uses ComposableCoTs with model merging, and in the zero-shot setting does not use further tuning.
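The construction described in the Figure 2 caption can be sketched as follows: for a training example whose atomic CoT occupies the $k$-th tag, $k-1$ proxy prefixes are appended to the end of the prompt, and the model is trained to generate $\mathbf{t}_k$ after them. This is a minimal sketch. The `<cot_i>` tag names, the contents of the proxy prefixes, and the `build_example` helper are hypothetical, since the caption does not specify them.

```python
from typing import Dict, List


def tag(i: int, text: str) -> str:
    """Wrap text in the i-th chain-of-thought tag (tag names are hypothetical)."""
    return f"<cot_{i}>{text}</cot_{i}>"


def build_example(
    question: str, atomic_cot: str, k: int, proxies: List[str]
) -> Dict[str, str]:
    """Build one Composable CoT training example for tag position k.

    The k-1 proxy prefixes fill tags 1..k-1 at the end of the prompt;
    the atomic CoT is the generation target inside tag k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(proxies) != k - 1:
        raise ValueError("expected exactly k-1 proxy prefixes")
    prefix = "".join(tag(i, p) for i, p in enumerate(proxies, start=1))
    return {"prompt": question + prefix, "target": tag(k, atomic_cot)}


# k = 1: the atomic CoT comes first, so no proxy prefix is inserted.
first = build_example("Q: reverse 'abc'\n", "c, b, a -> cba", 1, [])

# k = 2: one proxy prefix precedes the atomic CoT.
second = build_example("Q: reverse 'abc'\n", "c, b, a -> cba", 2, ["proxy text"])
```

Training each atomic skill at several tag positions is what lets the formats chain at inference time: a CoT produced by one skill can occupy the slot that a proxy prefix filled during training of another.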