Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic

Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera

TL;DR

This paper studies how the order of chain-of-thought steps affects the learning efficiency of autoregressive Transformers on arithmetic tasks. It formalizes learning-friendly orders using a memoryless cost model and proposes loss profiling combined with a two-stage global-local search to discover effective output orders automatically. The approach is validated on six order-sensitive arithmetic tasks plus Prod, where it identifies learning-friendly orders among billions of permutations and rediscovers the known reverse-digit order for multiplication, with near-perfect success on several tasks at moderate lengths. The work suggests a principled way to design input-output ordering, going beyond model architecture or prompting, to improve performance on reasoning tasks, and it points toward broader applications in natural language and symbolic computation.

Abstract

The chain of thought, i.e., step-by-step reasoning, is one of the fundamental mechanisms of Transformers. While the design of intermediate reasoning steps has been extensively studied and shown to critically influence performance on mathematical, multi-step reasoning tasks, the ordering of these steps has received little attention, despite its significant effect on the difficulty of reasoning. This study addresses a novel task of unraveling the chain of thought -- reordering decoder input tokens into a learning-friendly sequence for Transformers, for learning arithmetic tasks. The proposed pipeline first trains a Transformer on a mixture of target sequences arranged in different orders and then identifies benign orders as those with fast loss drops in the early stage. As the search space grows factorially in sequence length, we propose a two-stage hierarchical approach for inter- and intra-block reordering. Experiments on seven order-sensitive arithmetic tasks show that our method identifies a learning-friendly order out of a few billion candidates. Notably, it recovered the reverse-digit order reported in prior studies for the multiplication task.
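To make the notion of an output order concrete, the following is a minimal sketch, not the authors' code. It treats an order as a permutation applied to the target tokens and shows the reverse-digit (least-significant-first) order for a product. The helper names `apply_order` and `product_digits`, the operands, and the length 13 used to illustrate factorial growth are all assumptions chosen for illustration.

```python
import math


def apply_order(tokens, perm):
    """Rearrange tokens so that output position k holds tokens[perm[k]]."""
    return [tokens[i] for i in perm]


def product_digits(a, b, width):
    """Digits of a * b, most significant first, zero-padded to `width` (hypothetical helper)."""
    return list(str(a * b).zfill(width))


# Target written most-significant-digit first: 12 * 34 = 408 -> 0408.
msd_first = product_digits(12, 34, 4)

# The reverse-digit order is just one particular permutation of positions.
reverse = list(range(len(msd_first) - 1, -1, -1))
lsd_first = apply_order(msd_first, reverse)

# The number of candidate orders grows factorially with target length;
# a length of 13 (an illustrative value) already gives billions of orders.
num_orders = math.factorial(13)

print(msd_first, lsd_first, num_orders)
```

Emitting the least significant digit first means each output digit depends only on digits and carries that have already been produced, which is the usual intuition for why this order is easier for an autoregressive decoder.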

Paper Structure

This paper contains 30 sections, 1 theorem, 24 equations, 11 figures, and 18 tables.

Key Result

Proposition 4.2

The tasks ReLU, Square, MLP, Sine, Cubic, and Triangle are order-sensitive.

Figures (11)

  • Figure 1: Success rates for the multiplication of two integers, reproducing Order. Matrix rows and columns indicate the number of digits in each operand. Evaluation is conducted with 100 samples for each digit position. (a) The model is trained to output from the most significant digit. (b) The model is trained to output from the least significant digit.
  • Figure 2: (a) Training-loss curves for a vanilla Transformer (blue) and for a model trained with soft-permutation optimization (red). (b) Permutation matrix learned during permutation training. Sparse off-diagonal weights clustered around the main diagonal indicate leakage from future tokens.
  • Figure 3: Evaluation loss curves when trained with two different orders.
  • Figure 4: Search flow of our hierarchical approach. Global stage: The proposed method generates $T$ candidate permutations by swapping the sequence at the macro-level, exchanging token blocks to quickly spot coarse, learning-friendly orders. Local stage: inside each chosen block, the proposed method further permutes the tokens, refining the sequence to discover a final permutation that maximizes learning ease.
  • Figure 5: Top-1 identification accuracy of loss profiling: (a) fixed-length targets; (b) variable-length targets. Each run is initialized with one of the permutation sets Random ${\mathcal{P}}_{\mathrm{r}}$, Sort ${\mathcal{P}}_{\mathrm{s}}$, or Block ${\mathcal{P}}_{\mathrm{b}}$.
  • ...and 6 more figures
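The global-local search described in the Figure 4 caption can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the paper scores candidate orders by early-stage training loss of a Transformer, whereas here a hypothetical `toy_cost` (the number of inversions relative to a hidden target order) stands in for that loss. The function names, the block size, and the exhaustive enumeration within each stage are likewise assumptions.

```python
from itertools import permutations


def toy_cost(order, target):
    """Hypothetical stand-in for early-stage loss: inversions of `order` w.r.t. `target`."""
    pos = {tok: i for i, tok in enumerate(target)}
    ranks = [pos[i] for i in order]
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )


def global_stage(n, block_size, cost):
    """Pick the best arrangement of contiguous blocks, keeping each block's inside fixed."""
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    best = min(permutations(blocks), key=lambda bs: cost([i for b in bs for i in b]))
    return [list(b) for b in best]


def local_stage(blocks, cost):
    """Refine the order inside each block in turn, holding the other blocks fixed."""
    blocks = [list(b) for b in blocks]
    for k in range(len(blocks)):
        def full(candidate, k=k):
            # Full sequence with block k replaced by the candidate arrangement.
            return [i for j, b in enumerate(blocks) for i in (candidate if j == k else b)]

        blocks[k] = list(min(permutations(blocks[k]), key=lambda c: cost(full(c))))
    return [i for b in blocks for i in b]


# Toy run: the hidden "learning-friendly" order is the full reversal of 6 positions.
target = [5, 4, 3, 2, 1, 0]
cost = lambda order: toy_cost(order, target)

coarse = global_stage(6, 2, cost)   # block-level order
final = local_stage(coarse, cost)   # token-level refinement
print(coarse, final, cost(final))
```

In this toy setting the two stages score 3! = 6 block arrangements and then 3 × 2! = 6 within-block arrangements, instead of all 6! = 720 permutations. That reduction is the motivation for a hierarchical search when the full space grows factorially.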

Theorems & Definitions (4)

  • Example 3.1: ReLU sequence
  • Definition 4.1: Order sensitivity
  • Proposition 4.2
  • Proof