Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic
Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera
TL;DR
The order of chain-of-thought steps in autoregressive Transformers significantly affects how efficiently arithmetic tasks are learned. This paper formalizes learning-friendly orders using a memoryless cost model and proposes a two-stage global-local search, guided by loss profiling, that automatically discovers effective output orders. The approach is validated on six order-sensitive arithmetic tasks and Prod: it identifies learning-friendly orders among billions of permutations, rediscovers the known reverse-digit order for multiplication, and achieves near-perfect success on several tasks at moderate lengths. The work suggests a principled way to design input-output ordering, going beyond changes to model architecture or prompting, to improve reasoning tasks, and points toward broader applications in natural language and symbolic computation.
Abstract
The chain of thought, i.e., step-by-step reasoning, is one of the fundamental mechanisms of Transformers. While the design of intermediate reasoning steps has been extensively studied and shown to critically influence performance on mathematical, multi-step reasoning tasks, the ordering of these steps has received little attention, despite its significant effect on the difficulty of reasoning. This study addresses a novel task of unraveling the chain of thought: reordering decoder input tokens into a sequence that is learning-friendly for Transformers on arithmetic tasks. The proposed pipeline first trains a Transformer on a mixture of target sequences arranged in different orders and then identifies benign orders as those whose loss drops quickly in the early stage of training. Because the search space grows factorially with sequence length, we propose a two-stage hierarchical approach for inter- and intra-block reordering. Experiments on seven order-sensitive arithmetic tasks show that our method identifies a learning-friendly order out of a few billion candidates. Notably, it recovered the reverse-digit order reported in prior studies for the multiplication task.
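The selection step described in the abstract (train on a mixture of orders, then prefer orders whose loss falls fastest early on) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `apply_order`, `early_loss_drop`, and `rank_orders`, the `window` parameter, and the scoring rule (initial loss minus loss after `window` recorded steps) are assumptions made for illustration, and the loss curves in the demo are made-up toy values, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# An order is a permutation of target positions (hypothetical representation).
Order = Tuple[int, ...]


def apply_order(target: Sequence[str], order: Order) -> List[str]:
    """Rearrange target tokens so that output position i holds target[order[i]]."""
    if sorted(order) != list(range(len(target))):
        raise ValueError("order must be a permutation of the target positions")
    return [target[i] for i in order]


def early_loss_drop(curve: Sequence[float], window: int) -> float:
    """Loss decrease between the first recorded step and step `window`.

    Assumed scoring rule: a larger drop means the order is learned faster.
    """
    if len(curve) < 2 or window < 1:
        raise ValueError("need at least two loss values and window >= 1")
    end = min(window, len(curve) - 1)
    return curve[0] - curve[end]


def rank_orders(loss_curves: Dict[Order, Sequence[float]], window: int) -> List[Order]:
    """Sort candidate orders from largest to smallest early loss drop."""
    return sorted(
        loss_curves,
        key=lambda order: early_loss_drop(loss_curves[order], window),
        reverse=True,
    )


if __name__ == "__main__":
    # Digits of 12 * 34 = 408, emitted most-significant first by default.
    target = list("408")
    reverse_digit: Order = (2, 1, 0)
    print(apply_order(target, reverse_digit))  # least-significant digit first

    # Toy per-order loss curves (illustrative values only, not measured).
    toy_curves = {
        (0, 1, 2): [2.3, 2.2, 2.1],
        (2, 1, 0): [2.3, 1.5, 0.9],
    }
    print(rank_orders(toy_curves, window=2))
```

In practice the per-order loss curves would come from a single model trained on the mixture of orders, and exhaustively scoring every permutation becomes infeasible as the sequence grows, which is the motivation the abstract gives for the two-stage inter- and intra-block search.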
