Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure

Haotong Yang, Fanxu Meng, Zhouchen Lin, Muhan Zhang

TL;DR

The paper tackles how pretrained LLMs achieve complex reasoning under pure language-modeling objectives and proposes the template-content (T-C) structure as a principled explanation. By separating output into fixed templates (task skeletons) and flexible content (problem-specific data), the authors show that the learning space for reasoning tasks becomes tractable, and extend this idea to a hierarchical form enabling task composition. They provide formal definitions, constructive proofs for the existence of T-C Transformers, and universal-approximation extensions to causal Transformers, along with empirical evidence that current LLMs exhibit T-C-like behavior and that explicit T-C learning improves performance through content-replacement data augmentation. The work offers a concrete mechanism for reasoning in large language models, with implications for data efficiency, prompt design, and the development of compositional AI systems. Overall, the T-C framework advances our understanding of how structure in language can underlie robust reasoning in AI systems, and suggests practical avenues to enhance inductive generalization and multi-step problem solving.

Abstract

Pre-trained large language models (LLMs) have shown an extraordinary capacity to solve reasoning tasks, even tasks that require a complex process involving multiple sub-steps. However, given the vast possible generation space of all the tasks, how a pretrained model learns to reason remains an open question. We first propose that an intrinsic structural constraint on the generated sequence of language-based reasoning, which we call the template-content structure (T-C structure), is the key to explaining why LLMs can solve a large number of complex reasoning problems with limited training data: we show that this structure reduces the possible space from exponential to linear. Furthermore, by generalizing this structure to the hierarchical case, we demonstrate that models can achieve task composition, further reducing the space needed for learning from linear to logarithmic and thereby learning effectively on complex reasoning that involves multiple steps. We provide both examples and a formal theory of the T-C structure. We also experimentally validate the existence of the T-C structure in some current LLMs and its effectiveness for reasoning.

Paper Structure (45 sections, 11 theorems, 23 equations, 14 figures, 2 tables, 1 algorithm)


Key Result

Proposition 3.2

To solve a class of similar reasoning problems, many tokens of the answer sequence are almost certain. These tokens form a relatively fixed thinking structure, or skeleton, that is shared across these problems. We call such a class of problems a task and these relatively fixed tokens the template.

Figures (14)

  • Figure 1: An illustration of the template-content structure. Given a prompt and a question: 1. The model generates the template tokens (highlighted in yellow) as a flow for solving the task according to the prompt, along with content placeholders (blue) in the template that need to be filled in; these are displayed in the upper half of the dashed box. 2. Content generation under the guidance of the template can be understood as pointing, shown in the bottom half. Matching colors show the pointing process. The combination of these two mechanisms makes reasoning possible.
  • Figure 2: The concatenate-last-letter dataset. The task is to concatenate the last letters of several words. The template tokens are generated by GPT-4 and fixed, while all the contents (<word>,<letter>,<answer>) vary.
  • Figure 2: The correlation between the distinguishability of T/C tokens and the reasoning performance.
  • Figure 3: The T/C classification generated by the autoregressive classifier based on a Llama-2-70b model. Template: yellow, content: blue. Left: concatenate-last-letter. Right: SingleEQ. Tokens whose classification conflicts with human intuition are marked in red.
  • Figure 4: The performance with content replacement, random replacement, and no replacement. The shaded region shows the standard error over 3 runs.
  • ...and 9 more figures
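
To make the template/content split in the concatenate-last-letter figure concrete, here is a minimal Python sketch. It assumes a hypothetical two-sentence template (the paper's actual GPT-4-generated template is not reproduced here) and a made-up word list. The helper names `render` and `content_replacement` are illustrative only, and the replacement step is one plausible reading of content-replacement augmentation: swap the content words while keeping the template tokens fixed.

```python
import random

# Hypothetical template strings, NOT the GPT-4-generated template from the paper.
# The {word}, {letter} and {answer} slots play the role of content placeholders.
STEP_TEMPLATE = 'The last letter of "{word}" is "{letter}".'
ANSWER_TEMPLATE = 'Concatenating them gives "{answer}".'


def render(words):
    """Fill the fixed template with content derived from `words`."""
    letters = [w[-1] for w in words]
    lines = [
        STEP_TEMPLATE.format(word=w, letter=c) for w, c in zip(words, letters)
    ]
    lines.append(ANSWER_TEMPLATE.format(answer="".join(letters)))
    return "\n".join(lines)


def content_replacement(words, vocabulary, rng):
    """Replace every content word with a word drawn from `vocabulary`.

    The template is untouched, so the augmented sample shares the same
    skeleton as the original one (an assumed form of the augmentation).
    """
    return [rng.choice(vocabulary) for _ in words]


original = ["apple", "river", "stone"]
vocabulary = ["cloud", "tiger", "lemon", "piano"]  # illustrative word list
augmented = content_replacement(original, vocabulary, random.Random(0))

print(render(original))
print(render(augmented))
```

Both printed samples have the same number of lines and the same fixed wording; only the words, letters, and final answer differ, which is the property the figure attributes to the dataset.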

Theorems & Definitions (29)

  • Definition 3.1: Prompt-leading autoregressive models for answer generation
  • Proposition 3.2: Template, informal
  • Proposition 3.3: Content, informal
  • Proposition 3.4: Transformers can learn the T-C structure
  • Proposition 3.5: The T-C Transformers can achieve the within-task generalization
  • Definition A.1: Causal sequence-to-sequence function
  • Definition A.2: Template and content, T/C
  • Definition A.3: The groundtruth classification and the template-content generation model
  • Proposition A.5
  • proof
  • ...and 19 more