Toward Mechanistic Explanation of Deductive Reasoning in Language Models

Davide Maltoni, Matteo Ferrara

TL;DR

This paper tackles the problem of understanding how language models perform deductive reasoning beyond surface statistics. It trains a tiny, non-pretrained decoder-only model with Chain-of-Thought prompting on a symbol-based Horn-clause task and uses mechanistic interpretability tools to reveal internal circuits. The authors find that induction heads instantiate rule completion and rule chaining, forming a minimal two-layer mechanism that generalizes to unseen instances. The work demonstrates that symbolic-like rule learning is achievable by LMs and provides practical interpretability methods, including a truncated pseudoinverse, with implications for scaling to more complex reasoning tasks.

Abstract

Recent large language models have demonstrated notable capabilities in solving problems that require logical reasoning; however, the internal mechanisms behind these capabilities remain largely unexplored. In this paper, we show that a small language model can solve a deductive reasoning task by learning the underlying rules rather than operating as a statistical learner. We then provide a low-level explanation of its internal representations and computational circuits. Our findings reveal that induction heads play a central role in implementing the rule completion and rule chaining steps involved in the logical inference required by the task.
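
To make the two inference steps named above concrete, the following is a minimal sketch of forward chaining over propositional Horn clauses. It is not the authors' task format or code, which this summary does not show: the rule encoding, the `forward_chain` function, and the trace strings are all assumptions chosen for illustration. "Rule completion" is read here as finding a rule whose body is already satisfied and emitting its head, and "rule chaining" as feeding that head back in as a new known fact.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical encoding: a Horn clause is (body symbols, head symbol).
Rule = Tuple[FrozenSet[str], str]


def forward_chain(
    rules: Iterable[Rule], facts: Iterable[str], query: str
) -> Tuple[bool, List[str]]:
    """Return whether `query` is derivable, plus the derivation steps taken."""
    rules = list(rules)
    known = set(facts)
    trace: List[str] = []
    changed = True
    while changed and query not in known:
        changed = False
        for body, head in rules:
            # Rule completion: every body symbol is known, so emit the head.
            if head not in known and body <= known:
                known.add(head)  # Rule chaining: the head becomes a new fact.
                trace.append(f"{','.join(sorted(body))}->{head}")
                changed = True
                if head == query:
                    break
    return query in known, trace


rules = [
    (frozenset({"a"}), "b"),
    (frozenset({"b"}), "c"),
    (frozenset({"d"}), "e"),
]
provable, steps = forward_chain(rules, {"a"}, "c")
unprovable, _ = forward_chain(rules, {"a"}, "e")
```

In this toy instance, `c` is reached from `a` through two chained rules, while `e` is not derivable because `d` is never established. The intermediate steps recorded in `steps` play a role loosely analogous to a Chain-of-Thought trace, though the paper's actual prompt format may differ.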

Paper Structure

This paper contains 20 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Average accuracy over 20 runs reaching convergence.
  • Figure 2: An example of output produced by the developed visualization tool. The explanation is in the main text.
  • Figure 3: Circuits involved in Rule completion. The explanation is in the main text.
  • Figure 4: Circuits involved in Rule chaining. The explanation is in the main text.
  • Figure 5: Circuits involved in Start and final decision. The explanation is in the main text.
  • ...and 1 more figures