Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates

Nikhil Prakash, Donghao Ren, Dominik Moritz, Yannick Assogba

TL;DR

The paper addresses enhancing mathematical reasoning in LLMs by leveraging mechanistic interpretability to identify sparse circuits responsible for reasoning. It introduces Constructive Circuit Amplification (CCA), a three-stage framework that localizes pivotal reasoning tokens, isolates constructive model components with Desiderata-based Component Masking (DCM), and updates only those components. Applied to GSM-Symbolic, CCA achieves up to +11.4% accuracy with updates to as little as 1.59% of components, while preserving broad capabilities on benchmarks like MMLU, TriviaQA, and TruthfulQA. This work demonstrates that targeted, sparse, circuit-level updates can reliably boost specific reasoning skills with minimal collateral impact, suggesting a practical pathway for safe, capability-specific model adaptation.

Abstract

Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening of existing circuits in the model. Taken together, these findings suggest the possibility of intervening directly on such circuits to make precise, task-targeted updates. Motivated by these findings, we propose a novel method called Constructive Circuit Amplification which identifies pivotal tokens from model reasoning traces as well as model components responsible for the desired task, and updates only those components. Applied to mathematical reasoning, it improves accuracy by up to +11.4% across multiple models while modifying as little as 1.59% of model components, with minimal impact on other abilities as measured by MMLU, TriviaQA, and TruthfulQA. These results demonstrate that targeted capabilities can be reliably enhanced by selectively updating a sparse set of model components.

Paper Structure

This paper contains 27 sections, 2 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Overview of CCA: (a) Token Localization: For a given problem, we generate both correct and incorrect reasoning traces and identify the pivotal token where the incorrect trace diverges from the correct one. The intervention point is chosen as the token immediately preceding this divergence. (b) Model Component Localization: Using the Error-Localization dataset constructed from these reasoning trace pairs, we apply Desiderata-based Component Masking (DCM) to learn a sparse binary mask over attention heads and MLP neurons. This identifies the subset of components that most strongly promote the desired token. (c) Model Update: Gradient updates are then applied exclusively to the localized components, amplifying constructive computations while leaving the rest of the network unchanged.
  • Figure 2: Example of a GSM-Symbolic math word problem showing both a correct and an incorrect reasoning trace produced by the Gemma-2-9b-Instruct model. The correct trace (top) is obtained through greedy decoding, while the incorrect trace (bottom) is produced by non-greedy sampling.
  • Figure 3: The Error-Localization dataset contains three components: prefix: the shared reasoning trace between the correct and incorrect paths (including intervention token), desired_token: the token the model should generate to produce the correct answer, and undesired_token: the token the model should avoid generating to ensure the correct answer.
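The token-localization step in Figure 1(a) and the record layout in Figure 3 can be illustrated with a small sketch. It is not the authors' implementation: it assumes both traces are already tokenized into lists of strings and treats the first mismatching position as the divergence point, which may be simpler than the paper's actual procedure. The names `first_divergence`, `build_example`, and `ErrorLocalizationExample` are hypothetical; only the field names `prefix`, `desired_token`, and `undesired_token` come from the Figure 3 caption.

```python
from typing import Optional, Sequence, TypedDict


class ErrorLocalizationExample(TypedDict):
    """One record, using the three field names given in the Figure 3 caption."""
    prefix: list[str]        # shared trace, ending with the intervention token
    desired_token: str       # token taken from the correct trace
    undesired_token: str     # token taken from the incorrect trace


def first_divergence(correct: Sequence[str], incorrect: Sequence[str]) -> Optional[int]:
    """Return the first index where the two traces differ, or None.

    Assumption: the pivotal token is the first mismatching position.
    """
    for i, (good, bad) in enumerate(zip(correct, incorrect)):
        if good != bad:
            return i
    return None  # no mismatch within the overlapping part of the traces


def build_example(
    correct: Sequence[str], incorrect: Sequence[str]
) -> Optional[ErrorLocalizationExample]:
    """Build one record from a (correct, incorrect) trace pair."""
    idx = first_divergence(correct, incorrect)
    # Skip pairs that never diverge, or that diverge at position 0,
    # where no preceding token exists to serve as the intervention point.
    if idx is None or idx == 0:
        return None
    return {
        # The prefix ends at index idx - 1, the intervention token.
        "prefix": list(correct[:idx]),
        "desired_token": correct[idx],
        "undesired_token": incorrect[idx],
    }


if __name__ == "__main__":
    correct_trace = ["She", "has", "3", "+", "4", "=", "7"]
    incorrect_trace = ["She", "has", "3", "*", "4", "=", "12"]
    print(build_example(correct_trace, incorrect_trace))
```

In this toy pair the last element of `prefix` is the intervention token, so a component-localization method such as DCM would, under these assumptions, be asked to favor `desired_token` over `undesired_token` at that position.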