CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

Lei Zan, Keli Zhang, Ruichu Cai, Lujia Pan

TL;DR

Large Language Models struggle with complex mathematical reasoning, a difficulty the authors attribute to the deep structural dependencies such problems involve. CAMA introduces a two-stage framework that learns a Mathematical Causal Graph (MCG) from question–solution data via causal discovery and then guides reasoning with task-specific subgraphs, without updating model parameters. Experiments on AIME, Omni-MATH, and OlympiadBench show that structured guidance from the MCG improves accuracy over unstructured prompting and over ablated variants, and that directed edges provide additional gains over undirected ones. Limitations include sensitivity to knowledge-point granularity and the fact that the graph stays fixed during reasoning; future work aims to enable dynamic graph updates and expanded knowledge integration.

Abstract

Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning, a challenge fundamentally rooted in deep structural dependencies. To address this challenge, we propose CAusal MAthematician (CAMA), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure. In the learning stage, CAMA first constructs the Mathematical Causal Graph (MCG), a high-level representation of solution strategies, by combining LLM priors with causal discovery algorithms applied to a corpus of question-solution pairs. The resulting MCG encodes essential knowledge points and their causal dependencies. To better align the graph with downstream reasoning tasks, CAMA further refines the MCG through iterative feedback derived from a selected subset of the question-solution pairs. In the reasoning stage, given a new question, CAMA dynamically extracts a task-relevant subgraph from the MCG, conditioned on both the question content and the LLM's intermediate reasoning trace. This subgraph, which encodes the most pertinent knowledge points and their causal dependencies, is then injected back into the LLM to guide its reasoning process. Empirical results on real-world datasets show that CAMA significantly improves LLM performance on challenging mathematical problems. Furthermore, our experiments demonstrate that structured guidance consistently outperforms unstructured alternatives, and that incorporating asymmetric causal relationships yields greater improvements than using symmetric associations alone.
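
The reasoning stage described above (match knowledge points, extract a task-relevant subgraph, inject it into the prompt) can be illustrated with a minimal sketch. This is not the authors' implementation: the substring matcher, the choice to include all ancestors of matched nodes, and the names `match_knowledge_points`, `extract_subgraph`, and `render_hint` are assumptions made for illustration, and the toy graph is hand-written rather than learned.

```python
from collections import deque

# Toy directed MCG: an edge u -> v means knowledge point u is a prerequisite of v.
# Hand-written for illustration; in CAMA the graph is learned from data.
MCG = {
    "area of a circle": ["volume of a cylinder", "volume of a cone"],
    "volume of a cylinder": [],
    "volume of a cone": [],
}


def match_knowledge_points(text, graph):
    """Hypothetical matcher: a node is relevant if its name occurs in the text."""
    lowered = text.lower()
    return {node for node in graph if node in lowered}


def extract_subgraph(graph, seeds):
    """Return the subgraph induced by the seed nodes and all of their ancestors.

    Pulling in ancestors is an assumption of this sketch, not a documented rule.
    """
    # Invert the adjacency lists so we can walk from a node to its prerequisites.
    parents = {node: [] for node in graph}
    for u, children in graph.items():
        for v in children:
            parents[v].append(u)

    keep, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in keep:
                keep.add(parent)
                queue.append(parent)

    return {u: [v for v in graph[u] if v in keep] for u in graph if u in keep}


def render_hint(subgraph):
    """Serialize the subgraph as text that could be appended to an LLM prompt."""
    lines = [f"{u} -> {v}" for u in sorted(subgraph) for v in sorted(subgraph[u])]
    return "Relevant knowledge dependencies:\n" + "\n".join(lines)


question = "Find the volume of a cone with radius 3 and height 4."
trace = "I should recall the formula for the volume of a cone."

# Condition on both the question and the intermediate reasoning trace.
seeds = match_knowledge_points(question + " " + trace, MCG)
sub = extract_subgraph(MCG, seeds)
hint = render_hint(sub)
print(hint)
```

In this sketch the hint text would be prepended to the question before the final LLM call. A real system would likely replace the substring matcher with an LLM-based or embedding-based one.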

Paper Structure

This paper contains 44 sections, 20 equations, 5 figures, 4 tables, and 2 algorithms.

Figures (5)

  • Figure 1: The CAMA framework consists of two stages: learning and reasoning. In the learning stage, (1) CAMA constructs an initial Mathematical Causal Graph (MCG) from question–solution pairs by combining LLM outputs with classical causal discovery methods to identify key knowledge points and their causal dependencies; (2) the MCG is then refined using feedback from the LLM’s answers to better align with the downstream reasoning task. In the reasoning stage, the optimized MCG is used to solve new questions through a three-step process: generating a reasoning trace, extracting a relevant subgraph, and guiding the LLM to produce the final answer.
  • Figure 2: An example Mathematical Causal Graph (MCG) with three knowledge points: Area of a circle, Volume of a cylinder, and Volume of a cone. The edges indicate that understanding the Area of a circle is required to compute both the Volume of a cylinder and the Volume of a cone.
  • Figure 3: Pass@1 scores of CAMA using Mathematical Causal Graphs (MCGs) built with different knowledge point granularities, controlled by the parameter $\lambda$ (ranging from 2 to 7). Each $\lambda$ produces a distinct MCG from the AIME2022 and AIME2023 training data, and results are averaged over three repetitions. Performance is reported on both the training sets (AIME2022 and AIME2023) and the test set (AIME2024). The base LLM is DSV3; its scores are included for reference, shown as purple diamonds for the training sets and a green triangle for the test set.
  • Figure 4: When the knowledge point granularity is set to $\lambda=3$, the Mathematical Causal Graph learned by CAMA from the combined AIME2022 and AIME2023 datasets consists of 124 nodes and 184 edges, including 129 directed edges (in blue) and 55 undirected edges (in black).
  • Figure 5: Left: Pass@1 score of CAMA on OlympiadBench-674 with MCGs built at different granularities ($\lambda=2\text{--}5$). Right: percentage of OlympiadBench-674 questions that use at least one knowledge point present in each MCG. Results are averaged over three repetitions.
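
The captions for Figures 3 and 5 report Pass@1 averaged over three repetitions. A minimal sketch of that computation follows, assuming one sampled answer per question in each repetition and exact-match grading; the function names and the answer strings are made up for illustration and are not results from the paper.

```python
from statistics import mean


def pass_at_1(predictions, references):
    """Fraction of questions whose single sampled answer matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def averaged_pass_at_1(runs, references):
    """Mean Pass@1 across independent repetitions of the same evaluation."""
    return mean(pass_at_1(run, references) for run in runs)


# Placeholder data: four questions, three repetitions.
references = ["204", "25", "809", "116"]
runs = [
    ["204", "25", "800", "116"],  # 3 of 4 correct
    ["204", "24", "800", "116"],  # 2 of 4 correct
    ["204", "25", "809", "116"],  # 4 of 4 correct
]

score = averaged_pass_at_1(runs, references)
print(f"Average Pass@1 over {len(runs)} repetitions: {score:.2f}")
```

Exact string matching is the simplest grading rule; benchmarks with free-form answers typically need a more tolerant equivalence check.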