Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities

Jiayi Kuang, Haojing Huang, Yinghui Li, Xinnian Liang, Zhikun Xu, Yangning Li, Xiaoyu Tan, Chao Qu, Meishan Zhang, Ying Shen, Philip S. Yu

TL;DR

This work questions whether current LLM mathematical reasoning reflects genuine conceptual understanding or mere memorization of training data. It proposes atomic thinking, which decouples reasoning into field-based and logic-based atomic capabilities, and builds data-driven evaluations to study how these capabilities interact. Experiments show that larger models have stronger atomic abilities, that performance on algebra and analysis exceeds performance on geometry and topology, and that conceptual understanding serves as a foundational driver of higher-level reasoning. The findings offer a cognitively grounded path toward more transferable and efficient mathematical reasoning in LLMs, guiding targeted training strategies and future research directions.

Abstract

Large Language Models (LLMs) have demonstrated outstanding performance in mathematical reasoning. However, we argue that current large-scale reasoning models primarily rely on scaling up training datasets with diverse mathematical problems and long thinking chains, which raises the question of whether LLMs genuinely acquire mathematical concepts and reasoning principles or merely memorize the training data. In contrast, humans tend to break down complex problems into multiple fundamental atomic capabilities. Inspired by this, we propose a new paradigm for evaluating mathematical atomic capabilities. Our work categorizes atomic abilities along two dimensions: (1) field-specific abilities across four major mathematical fields (algebra, geometry, analysis, and topology), and (2) logical abilities at different levels, including conceptual understanding, forward multi-step reasoning with formal math language, and counterexample-driven backward reasoning. We propose corresponding training and evaluation datasets for each atomic capability unit, and conduct extensive experiments on how different atomic capabilities influence one another, in order to explore strategies for eliciting a specific required atomic capability. Evaluation and experimental results on advanced models reveal how performance differs across atomic capabilities and how those capabilities interact. Our findings highlight the importance of decoupling mathematical intelligence into atomic components, providing new insights into model cognition and guiding the development of training strategies toward a more efficient, transferable, and cognitively grounded paradigm of "atomic thinking".

Paper Structure

This paper contains 39 sections, 5 figures, and 12 tables.

Figures (5)

  • Figure 1: Overview of atomic thinking. The figure first compares thought chains with atomic thinking, highlighting the efficiency of atomic thinking. Next, it shows the atomic capabilities we focus on. Finally, it reports the performance of advanced models on each atomic capability.
  • Figure 2: This figure illustrates our data construction procedure.
  • Figure 3: Case study of error cases in the topology and conceptual atomic abilities, together with a comparison of how stimulating one atomic capability affects another.
  • Figure 4: Data examples for the logical atomic capabilities, including property descriptions, definition recognition, formal mathematical language-driven forward reasoning, and counterexample-driven backward reasoning.
  • Figure 5: This figure shows the impact of stimulating one ability on the remaining abilities, where deeper colors indicate a greater positive facilitation effect, while lighter colors indicate a greater negative impact.