Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, Ziyi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei
TL;DR
The paper addresses the limitation of single-paradigm reasoning in large language models for mathematical tasks by introducing Chain-of-Reasoning (CoR), a framework that unifies Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR). It introduces the Multi-Paradigm Math (MPM) dataset and the Progressive Paradigm Training (PPT) curriculum, which train a model (CoR-Math-7B) to master all three paradigms and synthesize their outputs into a final solution. Across five challenging benchmarks spanning arithmetic and theorem proving, CoR-Math-7B achieves state-of-the-art zero-shot performance, including a 41.0% absolute improvement over GPT-4o on miniF2F and a 15.0% improvement over RL-based methods on MATH, while remaining efficient through multi-paradigm test-time inference (SMPS). The results indicate that cross-paradigm collaboration improves both generalization and efficiency, pointing to a new direction for scalable, unified mathematical reasoning in LLMs.
Abstract
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet they often rely on single-paradigm reasoning, which limits their effectiveness across diverse tasks. We introduce Chain-of-Reasoning (CoR), a novel unified framework that integrates multiple reasoning paradigms, namely Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR), to enable synergistic collaboration. CoR generates multiple potential answers via different reasoning paradigms and synthesizes them into a coherent final solution. We propose a Progressive Paradigm Training (PPT) strategy that lets models progressively master these paradigms, leading to CoR-Math-7B. Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models, achieving up to a 41.0% absolute improvement over GPT-4o in theorem proving and a 15.0% improvement over RL-based methods in arithmetic tasks on the MATH benchmark. These results show that our model has stronger mathematical comprehension, which enables zero-shot generalization across tasks.
