CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
Chengwei Wei, Bin Wang, Jung-jae Kim, Guimei Liu, Nancy F. Chen
TL;DR
The paper tackles how coding instruction can enhance mathematical reasoning in large language models by systematically studying coding styles, cross-domain coding data, and text-code integration. It finds that code-based rationales with concise comments, descriptive naming, and hardcoded solutions provide the strongest learning signals, while general-domain coding data and math-text explanations offer limited benefits and can even hinder certain models. Building on these insights, it proposes CoinMath, which ensembles three diverse code-based rationale styles to maximize mathematical reasoning gains, improving on the SOTA baseline by $5.9\%$ on standard math evaluation datasets. The work also highlights the importance of tailoring instruction tuning data to the target reasoning tasks and provides datasets and pipelines to advance reproducibility in math reasoning for LLMs.
Abstract
Large Language Models (LLMs) have shown strong performance in solving mathematical problems, with code-based solutions proving particularly effective. However, the best way to leverage coding instruction data to enhance mathematical reasoning remains underexplored. This study investigates three key questions: (1) How do different coding styles of mathematical code-based rationales impact LLMs' learning performance? (2) Can general-domain coding instructions improve performance? (3) How does integrating textual rationales with code-based ones during training enhance mathematical reasoning abilities? Our findings reveal that code-based rationales with concise comments, descriptive naming, and hardcoded solutions are beneficial, while improvements from general-domain coding instructions and textual rationales are relatively minor. Based on these insights, we propose CoinMath, a learning strategy designed to enhance mathematical reasoning by diversifying the coding styles of code-based rationales. CoinMath generates a variety of code-based rationales incorporating concise comments, descriptive naming conventions, and hardcoded solutions. Experimental results demonstrate that CoinMath significantly outperforms its baseline model, MAmmoTH, one of the SOTA math LLMs.
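To make the three coding-style attributes concrete, here is a minimal sketch of what a code-based rationale with concise comments, descriptive naming, and a hardcoded solution might look like. The word problem, the function names, and the contrasting generalized version are hypothetical illustrations, not examples taken from the paper's data.

```python
# Hypothetical problem (an assumption for illustration only):
# "Pencils cost $0.50 each. Alice buys 12 and pays with a $10 bill.
#  How much change does she get?"


def solve_hardcoded():
    # Hardcoded solution: the problem's values are written directly in the code.
    price_per_pencil = 0.50  # dollars, descriptive variable name
    pencils_bought = 12
    amount_paid = 10.00  # dollars
    # Total cost of the pencils
    total_cost = price_per_pencil * pencils_bought
    # Change returned to Alice
    change = amount_paid - total_cost
    return change


def solve_generalized(price, count, paid):
    # Contrasting style: values arrive as parameters instead of being hardcoded.
    return paid - price * count


print(solve_hardcoded())
```

Both functions compute the same answer; they differ only in style, which is the dimension the study varies when comparing rationales.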
