MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Shuo Yin; Weihao You; Zhilong Ji; Guoqiang Zhong; Jinfeng Bai

TL;DR

This work tackles the challenge of high-performing mathematical reasoning with open LLMs by uniting two research directions: data-augmented pure reasoning and tool-assisted computation. It introduces MuMath-Code-Data, a multi-perspective augmentation dataset with code-nested solutions, and a two-stage training pipeline that first strengthens pure reasoning and then teaches code generation and tool interaction. The resulting MuMath-Code models achieve state-of-the-art results among open models on GSM8K and MATH, with notable gains at the 7B, 34B, and 70B scales, and the authors provide ablations and scaling analyses to validate their approach. The dataset and code are released to support future work at the intersection of data augmentation and external tool use for mathematical reasoning in open LLMs.

Abstract

Tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced the mathematical reasoning capabilities of open-source LLMs, while tool-free methods have taken another track: augmenting math reasoning data. However, an effective method for integrating these two research paths and combining their advantages remains to be explored. In this work, we first create new math questions via multi-perspective data augmentation methods and then synthesize code-nested solutions to them. Open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to obtain the resulting models, MuMath-Code ($\mu$-Math-Code). During inference, MuMath-Code generates code and interacts with the external Python interpreter to get the execution results. MuMath-Code therefore leverages the advantages of both the external tool and data augmentation. To make full use of the augmented data, we propose a two-stage training strategy: in Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which is then trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. MuMath-Code-7B achieves 83.8% on GSM8K and 52.4% on MATH, while MuMath-Code-70B achieves new state-of-the-art performance among open methods, with 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.
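The inference-time interaction described in the abstract (the model writes code, an interpreter runs it, and the result is fed back) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `solve_with_tool`, `execute_python`, and `stub_model`, the fenced `output` block format, and the round limit are all assumptions made for illustration, and the stub stands in for a real finetuned model.

```python
import contextlib
import io
import re
from typing import Callable

FENCE = "`" * 3  # triple backtick, built indirectly to keep this listing clean
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def execute_python(code: str) -> str:
    """Run a code string and return its captured stdout, or the error text."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # fresh namespace; NOT sandboxed, illustration only
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve_with_tool(question: str,
                    generate: Callable[[str], str],
                    max_rounds: int = 3) -> str:
    """Alternate between model generation and code execution.

    `generate` is a hypothetical callable mapping the running context to the
    model's next chunk of text. The loop stops once a chunk has no code block.
    """
    context = question
    for _ in range(max_rounds):
        step = generate(context)
        context += "\n" + step
        blocks = CODE_BLOCK.findall(step)
        if not blocks:
            break  # no code to run: treat the chunk as the final answer
        result = execute_python(blocks[-1])
        # Feed the execution result back so the next chunk can use it.
        context += "\n" + FENCE + "output\n" + result + "\n" + FENCE
    return context


def stub_model(context: str) -> str:
    """Stand-in for a finetuned model: emit code first, then a final answer."""
    if FENCE + "output" not in context:
        code = "print(sum(range(1, 11)))"
        return ("Compute the sum with code.\n"
                + FENCE + "python\n" + code + "\n" + FENCE)
    return "The answer is 55."


if __name__ == "__main__":
    print(solve_with_tool("What is 1 + 2 + ... + 10?", stub_model))
```

In a real system `generate` would call the finetuned model and the code would run in an isolated sandbox with a timeout; the plain `exec` call here is only safe for trusted toy inputs.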

Paper Structure (32 sections, 3 equations, 4 figures, 5 tables)

This paper contains 32 sections, 3 equations, 4 figures, 5 tables.

Introduction
Related Work
Tool-Free LLMs for Math
Tool-Use LLMs for Math
Preliminaries
MuMath Augmented Questions
(1) Rephrasing
(2) Question Alteration
(3) FOBAR
(4) BF-Trans
MuMath-Data
Majority Sampling
Methodology
MuMath-Code-Data
Prefix CoT
...and 17 more sections

Figures (4)

Figure 1: Comparison between our MuMath-Code and other state-of-the-art tool-use LLMs. MuMath-Code exhibits a substantial improvement in performance on both GSM8K and MATH relative to previous approaches.
Figure 2: Illustration of our proposed method. The foundation model is first trained through an initial stage, resulting in an intermediary model that possesses more powerful math reasoning capability. This intermediary model is then further trained on the proposed dataset to learn code generation and tool interaction, leading to the final model, MuMath-Code.
Figure 3: Scaling all the subsets of MuMath-Code-Data. The models undergo a single stage (only Stage-2) of training.
Figure 4: Scaling all the subsets of MuMath-Code-Data. The model has already been finetuned on MuMath-Data. The curves show trends very similar to those in Figure 3.
