LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning

Shuguang Chen; Guang Lin

TL;DR

This work tackles the difficulty of mathematical reasoning in LLMs, addressing data scarcity and error propagation by introducing a paraphrase-based data augmentation pipeline and specialized multitask training objectives. The method combines question paraphrasing via GPT-4 with Rationale Re-Ranking (RR) and Mistake Identification (MI) within a multitask fine-tuning framework, culminating in a final objective $\mathcal{L}_{final}(\theta)= \lambda_{1}\mathcal{L}_{SFT} + \lambda_{2}\mathcal{L}_{RR} + \lambda_{3}\mathcal{L}_{MI}$. Experiments across four open-source models and four math-oriented datasets show consistent gains, with larger improvements for weaker models and notable benefits when question paraphrasing is combined with RR and MI. The results highlight the value of linguistic diversification and structured reasoning guidance for mathematical problem solving, and point to practical implications for real-world tasks requiring reliable math reasoning, potentially amplified by integrating symbolic computation. Future work may explore hybrid neural-symbolic approaches to further reduce arithmetic errors and improve reliability in long reasoning chains.
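The final objective is a weighted sum of three task losses. The following is a minimal sketch of that combination, assuming each loss has already been reduced to a scalar for the current batch. The function name `final_objective` and the example weights are illustrative assumptions, not values reported in the paper.

```python
from typing import Sequence


def final_objective(
    loss_sft: float,
    loss_rr: float,
    loss_mi: float,
    lambdas: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of the three task losses.

    loss_sft: supervised fine-tuning loss
    loss_rr:  Rationale Re-Ranking loss
    loss_mi:  Mistake Identification loss
    lambdas:  (lambda_1, lambda_2, lambda_3); the defaults are placeholders.
    """
    if len(lambdas) != 3:
        raise ValueError("expected exactly three weights")
    lam1, lam2, lam3 = lambdas
    return lam1 * loss_sft + lam2 * loss_rr + lam3 * loss_mi


# Hypothetical per-batch losses and weights, chosen only for illustration.
total = final_objective(1.0, 2.0, 3.0, lambdas=(1.0, 0.5, 0.5))
print(total)  # → 3.5
```

Setting a weight to zero drops the corresponding objective, which is one simple way to set up ablations such as training with SFT alone.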

Abstract

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical reasoning skills. Existing approaches to address this challenge often rely on ensemble methods and suffer from the problem of data scarcity in target domains. In this work, we present a novel method to enhance LLMs' capabilities in mathematical reasoning tasks. Motivated by the need to bridge this gap, our approach incorporates a question paraphrase strategy, which aims at diversifying the linguistic forms of mathematical questions to improve generalization. Additionally, specialized training objectives are employed to guide the model's learning process, focusing on enhancing its understanding of mathematical concepts and reasoning processes. We conduct experiments on four datasets using different LLMs, and demonstrate the effectiveness of our approach in improving LLMs' performance on mathematical reasoning tasks. Our findings underscore the significance of our methodology in the advancement of large language models and its potential implications for real-world applications that require mathematical reasoning abilities.
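The question paraphrase strategy can be pictured as a data augmentation loop. The sketch below assumes that each paraphrased question keeps the answer of the original question, and it uses a toy template-based `toy_paraphrase` function as a stand-in for the GPT-4 paraphraser mentioned in the TL;DR. All names, the dictionary layout, and the templates are hypothetical.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def augment_with_paraphrases(
    examples: List[Example],
    paraphrase: Callable[[str, int], str],
    n_variants: int = 2,
) -> List[Example]:
    """Return the original examples plus up to n_variants paraphrases of each.

    Assumption: a paraphrase preserves the meaning of the question, so the
    original answer is reused unchanged.
    """
    augmented: List[Example] = []
    for ex in examples:
        augmented.append(ex)
        seen = {ex["question"]}
        for i in range(n_variants):
            new_q = paraphrase(ex["question"], i)
            if new_q in seen:  # skip duplicates of the source or earlier variants
                continue
            seen.add(new_q)
            augmented.append({"question": new_q, "answer": ex["answer"]})
    return augmented


def toy_paraphrase(question: str, variant: int) -> str:
    """Stand-in for an LLM paraphraser; it only wraps the question in a template."""
    templates = ["Question: {q}", "Please solve: {q}"]
    return templates[variant % len(templates)].format(q=question)


data = [{"question": "Tom has 3 apples and buys 2 more. How many now?", "answer": "5"}]
for row in augment_with_paraphrases(data, toy_paraphrase):
    print(row)
```

In practice the stand-in would be replaced by a call to a paraphrasing model, and the outputs would need a check that the numbers and conditions of the question are preserved.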

Paper Structure (25 sections, 1 equation, 4 figures, 2 tables)

Introduction
Background
Recent Advances in LLMs
LLMs for mathematics
Methodology
Question Paraphrase
Paraphrasing Questions
Special Training Objectives
Rationale Re-Ranking (RR)
Mistake Identification (MI)
Training Process
Final training objective
Experiments
Datasets
Base Models
...and 10 more sections

Figures (4)

Figure 1: Llama2-7B model prediction on an example from the GSM8K dataset.
Figure 2: The overview of our proposed pipeline.
Figure 3: Analysis of Llama2 and Mistral across different numbers of reasoning steps. The X-axis is the number of reasoning steps required to solve a math question and the Y-axis is the accuracy score. We use supervised fine-tuning as the baseline and compare it with our proposed method using the special training objectives. QP stands for question paraphrase.
Figure 4: Case study on the GSM8K dataset using the Mistral-7B Model. The example on the left is positive while the example on the right is negative.
