Large Language Models for Mathematical Analysis

Ziye Chen; Hao Qi

TL;DR

The paper tackles the challenge of enabling LLMs to perform rigorous mathematical analysis by introducing DEMI-MathAnalysis, a dataset of proof-based problems, and a guiding framework that combines problem classification, knowledge retrieval, and structured solution generation. The approach emphasizes formal reasoning, including $\epsilon$-$\delta$ proofs, across topics such as Sequences and Limits, Infinite Series, and Convex Functions, and yields substantial performance gains for smaller models that are both fine-tuned and guided. Empirical results show that fine-tuned Llama-3.2 and Qwen-2.5, when paired with the framework, substantially improve over baselines and approach the performance of larger models such as OpenAI's o1-preview, a step toward trustworthy AI capable of formal mathematical reasoning. The work lays a foundation for more robust evaluation and expansion to additional topics, and suggests future directions such as translating proofs into Lean and strengthening evaluation pipelines to ensure rigorous proof quality.
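For readers unfamiliar with the proof style mentioned above, the following is a minimal worked $\epsilon$-$\delta$ argument. It is a standard textbook illustration chosen here for brevity and is not taken from the DEMI-MathAnalysis dataset.

```latex
\textbf{Claim.} $\lim_{x \to 2} (3x + 1) = 7$.

\textbf{Proof.} Let $\epsilon > 0$ and choose $\delta = \epsilon / 3$.
If $0 < |x - 2| < \delta$, then
\[
  |(3x + 1) - 7| = 3\,|x - 2| < 3\delta = \epsilon .
\]
Since $\epsilon$ was arbitrary, this proves the claim.
```

The point of such a proof is that every step is justified explicitly, in contrast to computational problems where only the final value is checked.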

Abstract

Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on mathematical problem-solving, most existing work and datasets concentrate on computational tasks, leaving gaps in areas like mathematical analysis, which demands rigorous proofs and formal reasoning. We developed the DEMI-MathAnalysis dataset, comprising proof-based problems from mathematical analysis topics such as Sequences and Limits, Infinite Series, and Convex Functions. We also designed a guiding framework to rigorously enhance LLMs' ability to solve these problems. Through fine-tuning LLMs on this dataset and employing our framework, we observed significant improvements in their capability to generate logical, complete, and elegant proofs. This work addresses critical gaps in mathematical reasoning and contributes to advancing trustworthy AI capable of handling formalized mathematical language. The code is publicly accessible at LLMs for Mathematical Analysis.
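The guiding framework is summarized as problem classification, knowledge retrieval, and structured solution generation. The sketch below shows one way such a pipeline could be wired together before a model is called. It is an illustration under stated assumptions, not the authors' implementation: the keyword lists, the knowledge snippets, and the function names `classify_problem`, `retrieve_knowledge`, and `build_prompt` are all hypothetical, and keyword matching stands in for whatever classifier the paper actually uses.

```python
# Hypothetical keywords per topic. The three topic names come from the
# abstract; the keywords are illustrative assumptions.
TOPIC_KEYWORDS = {
    "Sequences and Limits": ["sequence", "limit", "converges to"],
    "Infinite Series": ["series", "partial sum"],
    "Convex Functions": ["convex", "jensen"],
}

# Hypothetical knowledge snippets that a retrieval step might return.
KNOWLEDGE_BASE = {
    "Sequences and Limits": [
        "Definition of the limit of a sequence.",
        "Squeeze theorem.",
    ],
    "Infinite Series": [
        "Comparison test.",
        "Cauchy criterion for series.",
    ],
    "Convex Functions": [
        "Definition of a convex function.",
        "Jensen's inequality.",
    ],
}


def classify_problem(problem: str) -> str:
    """Return the topic whose keywords occur most often, or 'Unknown'."""
    text = problem.lower()
    scores = {
        topic: sum(text.count(keyword) for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def retrieve_knowledge(topic: str) -> list[str]:
    """Look up background facts for a topic (empty list if none)."""
    return KNOWLEDGE_BASE.get(topic, [])


def build_prompt(problem: str) -> str:
    """Assemble a structured prompt: topic, retrieved facts, then the task."""
    topic = classify_problem(problem)
    facts = retrieve_knowledge(topic)
    lines = [f"Topic: {topic}", "Relevant knowledge:"]
    lines += [f"- {fact}" for fact in facts]
    lines += [
        "Problem:",
        problem,
        "Write a complete, step-by-step proof and justify every step.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Prove that the series sum of 1/n^2 converges."))
```

The resulting string would then be passed to a fine-tuned model. Keeping classification and retrieval as separate functions makes it straightforward to swap either step for a learned component.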

Paper Structure (22 sections, 6 figures, 1 table)

This paper contains 22 sections, 6 figures, 1 table.

Introduction
Related Work
Mathematics Benchmarks for AI
LLMs for Mathematics
Motivation
Dataset
Dataset Creation
Dataset Structure
Guiding Framework
Components
Features and Benefits
Experiment and Results
Experiment Setup
Evaluation Setup
Results and Discussion
...and 7 more sections

Figures (6)

Figure 1: Mathematical fields distribution of current datasets. Note that most of the questions are computation-related with a finite answer.
Figure 2: Number of questions per topic in DEMI-MathAnalysis.
Figure 3: An example in DEMI-MathAnalysis. The LaTeX code has been rendered for better reading.
Figure 4: Framework of instructing analysis problems.
Figure 5: Proof evaluation process using GPT-4o.
...and 1 more figure
