Large Language Models for Mathematical Analysis
Ziye Chen, Hao Qi
TL;DR
The paper tackles the challenge of enabling LLMs to perform rigorous mathematical analysis by introducing the DEMI-MathAnalysis dataset of proof-based problems and a guiding framework that combines problem classification, knowledge retrieval, and structured solution generation. The approach emphasizes formal reasoning, including $\epsilon$-$\delta$ proofs, across topics such as Sequences and Limits, Infinite Series, and Convex Functions. Empirical results show that smaller models, namely fine-tuned Llama-3.2 and Qwen-2.5, improve substantially over baselines when paired with the framework and approach the performance of larger models such as OpenAI's o1-preview, signaling progress toward trustworthy AI capable of formal mathematical reasoning. The work lays a foundation for more robust evaluation and for expansion to additional topics, and it suggests future directions such as translating proofs into Lean and strengthening evaluation pipelines to ensure rigorous proof quality.
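To make concrete the kind of formal argument referred to above, here is a standard textbook proof in the $\epsilon$-$N$ style for a sequence limit. It is an illustrative example chosen for this summary and is not taken from the DEMI-MathAnalysis dataset.

```latex
\textbf{Claim.} $\displaystyle \lim_{n \to \infty} \frac{1}{n} = 0$.

\textbf{Proof.} Let $\epsilon > 0$. By the Archimedean property, choose
$N \in \mathbb{N}$ with $N > 1/\epsilon$. Then for every $n \ge N$,
\[
  \left| \frac{1}{n} - 0 \right| = \frac{1}{n} \le \frac{1}{N} < \epsilon .
\]
Hence $1/n \to 0$ as $n \to \infty$. $\blacksquare$
```

A proof of this kind is judged on whether every quantifier is handled explicitly (an arbitrary $\epsilon$, a chosen $N$, all $n \ge N$), not only on whether the final limit value is correct.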
Abstract
Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on mathematical problem-solving, most existing work and datasets concentrate on computational tasks, leaving gaps in areas like mathematical analysis, which demands rigorous proofs and formal reasoning. We developed the DEMI-MathAnalysis dataset, comprising proof-based problems from mathematical analysis topics such as Sequences and Limits, Infinite Series, and Convex Functions. We also designed a guiding framework to enhance LLMs' ability to solve these problems rigorously. By fine-tuning LLMs on this dataset and applying our framework, we observed significant improvements in their ability to generate logical, complete, and elegant proofs. This work addresses critical gaps in mathematical reasoning and contributes to advancing trustworthy AI capable of handling formalized mathematical language. The code is publicly accessible at LLMs for Mathematical Analysis.
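The three stages named in the TL;DR (problem classification, knowledge retrieval, structured solution generation) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword lists, the tiny knowledge table, the prompt layout, and the function names `classify`, `retrieve`, and `build_prompt` are hypothetical and are not the authors' implementation, which likely uses an LLM for each stage.

```python
# Hypothetical sketch of a classify -> retrieve -> structured-prompt pipeline.
# Keywords, knowledge entries, and prompt wording are illustrative assumptions.

TOPIC_KEYWORDS = {
    "Sequences and Limits": ("sequence", "limit", "converge"),
    "Infinite Series": ("series", "summation"),
    "Convex Functions": ("convex", "concave", "jensen"),
}

KNOWLEDGE = {
    "Sequences and Limits": ["epsilon-N definition of a limit", "squeeze theorem"],
    "Infinite Series": ["comparison test", "ratio test"],
    "Convex Functions": ["definition of convexity", "Jensen's inequality"],
}


def classify(problem: str) -> str:
    """Pick the topic whose keywords occur most often in the problem text."""
    text = problem.lower()
    scores = {
        topic: sum(keyword in text for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def retrieve(topic: str) -> list[str]:
    """Return the stored definitions and theorems for a topic (empty if unknown)."""
    return KNOWLEDGE.get(topic, [])


def build_prompt(problem: str) -> str:
    """Assemble a structured prompt that a proof-generating model could consume."""
    topic = classify(problem)
    facts = retrieve(topic)
    lines = [f"Topic: {topic}", "Relevant knowledge:"]
    lines += [f"- {fact}" for fact in facts]
    lines += ["Problem:", problem, "Write a complete, rigorous proof."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Prove that the sequence 1/n converges to 0."))
```

The point of the sketch is only the data flow: the topic label selects which definitions and theorems are placed in front of the model before it is asked for a proof.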
