MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan Liu, Zherui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu

TL;DR

This work introduces MM-Bench, a benchmark of 111 real-world mathematical modeling problems sourced from MCM/ICM competitions (2000–2025) across ten domains, and HMML, a hierarchical knowledge library of about 98 modeling methods to guide autonomous LLM-driven problem solving. Building on these, MM-Agent is proposed as an expert-inspired, end-to-end four-stage system that performs open-ended problem analysis, structured model formulation, computational solving via code generation with an MLE-Solver, and report generation, all underpinned by a memory-augmented task coordination framework and hierarchical method retrieval. Experimental results show that MM-Agent outperforms state-of-the-art baseline agents and surpasses human expert solutions by up to 11.88% across evaluation metrics, at a reported cost of about 15 minutes and $0.88 per task with GPT-4o. Under official MCM/ICM protocols, it also assisted two undergraduate teams in winning the MCM/ICM 2025 Finalist Award, which supports its practical viability as a modeling copilot. The work also presents an ablation and efficiency analysis that highlights the roles of problem analysis, HMML-driven retrieval, and hierarchical actor-critic optimization, and it reports robustness across multiple backbones and related modeling tasks.

Abstract

Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot. Our code is available at https://github.com/usail-hkust/LLM-MM-Agent
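
The four-stage decomposition named in the abstract can be pictured as a sequential pipeline over a shared state. The sketch below is only an illustration under stated assumptions: `ModelingState`, the stage functions, and `run_pipeline` are hypothetical names, and each stage body is a placeholder for what would be an LLM-driven step in the actual system. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ModelingState:
    """Shared record passed between stages (hypothetical structure)."""
    problem: str
    artifacts: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)


def analyze_problem(state):
    # Placeholder: split the statement on ';' to mimic decomposition into subtasks.
    parts = [p.strip() for p in state.problem.split(";")]
    state.artifacts["subtasks"] = [p for p in parts if p]
    return state


def formulate_models(state):
    # Placeholder: attach a textual "model" to each subtask.
    state.artifacts["models"] = {
        task: f"model for: {task}" for task in state.artifacts["subtasks"]
    }
    return state


def solve_computationally(state):
    # Placeholder: a real solver would generate and execute code here.
    state.artifacts["results"] = {
        task: "solved (placeholder)" for task in state.artifacts["models"]
    }
    return state


def generate_report(state):
    # Placeholder: summarize one line per subtask.
    lines = [f"- {task}: {result}"
             for task, result in state.artifacts["results"].items()]
    state.artifacts["report"] = "\n".join(lines)
    return state


# Stage order follows the abstract; the stage names are assumptions.
STAGES = [
    ("problem_analysis", analyze_problem),
    ("model_formulation", formulate_models),
    ("computational_solving", solve_computationally),
    ("report_generation", generate_report),
]


def run_pipeline(problem):
    """Run every stage in order and record which ones completed."""
    state = ModelingState(problem=problem)
    for name, stage in STAGES:
        state = stage(state)
        state.completed.append(name)
    return state
```

Calling `run_pipeline("estimate demand; allocate budget")` returns a state whose `artifacts` hold the subtasks, placeholder models, placeholder results, and a two-line report. Keeping the stages in a list makes it easy to drop one out, which is the kind of comparison an ablation study performs.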

Paper Structure

This paper contains 25 sections, 36 figures, 6 tables.

Figures (36)

  • Figure 1: Traditional well-defined mathematics problem vs. LLM-powered open-ended mathematical modeling problem. Left: A well-defined mathematical problem, which the agent solves directly to obtain a solution. Right: An open-ended mathematical modeling problem, where given an abstract application scenario or phenomenon, the agent first needs to formulate the mathematical problem before solving it and providing an end-to-end solution.
  • Figure 2: The structure of HMML is organized in three levels: modeling domains, subdomains, and method nodes.
  • Figure 3: Overview of the MM-Agent framework. The workflow consists of four sequential phases: Problem Analysis, Mathematical Modeling, Computational Solving, and Solution Reporting. In the Problem Analysis phase, MM-Agent decomposes the input problem into structured subtasks. In Mathematical Modeling, it constructs formal mathematical representations for each subtask. During Computational Solving, MM-Agent applies appropriate computational methods to derive solutions. Finally, in Solution Reporting, it synthesizes the results into a comprehensive report, clearly summarizing the solutions and associated insights.
  • Figure 4: Ablation study of the effects of the problem analysis and mathematical modeling phases.
  • Figure 5: Illustrations of problem domain and types.
  • ...and 31 more figures
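
The three-level organization of HMML described in the Figure 2 caption (modeling domains, subdomains, method nodes) can be pictured as a nested mapping with method nodes at the leaves. The following is a minimal sketch under assumptions: the domain, subdomain, and method entries are illustrative placeholders rather than entries from the actual library, and the keyword-overlap scoring in `retrieve` is a toy stand-in, not the paper's retrieval procedure.

```python
from dataclasses import dataclass, field


@dataclass
class MethodNode:
    """Leaf of the hierarchy: one modeling method (hypothetical fields)."""
    name: str
    keywords: set = field(default_factory=set)


# Level 1: domain -> Level 2: subdomain -> Level 3: list of method nodes.
# All entries below are illustrative, not taken from the real HMML.
HMML_SKETCH = {
    "Optimization": {
        "Linear Programming": [
            MethodNode("Simplex Method", {"linear", "constraint", "allocation"}),
        ],
        "Combinatorial": [
            MethodNode("Simulated Annealing", {"routing", "schedule", "search"}),
        ],
    },
    "Prediction": {
        "Time Series": [
            MethodNode("ARIMA", {"forecast", "trend", "seasonal"}),
        ],
    },
}


def retrieve(query, library=HMML_SKETCH, top_k=2):
    """Rank method nodes by keyword overlap with the query (toy scoring)."""
    words = set(query.lower().split())
    scored = []
    for domain, subdomains in library.items():
        for subdomain, methods in subdomains.items():
            for method in methods:
                score = len(words & method.keywords)
                if score > 0:
                    scored.append((score, domain, subdomain, method.name))
    # Highest overlap first; sorting is stable, so ties keep library order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(d, s, m) for _, d, s, m in scored[:top_k]]
```

For example, `retrieve("forecast the seasonal trend of demand")` walks the hierarchy from domain down to method node and returns the matching path for the time-series entry. Returning the full path lets a caller see which domain and subdomain a suggested method belongs to.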