
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji

TL;DR

ModelingBench provides real-world, open-ended math-modeling problems drawn from COMAP contests. These problems challenge LLMs to translate natural-language problem statements into formal mathematical formulations and then to produce data-driven analyses and defensible reports. ModelingAgent, a four-agent system with shared memory and a Critic, enables iterative self-improvement and coordinated tool use on these tasks. ModelingJudge provides expert-in-the-loop evaluation that mirrors real-world judging practice. Across extensive experiments, ModelingAgent outperforms vanilla and tool-based baselines by up to 20% and approaches human performance on several metrics, though deeper innovativeness and robust data-grounded reasoning remain challenging. The work argues for a practical, interpretable, and extensible framework for evaluating and advancing real-world problem-solving in interdisciplinary modeling, with potential extensions to multi-modal inputs and stronger human-in-the-loop integration.

Abstract

Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisciplinary reasoning and the integration of computational tools. To address this gap, we introduce ModelingBench, a novel benchmark of real-world-inspired, open-ended problems from math modeling competitions. These problems span diverse domains, ranging from urban traffic optimization to ecosystem resource planning. The tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports. ModelingBench also supports multiple valid solutions, capturing the ambiguity and creativity of practical modeling. We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured workflows, and enables iterative self-refinement to generate well-grounded, creative solutions. To evaluate outputs, we further propose ModelingJudge, an expert-in-the-loop system that uses LLMs as domain-specialized judges to assess solutions from multiple expert perspectives. Empirical results show that ModelingAgent substantially outperforms strong baselines and often produces solutions indistinguishable from those of human experts. Together, our work provides a comprehensive framework for evaluating and advancing real-world problem-solving in open-ended, interdisciplinary modeling challenges.
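To make the idea of judging "from multiple expert perspectives" concrete, here is a minimal sketch of multi-judge score aggregation. It assumes that each expert judge returns a score per criterion and that the scores are combined by an unweighted mean. In ModelingJudge the judges are LLMs prompted as domain experts; here they are plain functions. The names `ExpertJudge` and `aggregate_scores`, the criterion labels, and the averaging rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# A judge maps a solution report to per-criterion scores. In ModelingJudge these
# roles are played by LLMs prompted as domain experts; here they are plain functions.
ScoreFn = Callable[[str], Dict[str, float]]


@dataclass
class ExpertJudge:
    perspective: str  # illustrative label such as "operations research"
    score: ScoreFn


def aggregate_scores(report: str, judges: List[ExpertJudge]) -> Dict[str, float]:
    """Average each criterion over every judge that scored it (unweighted mean)."""
    per_criterion: Dict[str, List[float]] = {}
    for judge in judges:
        for criterion, value in judge.score(report).items():
            per_criterion.setdefault(criterion, []).append(value)
    return {criterion: mean(values) for criterion, values in per_criterion.items()}


# Toy judges with fixed scores; criterion names are hypothetical.
judges = [
    ExpertJudge("operations research",
                lambda r: {"data_groundedness": 8.0, "innovativeness": 6.0}),
    ExpertJudge("ecology",
                lambda r: {"data_groundedness": 7.0, "innovativeness": 5.0}),
]
print(aggregate_scores("report text", judges))
# {'data_groundedness': 7.5, 'innovativeness': 5.5}
```

A weighted mean, or a rule that lets a domain expert override the others on criteria within their specialty, would fit into `aggregate_scores` without changing the judge interface.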

Paper Structure

This paper contains 47 sections, 13 equations, 29 figures, 5 tables, and 1 algorithm.

Figures (29)

  • Figure 1: An example math modeling problem and the five corresponding core skills it requires.
  • Figure 2: The automated system for solving and evaluating modeling problems.
  • Figure 3: An illustration of iterative refinement performed by the critic module in ModelingAgent.
  • Figure 4: The critic’s scoring trend across rounds shows a clear upward trajectory, highlighting ModelingAgent’s consistent improvements and effective self-evolution in addressing modeling challenges.
  • Figure 5: Case study of iterative refinement of modeling idea and corresponding critics.
  • ...and 24 more figures
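Figures 3 and 4 describe a critic that scores each draft and drives revisions, with scores rising across rounds. The following is a minimal sketch of such a critique-and-revise loop, under several assumptions: the critic emits a single scalar score plus textual feedback, the loop runs for a fixed number of rounds, and it stops early once a target score is reached. The `propose`, `critic`, and `refine` callables are hypothetical stand-ins for LLM-backed agents. This is not the paper's algorithm, and it does not model ModelingAgent's four agents or shared memory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    score: float    # hypothetical scalar quality score
    feedback: str   # natural-language suggestions for the next revision


def refine_with_critic(
    problem: str,
    propose: Callable[[str], str],
    critic: Callable[[str], Critique],
    refine: Callable[[str, Critique], str],
    max_rounds: int = 3,
    target: float = 9.0,
) -> Tuple[str, List[float]]:
    """Critique-and-revise loop; returns the best-scored draft and the score history."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    draft = propose(problem)
    history: List[float] = []
    best_draft, best_score = draft, float("-inf")
    for round_idx in range(max_rounds):
        critique = critic(draft)
        history.append(critique.score)
        if critique.score > best_score:  # keep the best draft seen so far
            best_draft, best_score = draft, critique.score
        # Stop once the target is reached or the round budget is spent,
        # so every returned draft has been scored by the critic.
        if critique.score >= target or round_idx == max_rounds - 1:
            break
        draft = refine(draft, critique)
    return best_draft, history


# Toy stand-ins: each revision appends the critic's feedback token "+",
# and the critic rewards drafts with more revisions.
best, scores = refine_with_critic(
    "toy problem",
    propose=lambda p: "model v0",
    critic=lambda d: Critique(5.0 + 2.0 * d.count("+"), "+"),
    refine=lambda d, c: f"{d} {c.feedback}",
    max_rounds=4,
)
print(best, scores)
# model v0 + + [5.0, 7.0, 9.0]
```

The sketch returns the best-scored draft rather than the last one. This guards against a revision that regresses, and it also means the rising score history in the toy run mirrors the kind of upward trend Figure 4 reports.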