Automated Optimization Modeling via a Localizable Error-Driven Perspective
Weiting Liu, Han Wu, Yufei Kuang, Xiongwei Han, Tao Zhong, Jianfeng Feng, Wenlian Lu
TL;DR
The paper observes that errors in LLM-generated optimization formulations tend to be localized to specific components rather than spread across the entire solution. It proposes MIND, a two-stage framework that combines error-driven reverse data synthesis (MIND-Train) with Dynamic Supervised Fine-Tuning Policy Optimization (DFPO) to address the data-sparsity and sparse-reward challenges. Empirical results across six benchmarks and two base models show substantial improvements over state-of-the-art methods, with strong generalization to out-of-distribution problems on a new benchmark, MIND-Bench. The work also contributes open-source data and benchmarks for the optimization research community. Overall, MIND demonstrates that focusing training on local error patterns and coupling supervised refinement with RL yields robust, scalable gains in automated optimization modeling with LLMs.
Abstract
Automated optimization modeling via Large Language Models (LLMs) has emerged as a promising approach to assist complex human decision-making. While post-training has become a pivotal technique for enhancing LLMs' capabilities in this domain, its effectiveness is severely constrained by the scarcity and underutilization of high-quality training data. Through detailed profiling of error patterns across problem-response pairs drawn from post-training, we identify two fundamental limitations of existing automated optimization modeling approaches: (L1) the sparsity of error-specific problems and (L2) the sparse rewards associated with difficult problems. We demonstrate that these limitations can result in suboptimal performance in domain-specific post-training for LLMs. To tackle these two limitations, we propose a novel error-driven learning framework, autoMated optImization modeliNg via a localizable error-Driven perspective (MIND), that customizes the whole model training pipeline from data synthesis to post-training. MIND is based on our key observation that error propagation in optimization modeling follows a distinctive localizable pattern: modeling errors may remain confined to specific semantic segments and do not propagate throughout the entire solution. Thus, in contrast to approaches for holistic reasoning tasks such as mathematical proofs, MIND constructs a focused, high-density training corpus and proposes Dynamic Supervised Fine-Tuning Policy Optimization (DFPO) to tackle difficult problems through localized refinement. Experiments on six benchmarks demonstrate that MIND consistently outperforms all state-of-the-art automated optimization modeling approaches.
