OptiMUS-0.3: Using Large Language Models to Model and Solve Optimization Problems at Scale
Ali AhmadiTeshnizi, Wenzhi Gao, Herman Brunborg, Shayan Talaei, Connor Lawless, Madeleine Udell
TL;DR
OptiMUS-0.3 presents a modular, LLM-driven framework to automatically model and solve MILPs from natural language, addressing key challenges of long problem descriptions, data scale, and correctness. It introduces NLP4LP, a 355-problem dataset with rich intermediates and a public web app for human-in-the-loop refinement, and demonstrates state-of-the-art performance gains over baselines on easy and hard optimization tasks. The system combines a structured state graph, reflective error correction, confidence-based feedback, and specialized modules to detect problem structure and enable advanced solver interactions, achieving robust accuracy with reasonable runtime. Collectively, the work advances automated optimization modeling and highlights future directions for reliability, trust, ambiguity handling, and multi-language support in AI-assisted optimization.
Abstract
Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, because the expertise required to formulate and solve them limits the widespread adoption of optimization tools and techniques. We introduce a Large Language Model (LLM)-based system designed to formulate and solve (mixed-integer) linear programming problems from their natural language descriptions. Our system is capable of developing mathematical models, writing and debugging solver code, evaluating the generated solutions, and improving the efficiency and correctness of its model and code based on these evaluations. OptiMUS-0.3 utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS-0.3 outperforms existing state-of-the-art methods on easy datasets by more than 22% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than 24%.
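To make the task concrete, the following is a minimal sketch of what "formulating a mixed-integer linear program from a natural language description" means for a toy problem. It is not output from OptiMUS-0.3 and does not reflect its internals: the problem statement, the numbers, and every name in the code (such as `solve_by_enumeration`) are hypothetical, chosen only for illustration. To stay self-contained the sketch enumerates all integer plans instead of calling a MILP solver, which is feasible only because the instance is tiny.

```python
from itertools import product

# Hypothetical natural-language description (an assumption for illustration):
# "A workshop makes chairs and tables. A chair earns $30 and needs 2 labor
#  hours and 1 unit of wood; a table earns $50 and needs 3 labor hours and
#  4 units of wood. There are 40 labor hours and 30 units of wood available.
#  How many whole chairs and tables should be made to maximize profit?"

# Parameters extracted from the description.
PROFIT = {"chair": 30, "table": 50}
LABOR_HOURS = {"chair": 2, "table": 3}
WOOD_UNITS = {"chair": 1, "table": 4}
LABOR_AVAILABLE = 40
WOOD_AVAILABLE = 30


def is_feasible(plan):
    """Check the two resource constraints for an integer production plan."""
    labor = sum(LABOR_HOURS[item] * qty for item, qty in plan.items())
    wood = sum(WOOD_UNITS[item] * qty for item, qty in plan.items())
    return labor <= LABOR_AVAILABLE and wood <= WOOD_AVAILABLE


def objective(plan):
    """Total profit of a production plan."""
    return sum(PROFIT[item] * qty for item, qty in plan.items())


def solve_by_enumeration():
    """Enumerate all non-negative integer plans and keep the most profitable.

    This brute-force search stands in for a real MILP solver and only works
    for very small instances.
    """
    # Upper bound on each variable implied by the resource limits.
    upper = {
        item: min(LABOR_AVAILABLE // LABOR_HOURS[item],
                  WOOD_AVAILABLE // WOOD_UNITS[item])
        for item in PROFIT
    }
    items = list(PROFIT)
    best_plan, best_value = None, None
    for quantities in product(*(range(upper[item] + 1) for item in items)):
        plan = dict(zip(items, quantities))
        if not is_feasible(plan):
            continue
        value = objective(plan)
        if best_value is None or value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value


if __name__ == "__main__":
    plan, value = solve_by_enumeration()
    print(plan, value)
```

In this toy instance the decision variables are the integer quantities of chairs and tables, the objective is total profit, and the constraints are the labor and wood limits. Extracting these three ingredients from text is the modeling step the abstract refers to; a practical pipeline would hand them to a dedicated solver instead of enumerating.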
