OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language Models
Jianzhang Zhang, Jialong Zhou, Chuang Liu
TL;DR
OR-Toolformer addresses privacy and compute barriers in Operations Research (OR) by fine-tuning an open-source LLM (Llama-3.1-8B-Instruct) on data from a semi-automatic synthesis pipeline and augmenting it with external tools for solving. It combines problem–answer data generation, parameter-efficient fine-tuning, and external OR solvers, so that the model expresses each solution as API calls that the solvers execute. It achieves up to 80.1% execution accuracy on three of four benchmarks and 54% average zero-shot accuracy on two unseen problem types. The method outperforms size-matched baselines and generalizes to problem types and tools not seen during training, while keeping its outputs concise and token-efficient. These results highlight the practicality of open-source, tool-augmented LLMs for accurate, privacy-preserving OR problem modeling and solving, with potential for broader industrial deployment.
Abstract
Large language models (LLMs) demonstrate strong mathematical reasoning, but relying on closed-source APIs for Operations Research (OR) tasks raises privacy concerns, and training open-source models from scratch incurs high compute costs. We introduce OR-Toolformer, which fine-tunes Llama-3.1-8B-Instruct on diverse OR problem–answer pairs generated by a semi-automatic data synthesis pipeline and augments the model with external solvers, which it invokes by generating API calls. On three of four standard benchmarks, OR-Toolformer achieves up to 80.1% execution accuracy, exceeding size-matched baselines by over 4.3%. In zero-shot evaluation on two unseen OR problem types, it attains 54% average accuracy, a 21-percentage-point improvement over the strongest baseline. These findings validate tool-augmented fine-tuning of LLMs for accurate and generalizable OR problem modeling and solving.
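The abstract describes a model that emits API calls which external solvers execute, scored by execution accuracy. The sketch below shows how those pieces might fit together. It assumes the model emits a single Python-style call with keyword arguments. It also assumes execution accuracy means the fraction of problems whose executed result matches the reference answer. The tool name `solve_knapsack`, the `TOOLS` registry, and the tolerance rule are hypothetical illustrations, not OR-Toolformer's actual API or evaluation code. A real setup would register external OR solvers instead of this toy dynamic-programming routine.

```python
from __future__ import annotations

import ast
from typing import Callable


# Hypothetical solver tool: 0/1 knapsack via dynamic programming.
def solve_knapsack(values: list[int], weights: list[int], capacity: int) -> int:
    """Return the maximum total value of items that fit within `capacity`."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Sweep capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Hypothetical registry mapping API names the model may emit to solver functions.
TOOLS: dict[str, Callable[..., int]] = {"solve_knapsack": solve_knapsack}


def execute_call(call: str) -> int | None:
    """Parse one model-emitted call string and run the matching registered tool.

    Returns None if the string is not a well-formed keyword-argument call
    to a registered tool, or if the tool rejects its arguments.
    """
    try:
        node = ast.parse(call.strip(), mode="eval").body
    except SyntaxError:
        return None
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        return None
    tool = TOOLS.get(node.func.id)
    if tool is None or node.args:  # this sketch accepts keyword arguments only
        return None
    try:
        # literal_eval accepts only literals, so model output is never executed as code.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return tool(**kwargs)
    except (ValueError, TypeError, IndexError):
        return None


def execution_accuracy(calls: list[str], answers: list[float], tol: float = 1e-6) -> float:
    """Fraction of problems whose executed call matches the reference answer."""
    correct = 0
    for call, ref in zip(calls, answers):
        result = execute_call(call)
        if result is not None and abs(result - ref) <= tol * max(1.0, abs(ref)):
            correct += 1
    return correct / len(answers) if answers else 0.0


calls = [
    "solve_knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)",
    "solve_knapsack(values=[1, 2], weights=[3])",  # missing `capacity`, so it fails
    "import os",  # not an expression, so it fails
]
print(execution_accuracy(calls, [220, 0, 0]))  # only the first call succeeds
```

The sketch parses calls with `ast` and evaluates arguments with `ast.literal_eval`, so it never executes arbitrary model output. Malformed or unregistered calls simply count as failures.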
