OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language Models

Jianzhang Zhang, Jialong Zhou, Chuang Liu

TL;DR

OR-Toolformer addresses the privacy and compute barriers to applying LLMs in operations research (OR) by fine-tuning an open-source model (Llama-3.1-8B-Instruct) on data from a semi-automatic synthesis pipeline and pairing it with external solvers. It combines problem-answer data generation, parameter-efficient fine-tuning, and external OR solvers to produce solutions driven by solver API calls, achieving up to 80.1% execution accuracy on three of four standard benchmarks and 54% average zero-shot accuracy on two unseen problem types. The method outperforms size-matched baselines and generalizes to tools not seen during training, while keeping its outputs concise and token-efficient. These results indicate that open-source, tool-augmented LLMs are practical for accurate, privacy-preserving OR problem modeling and solving, with potential for broader industrial deployment.

Abstract

Large language models (LLMs) demonstrate strong mathematical reasoning, but reliance on closed-source APIs for operations research (OR) tasks raises privacy concerns, and training open-source models from scratch incurs high compute costs. We introduce OR-Toolformer, which fine-tunes Llama-3.1-8B-Instruct with a semi-automatic data synthesis pipeline that generates diverse OR problem-answer pairs and augments the model with external solvers to produce API calls. On three of four standard benchmarks, OR-Toolformer achieves up to 80.1% execution accuracy, exceeding size-matched baselines by over 4.3%. In zero-shot evaluation on two unseen OR problem types, it attains 54% average accuracy, a 21 percentage-point improvement over the strongest baseline. These findings validate the efficacy of tool-augmented fine-tuning of LLMs for accurate and generalizable OR problem modeling and solving.
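To make the idea of API-call driven solving and execution accuracy concrete, the following is a minimal sketch and not the paper's implementation. It assumes the model emits a JSON tool call; the tool name `solve_lp_2d`, the JSON schema, the toy two-variable LP solver, and the matching tolerance are all hypothetical choices made for illustration.

```python
import json
import math
from itertools import combinations


def solve_lp_2d(objective, constraints):
    """Toy stand-in for an external solver (hypothetical, illustration only).

    Maximizes objective[0]*x + objective[1]*y subject to a*x + b*y <= rhs for
    each [a, b, rhs] in constraints, with x, y >= 0, by enumerating vertices.
    Assumes the problem is feasible and bounded.
    """
    # Encode x >= 0 and y >= 0 as -x <= 0 and -y <= 0.
    rows = [tuple(r) for r in constraints] + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundary lines have no unique intersection
        # Cramer's rule for the intersection of the two boundary lines.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + 1e-9 for a, b, r in rows):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    if best is None:
        raise ValueError("no feasible vertex found")
    return {"objective_value": best[0], "x": best[1], "y": best[2]}


# Hypothetical tool registry mapping tool names to callables.
TOOLS = {"solve_lp_2d": solve_lp_2d}


def execute_tool_call(model_output):
    """Parse a JSON tool call (assumed format) and run the named tool."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])


def execution_accuracy(model_outputs, ground_truths, tol=1e-6):
    """Fraction of outputs whose executed result matches the reference value."""
    hits = 0
    for output, truth in zip(model_outputs, ground_truths):
        try:
            result = execute_tool_call(output)
        except Exception:
            continue  # malformed or failing calls count as incorrect
        if math.isclose(result["objective_value"], truth, abs_tol=tol):
            hits += 1
    return hits / len(ground_truths)


# Example: maximize 3x + 2y s.t. x + y <= 4, x + 3y <= 6, x <= 3, x, y >= 0.
example_call = json.dumps({
    "name": "solve_lp_2d",
    "arguments": {
        "objective": [3, 2],
        "constraints": [[1, 1, 4], [1, 3, 6], [1, 0, 3]],
    },
})
example_result = execute_tool_call(example_call)
```

In this sketch the model's job is only to translate the problem statement into the call arguments; the numeric answer comes from executing the call, which is why a malformed call is scored as a miss rather than raising an error.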

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Overview of OR-Toolformer.
  • Figure 2: Snippet of the generation process of an LP problem-answer pair.
  • Figure 3: Problem generation prompt template.
  • Figure 4: Answer generation prompt template.