CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling

Zhengyang Tang, Zihan Ye, Chenyu Huang, Xuhan Huang, Chengpeng Li, Sihang Li, Guanhua Chen, Ming Yan, Zizhuo Wang, Hongyuan Zha, Dayiheng Liu, Benyou Wang

TL;DR

This work reframes the adaptation of Large Reasoning Models (LRMs) for optimization modeling by preserving their native reflective reasoning and guiding it with targeted, lightweight interventions. The CALM framework identifies and fixes specific reasoning flaws through expert hints and solver feedback, feeding a two-stage training pipeline of supervised fine-tuning followed by reinforcement learning. The resulting STORM model, a 4B-parameter LRM, achieves a state-of-the-art macro-average accuracy of 68.9% across five benchmarks, matching the performance of a 671B LRM. The approach offers a practical, scalable path to expert-level optimization modeling with far fewer parameters than prior large-model baselines.

Abstract

Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs. In particular, we show that direct fine-tuning on traditional non-reflective datasets leads to limited gains. To fully leverage LRMs' inherent reasoning abilities, we propose CALM (Corrective Adaptation with Lightweight Modification), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6% of generated tokens, but generate high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop STORM (Smart Thinking Optimization Reasoning Model), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks, matching the performance of a 671B LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path towards expert-level performance on challenging optimization modeling tasks.
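
The hint-and-continue idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: `corrective_loop`, `Trajectory`, `toy_generate`, and `toy_intervene` are hypothetical names, the model and the expert intervener are replaced by toy stand-ins, and tokens are approximated by whitespace-separated words. It only shows how a trace might alternate between model output and short hints, and how the share of hint tokens in the final trace could be measured.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """A reasoning trace plus a count of the tokens that came from hints."""
    tokens: List[str] = field(default_factory=list)
    hint_tokens: int = 0

    def modified_fraction(self) -> float:
        # Share of the final trace contributed by hints rather than the model.
        return self.hint_tokens / len(self.tokens) if self.tokens else 0.0


def corrective_loop(
    generate: Callable[[List[str]], List[str]],
    intervene: Callable[[List[str]], Optional[str]],
    max_hints: int = 3,
) -> Trajectory:
    """Alternate model generation with short corrective hints.

    generate(prefix) returns continuation tokens for the given prefix.
    intervene(tokens) returns a hint string, or None if no flaw is flagged.
    Both callables are hypothetical stand-ins supplied by the caller.
    """
    traj = Trajectory()
    traj.tokens.extend(generate(list(traj.tokens)))
    for _ in range(max_hints):
        hint = intervene(list(traj.tokens))
        if hint is None:
            break  # no flaw flagged: keep the trace as it stands
        hint_toks = hint.split()
        traj.tokens.extend(hint_toks)
        traj.hint_tokens += len(hint_toks)
        # The model resumes its own reasoning after reading the hint.
        traj.tokens.extend(generate(list(traj.tokens)))
    return traj


def toy_generate(prefix: List[str]) -> List[str]:
    # Toy model: always emits 40 placeholder reasoning tokens.
    return ["step"] * 40


def toy_intervene(tokens: List[str]) -> Optional[str]:
    # Toy intervener: flags a flaw once, then accepts the trace.
    return None if "Hint:" in tokens else "Hint: use the solver"


traj = corrective_loop(toy_generate, toy_intervene)
print(len(traj.tokens), traj.hint_tokens, traj.modified_fraction())
```

In this toy run the hint contributes 4 of the 84 tokens. The numbers are artifacts of the stand-ins and say nothing about the 2.6% figure reported in the abstract.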

Paper Structure

This paper contains 43 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Performance vs. Model Size landscape for optimization modeling.
  • Figure 2: Illustrations of optimization modeling and reasoning paradigms.
  • Figure 3: Trigger Categorization and Distribution. The left panel (1) shows the macro-average frequency of each trigger, with the first six triggers grouped into two primary categories. The right panels (2a and 2b) detail the frequency distribution of these two main categories across the evaluated benchmarks.
  • Figure 4: A representative example of the Lack of OR Expertise flaw. (1) The model's native reasoning results in an incorrect problem formulation, leading to a wrong answer. (2) In contrast, the process under CALM's guidance corrects the formulation, enabling the model to find the correct solution.
  • Figure 5: Ablation study of our two-stage framework.
  • ...and 6 more figures