
OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents

Ruicheng Ao, David Simchi-Levi, Xinshang Wang

TL;DR

OptiRepair splits the diagnosis and repair of infeasible supply chain optimization models into a domain-agnostic feasibility phase (iterative IIS-guided repair of any LP) and a domain-specific validation phase (five rationality checks grounded in inventory theory), and trains two 8B-parameter models using self-taught reasoning with solver-verified rewards.

Abstract

Problem Definition. Supply chain optimization models frequently become infeasible because of modeling errors. Diagnosis and repair require scarce OR expertise: analysts must interpret solver diagnostics, trace root causes across echelons, and fix formulations without sacrificing operational soundness. Whether AI agents can perform this task remains untested.

Methodology/Results. OptiRepair splits this task into a domain-agnostic feasibility phase (iterative IIS-guided repair of any LP) and a domain-specific validation phase (five rationality checks grounded in inventory theory). We test 22 API models from 7 families on 976 multi-echelon supply chain problems and train two 8B-parameter models using self-taught reasoning with solver-verified rewards. The trained models reach 81.7% Rational Recovery Rate (RRR), the fraction of problems resolved to both feasibility and operational rationality, versus 42.2% for the best API model and 21.3% on average. The gap concentrates in Phase 1 repair: API models average 27.6% recovery rate versus 97.2% for trained models.

Managerial Implications. Two gaps separate current AI from reliable model repair: solver interaction (API models restore only 27.6% of infeasible formulations) and operational rationale (roughly one in four feasible repairs violate supply chain theory). Each requires a different intervention: solver interaction responds to targeted training; operational rationale requires explicit specification as solver-verifiable checks. For organizations adopting AI in operational planning, formalizing what "rational" means in their context is the higher-return investment.
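As a rough illustration of the two metrics named in the abstract, the sketch below computes a recovery rate (problems restored to feasibility) and the Rational Recovery Rate (problems that are both feasible and operationally rational) from per-problem outcomes. The `Outcome` record, the function names, and the four sample outcomes are assumptions made for this example; they are not taken from the paper's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcome:
    """Result of one repair episode (hypothetical record, not from the paper)."""
    feasible: bool  # Phase 1: the solver reports an optimal solution after repair
    rational: bool  # Phase 2: every rationality check passes


def recovery_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of problems restored to feasibility."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.feasible for o in outcomes) / len(outcomes)


def rational_recovery_rate(outcomes: Sequence[Outcome]) -> float:
    """Fraction of problems that end up both feasible and rational."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.feasible and o.rational for o in outcomes) / len(outcomes)


# Made-up outcomes for four problems, for illustration only.
sample = [
    Outcome(feasible=True, rational=True),
    Outcome(feasible=True, rational=False),   # feasible but irrational repair
    Outcome(feasible=False, rational=False),  # repair never reached feasibility
    Outcome(feasible=True, rational=True),
]

rr = recovery_rate(sample)
rrr = rational_recovery_rate(sample)
```

Under this definition RRR can never exceed the recovery rate, since a repair counts toward RRR only if it is feasible in the first place.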

Paper Structure (86 sections, 16 equations, 8 figures, 12 tables)


Figures (8)

  • Figure 1: Two-phase closed-loop architecture of OptiRepair. Phase I (domain-agnostic) iteratively diagnoses and repairs an infeasible LP via solver IIS feedback. Upon reaching Optimal, Phase II (domain-specific) validates operational correctness through a replaceable domain oracle---implemented here with five supply chain rationality checks.
  • Figure 2: Phase I repair interaction. The agent receives an infeasible supply chain LP (Gurobi), uses GET_IIS to diagnose conflicting constraints, and repairs the capacity bound. When the solver returns Optimal, the episode transitions to Phase II. Full system prompt and transcript appear in Appendix \ref{sec:app_prompts}.
  • Figure 3: Phase II rationality repair interaction. The rationality oracle detects that the Phase I repair created a cost-consistency violation: the high-cost retailer holds more inventory than the cheaper warehouse. The agent corrects the objective coefficient; the solver re-optimizes and the oracle confirms all five checks pass.
  • Figure 4: OptiRepair-SC construction pipeline. ① A multi-echelon supply chain generator produces valid LPs with 2--5 echelons and 12--24 periods. ② A saboteur injects one of ten error types, grouped into five categories (demand/timing, balance/capacity, cost/structure, coefficient/sign, constraint/index). ③ Gurobi confirms the sabotaged model is infeasible (or Optimal-but-irrational for ME-5). ④ The ground-truth fix is verified against five rationality checks. ⑤ Validated instances form OptiRepair-SC (976 problems, disjoint train/test splits).
  • Figure 5: OptiSTaR training pipeline. Top: Phase I iterates through beam search exploration ($K = 32$ candidates), STaR distillation, and GRPO refinement; three iterations raise $\text{RR@}5$ from 44.4% to 75.7%. Gurobi provides IIS feedback for beam search and outcome rewards for GRPO. Middle right: Phase II trains independently on oracle-generated data (SFT) refined by solver-based GRPO. Bottom: The two independently trained 8B models combine into the final pipeline ($\text{RRR} = 81.7\%$).
  • ...and 3 more figures
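
The two-phase loop described in the captions for Figures 1 to 3 can be sketched as plain control flow. This is a minimal illustration under stated assumptions, not the authors' implementation: the solver, the IIS query, the repair agent, and the rationality oracle are passed in as hypothetical callables, the model is a toy dictionary rather than a real LP, and the step budget `max_steps` is an invented parameter.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, object]  # hypothetical stand-in for an LP formulation


def closed_loop_repair(
    model: Model,
    solve: Callable[[Model], str],                # "OPTIMAL" or "INFEASIBLE"
    get_iis: Callable[[Model], List[str]],        # names of conflicting constraints
    repair: Callable[[Model, List[str]], Model],  # agent's proposed fix
    oracle: Callable[[Model], List[str]],         # names of failed rationality checks
    max_steps: int = 5,                           # assumed budget, not from the paper
) -> Tuple[Model, bool]:
    """Return (model, success) after at most max_steps repair actions."""
    for _ in range(max_steps):
        if solve(model) != "OPTIMAL":
            # Phase I: feasibility repair guided by the IIS.
            model = repair(model, get_iis(model))
            continue
        failed = oracle(model)
        if not failed:
            return model, True
        # Phase II: rationality repair guided by the failed checks.
        model = repair(model, failed)
    return model, solve(model) == "OPTIMAL" and not oracle(model)


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_solve(m: Model) -> str:
    return "OPTIMAL" if m["capacity"] >= m["demand"] else "INFEASIBLE"


def toy_get_iis(m: Model) -> List[str]:
    return ["capacity_bound"]


def toy_oracle(m: Model) -> List[str]:
    return [] if m["cost_consistent"] else ["cost_consistency"]


def toy_repair(m: Model, feedback: List[str]) -> Model:
    fixed = dict(m)  # copy so the input model is not mutated
    if "capacity_bound" in feedback:
        fixed["capacity"] = fixed["demand"]
    if "cost_consistency" in feedback:
        fixed["cost_consistent"] = True
    return fixed


broken = {"capacity": 5, "demand": 10, "cost_consistent": False}
repaired, ok = closed_loop_repair(broken, toy_solve, toy_get_iis, toy_repair, toy_oracle)
```

Passing the oracle in as a callable mirrors the caption's description of Phase II as a replaceable domain oracle: swapping in a different set of checks would leave the loop itself unchanged.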