ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization

Junbo Jacob Lian; Yujun Sun; Huiling Chen; Chaoyu Zhang; Chung-Piaw Teo

TL;DR

This work introduces ReLoop, which addresses silent failures in LLM-generated optimization code from two complementary directions: structured generation dominates on complex compositional problems, while behavioral verification becomes the largest single contributor on problems with localized formulation defects.

Abstract

Large language models (LLMs) can translate natural language into optimization code, but silent failures pose a critical risk: code that executes and returns solver-feasible solutions may encode semantically incorrect formulations, creating a feasibility-correctness gap of up to 90 percentage points on compositional problems. We introduce ReLoop, addressing silent failures from two complementary directions. Structured generation decomposes code production into a four-stage reasoning chain (understand, formalize, synthesize, verify) that mirrors expert modeling practice, with explicit variable-type reasoning and self-verification to prevent formulation errors at their source. Behavioral verification detects errors that survive generation by testing whether the formulation responds correctly to solver-based parameter perturbation, without requiring ground truth -- an external semantic signal that bypasses the self-consistency problem inherent in LLM-based code review. The two mechanisms are complementary: structured generation dominates on complex compositional problems, while behavioral verification becomes the largest single contributor on problems with localized formulation defects. Together with execution recovery via IIS-enhanced diagnostics, ReLoop raises correctness from 22.6% to 31.1% and execution from 72.1% to 100.0% on the strongest model, with consistent gains across five models spanning three paradigms (foundation, SFT, RL) and three benchmarks. We additionally release RetailOpt-190, 190 compositional retail optimization scenarios targeting the multi-constraint interactions where LLMs most frequently fail.
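The perturbation idea behind behavioral verification can be illustrated with a minimal sketch. This is not ReLoop's implementation: a brute-force search over a toy integer production problem stands in for the solver, and the names `solve`, `solve_correct`, `solve_faulty`, and `responds_to`, as well as the problem data, are hypothetical. The sketch assumes only what the abstract states, namely that a parameter tied to a constraint should change the solver's result when perturbed, and that no ground-truth optimum is needed to observe this.

```python
from itertools import product


def solve(params, enforce_capacity):
    """Brute-force a toy problem: maximize profit over integer x, y in [0, max_units]."""
    best = 0  # x = y = 0 is always feasible here, so 0 is a valid starting value
    for x, y in product(range(params["max_units"] + 1), repeat=2):
        # Capacity constraint: 2x + 3y <= capacity (skipped by the faulty model)
        if enforce_capacity and 2 * x + 3 * y > params["capacity"]:
            continue
        best = max(best, params["profit_x"] * x + params["profit_y"] * y)
    return best


def solve_correct(params):
    """Formulation that encodes the capacity constraint."""
    return solve(params, enforce_capacity=True)


def solve_faulty(params):
    """Silent failure: runs and returns a value, but the capacity constraint is missing."""
    return solve(params, enforce_capacity=False)


def responds_to(solver, params, key, factor=0.5):
    """Return True if scaling params[key] by `factor` changes the optimal objective."""
    baseline = solver(params)
    perturbed = solver({**params, key: params[key] * factor})
    return perturbed != baseline


params = {"max_units": 10, "capacity": 12, "profit_x": 3, "profit_y": 4}
for name, solver in [("correct", solve_correct), ("faulty", solve_faulty)]:
    verdict = "pass" if responds_to(solver, params, "capacity") else "warning"
    print(f"{name}: {verdict}")
```

Both toy formulations execute and return a value, so an execution check alone cannot separate them. Only the perturbation exposes that the faulty one ignores `capacity`.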

Paper Structure (112 sections, 13 equations, 1 figure, 23 tables, 1 algorithm)

Introduction
Related Work
LLM-Based Optimization Modeling.
Verification and Self-Correction.
Sensitivity Analysis and Program Verification.
Benchmarks.
Method
Problem Statement
Structured Generation
Stage 1 (Understand).
Stage 2 (Formalize).
Stage 3 (Synthesize).
Stage 4 (Verify Completeness).
Two-Layer Behavioral Verification
L1: Execution Verification (Blocking)
...and 97 more sections

Figures (1)

Figure 1: ReLoop overview. Structured Generation mirrors expert modeling practice: understand the problem, formalize the mathematical model with explicit variable-type reasoning, synthesize Gurobi code with data extraction, and self-verify completeness. Behavioral Verification: L1 checks execution correctness (Fatal blocks output); L2 tests constraint (CPT) and objective (OPT) presence via solver-based perturbation (Warning/Pass). Diagnosis-Guided Repair: Fatal triggers IIS-guided regeneration; Warning triggers targeted repair with regression rollback. After budget $N$, ReLoop returns the best verified code.
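The control flow described in the caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the authors' algorithm: the callables `generate`, `verify`, `regenerate`, and `repair` are hypothetical placeholders for the LLM and solver steps, the three status labels are taken from the caption, and the rollback rule (discard a candidate whose status is worse than the current one) is an assumption about what "regression rollback" means.

```python
FATAL, WARNING, PASS = "fatal", "warning", "pass"
RANK = {FATAL: 0, WARNING: 1, PASS: 2}  # higher rank means a better verification status


def repair_loop(generate, verify, regenerate, repair, budget):
    """Run up to `budget` repair attempts and return the best (code, status) found."""
    code = generate()
    status = verify(code)
    for _ in range(budget):
        if status == PASS:
            break
        # Fatal -> full regeneration; Warning -> targeted repair.
        candidate = regenerate(code) if status == FATAL else repair(code)
        candidate_status = verify(candidate)
        if RANK[candidate_status] >= RANK[status]:
            code, status = candidate, candidate_status
        # Otherwise the candidate regressed: roll back by keeping the current code.
    return code, status
```

Because a regressing candidate is never accepted, `code` always holds the best-ranked version seen so far, so returning it after the budget is exhausted matches the caption's "returns the best verified code" under these assumptions.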

Theorems & Definitions (2)

Definition 1: Semantic Correctness
Definition 2: Silent Failure
