LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling

Nima H. Siboni, Seyedreza Kiamousavi, Emad Scharifi

Abstract

Industrial process control demands policies that are interpretable and auditable, requirements that black-box neural policies struggle to meet. We study an LLM-driven heuristic synthesis framework for hot steel rolling, in which a language model iteratively proposes and refines human-readable Python controllers using rich behavioral feedback from a physics-based simulator. The framework combines structured strategic ideation, executable code generation, and per-component feedback across diverse operating conditions to search over control logic for height reduction, interpass time, and rolling velocity. Our first contribution is an auditable controller-synthesis pipeline for industrial process control. The generated controllers are explicit programs accessible to expert review, and we pair them with an automated audit pipeline that formally verifies key safety and monotonicity properties for the best synthesized heuristic. Our second contribution is a principled budget allocation strategy for LLM-driven heuristic search: we show that Luby-style universal restarts -- originally developed for randomized algorithms -- transfer directly to this setting, eliminating the need for problem-specific budget tuning. A single 160-iteration Luby campaign approaches the hindsight-optimal budget allocation derived from 52 ad-hoc runs totalling 730 iterations.

Paper Structure (39 sections, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Best-so-far reward across 52 independent runs. The red line shows the median, the dark band the IQR (25th--75th percentile), and the light band the 10th--90th percentile. Individual traces are shown in light blue. Labels indicate how many runs reached each iteration.
  • Figure 2: Distribution of the iteration at which each run discovers its best heuristic. The 50th percentile is at iteration 4; the mean is at iteration 8.
  • Figure 3: Expected best reward (median) as a function of total iteration budget for uniform strategies ($L{=}1, 5, 10, 20$) and the optimal mixed allocation. Labels on the optimal mix curve show the allocation chosen. Diversified restarts consistently outperform both single long runs and many short ones.
  • Figure 4: Per-sub-run decomposition of an unseeded Luby campaign (unit $u{=}5$, 15 sub-runs, 160 total iterations). Hollow bars show the first-iteration reward; dots show all subsequent iterations. Labels mark the iteration that achieved each sub-run's best reward. Depth consistently improves over the restart baseline, but sub-run 8 shows that fresh restarts can also strike gold immediately.
  • Figure 5: Budget-matched comparison. The black curve shows the expected best reward (median) under the optimal mixed allocation, computed retrospectively from 52 ad-hoc runs (730 total iterations). The green staircase shows the actual cumulative best achieved by a single unseeded Luby campaign (160 iterations). The Luby schedule approaches the hindsight-optimal strategy without requiring prior data.
  • ...and 3 more figures
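
The Figure 4 caption describes a Luby campaign with unit $u{=}5$, 15 sub-runs, and 160 total iterations. A minimal sketch of how such a schedule could be generated is given below, assuming the standard Luby sequence (1, 1, 2, 1, 1, 2, 4, ...) with each term scaled by the unit. The function names `luby` and `luby_schedule` are illustrative and not taken from the paper. The arithmetic is consistent with the caption: the first 15 Luby terms sum to 32, and 32 × 5 = 160.

```python
def luby(i: int) -> int:
    """Return the i-th term (1-indexed) of the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    if i < 1:
        raise ValueError("Luby sequence is 1-indexed")
    # Find the smallest k such that i <= 2^k - 1.
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    # If i closes a block of length 2^k - 1, the term is 2^(k-1).
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    # Otherwise the sequence repeats the previous block.
    return luby(i - (1 << (k - 1)) + 1)


def luby_schedule(unit: int, n_subruns: int) -> list[int]:
    """Iteration budget of each sub-run: unit times the Luby term (hypothetical helper)."""
    return [unit * luby(i) for i in range(1, n_subruns + 1)]


# Values from the Figure 4 caption: unit u = 5, 15 sub-runs.
schedule = luby_schedule(unit=5, n_subruns=15)
total_iterations = sum(schedule)
```

Under these assumptions most sub-runs are short restarts of 5 or 10 iterations, with occasional longer ones of 20 and 40, so no run length has to be tuned in advance.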