HOFLON: Hybrid Offline Learning and Online Optimization for Process Start-Up and Grade-Transition Control

Alex Durkin, Jasper Stolte, Mehmet Mercangöz

TL;DR

This work introduces HOFLON, a Hybrid Offline Learning + Online Optimization framework for process start-up and grade-transition control. Offline, HOFLON trains a long-horizon Q-critic and a data-manifold model based on an Adversarial Autoencoder. Online, it solves a one-step optimization that maximizes the learned value while keeping actions close to the data manifold and penalizing rapid changes. The approach is validated on two industrial benchmarks (a polymerization start-up and a paper-machine grade change), where HOFLON outperforms Implicit Q-Learning (IQL) and, on average, exceeds the best historically observed transitions in cumulative reward. The results demonstrate HOFLON’s ability to automate complex transitions beyond current expert capability, with transparent trade-offs between performance, safety, and actuator smoothness, and with real-time computational feasibility.
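The offline stage fits a long-horizon Q-critic to logged transitions. This summary does not say how the critic's training targets are built, so the sketch below assumes Monte-Carlo discounted returns-to-go as regression targets and uses ridge regression on the features [s, a, 1] as a stand-in for a neural critic. The function names `returns_to_go` and `fit_linear_critic`, the discount `gamma`, and the synthetic data are hypothetical, and the Adversarial Autoencoder is not shown.

```python
import numpy as np


def returns_to_go(rewards, gamma=0.99):
    """Discounted cumulative reward G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        G[t] = running
    return G


def fit_linear_critic(states, actions, targets, ridge=1e-3):
    """Ridge regression of return targets on [s, a, 1].

    Hypothetical stand-in for a neural Q-critic; returns a callable q(s, a).
    """
    Phi = np.hstack([states, actions, np.ones((len(states), 1))])
    gram = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    w = np.linalg.solve(gram, Phi.T @ targets)
    return lambda s, a: float(np.concatenate([s, a, [1.0]]) @ w)


# Usage on synthetic data (illustration only, not data from the paper).
rng = np.random.default_rng(1)
S = rng.normal(size=(200, 2))
A = rng.normal(size=(200, 1))
y = 2.0 * S[:, 0] - A[:, 0] + 3.0   # synthetic "returns" with a known linear form
q = fit_linear_critic(S, A, y)
```

With real start-up or grade-change logs, `returns_to_go` would be applied per episode and the resulting targets stacked before fitting.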

Abstract

Start-ups and product grade changes are critical steps in continuous-process plant operation, because any misstep immediately affects product quality and drives operational losses. These transitions have long relied on manual operation by a handful of expert operators, but the progressive retirement of that workforce is leaving plant owners without the tacit know-how needed to execute them consistently. In the absence of a process model, offline reinforcement learning (RL) promises to capture and even surpass human expertise by mining historical start-up and grade-change logs, yet standard offline RL struggles with distribution shift and value overestimation whenever a learned policy ventures outside the data envelope. We introduce HOFLON (Hybrid Offline Learning + Online Optimization) to overcome these limitations. Offline, HOFLON learns (i) a latent data manifold that represents the feasible region spanned by past transitions and (ii) a long-horizon Q-critic that predicts the cumulative reward from state-action pairs. Online, it solves a one-step optimization problem that maximizes the Q-critic while penalizing deviations from the learned manifold and excessive rates of change in the manipulated variables. We test HOFLON on two industrial case studies, a polymerization reactor start-up and a paper-machine grade change, and benchmark it against Implicit Q-Learning (IQL), a leading offline-RL algorithm. In both plants HOFLON not only surpasses IQL but also delivers, on average, better cumulative rewards than the best start-up or grade change observed in the historical data, demonstrating its potential to automate transition operations beyond current expert capability.
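The online step can be illustrated as a penalized one-step optimization over the action. This is a minimal sketch under loudly stated assumptions: a toy concave quadratic critic replaces the trained Q-critic, the reconstruction error of a linear (PCA) autoencoder replaces the Adversarial Autoencoder's manifold distance, and the weights `lam_m` and `lam_r`, the step size, and the helper names (including `hoflon_like_step`) are hypothetical. The controller maximizes J(a) = Q(s, a) - lam_m * manifold_penalty(s, a) - lam_r * ||a - a_prev||^2 by gradient ascent, warm-started at the previous action.

```python
import numpy as np

# Hypothetical toy stand-ins; NOT the paper's trained networks.
rng = np.random.default_rng(0)
n_s, n_a = 3, 2
K = rng.normal(size=(n_a, n_s))      # hypothetical gain defining the toy critic's best action


def q_critic(s, a):
    """Toy concave Q-critic: highest value when a equals K @ s."""
    d = a - K @ s
    return -float(d @ d)


def q_grad(s, a):
    return -2.0 * (a - K @ s)


# Manifold model: a linear autoencoder (PCA) fitted to synthetic "historical" (s, a) pairs.
# It replaces the Adversarial Autoencoder purely for illustration.
hist_s = rng.normal(size=(500, n_s))
hist_a = 0.8 * hist_s @ K.T + 0.05 * rng.normal(size=(500, n_a))
X = np.hstack([hist_s, hist_a])
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
U = Vt[:n_s].T                       # keep n_s latent directions
P = np.eye(n_s + n_a) - U @ U.T      # projector onto the off-manifold subspace


def manifold_penalty(s, a):
    r = P @ (np.concatenate([s, a]) - mu)
    return float(r @ r)


def manifold_grad(s, a):
    # d/dx ||P (x - mu)||^2 = 2 P (x - mu) because P is a symmetric projector
    g = 2.0 * P @ (np.concatenate([s, a]) - mu)
    return g[n_s:]                   # only the action block is optimized


def objective(s, a, a_prev, lam_m, lam_r):
    da = a - a_prev
    return q_critic(s, a) - lam_m * manifold_penalty(s, a) - lam_r * float(da @ da)


def objective_grad(s, a, a_prev, lam_m, lam_r):
    return q_grad(s, a) - lam_m * manifold_grad(s, a) - 2.0 * lam_r * (a - a_prev)


def hoflon_like_step(s, a_prev, lam_m=1.0, lam_r=0.5, lr=0.1, iters=300):
    """One-step action optimization by plain gradient ascent, warm-started at a_prev."""
    a = a_prev.copy()
    for _ in range(iters):
        a = a + lr * objective_grad(s, a, a_prev, lam_m, lam_r)
    return a


# Usage: pick the next action for one (synthetic) state.
s0 = rng.normal(size=n_s)
a_prev0 = np.zeros(n_a)
a_next = hoflon_like_step(s0, a_prev0)
```

Because this toy objective is a strictly concave quadratic in the action, gradient ascent with a small step converges to its unique maximizer. With neural critics and autoencoders the problem is nonconvex, so a local optimizer would only return a local optimum. Raising `lam_m` pulls actions toward the data manifold, and raising `lam_r` keeps them closer to `a_prev`.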

Paper Structure

This paper contains 75 sections, 30 equations, 15 figures, 10 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Conceptual representation of the elements in the HOFLON-RL algorithm.
  • Figure 2: Block diagram representation of the online elements in the HOFLON-RL algorithm.
  • Figure 3: Schematic of the polymerization CSTR environment. Control actions include the initiator feed rate ($F_{\text{M,in}}$) and the cooling jacket temperature ($T_C$). The reactor states include monomer concentration ($C_M$), initiator concentration ($C_I$), radical concentration ($C_R$), polymer concentration ($C_P$), and reactor temperature ($T$); the latter two are also the controlled variables.
  • Figure 4: Schematic of the paper machine indicating the manipulated variables (or actions) and the observed process outputs.
  • Figure 5: 100 PI-controlled episodes from the polymerization CSTR start-up scenario.