Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering
Maximilian Bloor, Max Mowbray, Ehecatl Antonio Del Rio Chanona, Calvin Tsay
TL;DR
The paper addresses the challenge of applying reinforcement learning (RL) to Process Systems Engineering (PSE). It provides a tutorial-style synthesis of methods (model-free, model-based, constrained, offline, distributional, and goal-conditioned) and surveys PSE applications across batch control, regulation, scheduling, and supply chains. It formalizes RL within the MDP/POMDP framework, discusses the limits of exact dynamic programming (DP), and then details a spectrum of RL techniques, from bandits to actor-critic methods, with emphasis on practicality in high-dimensional, safety-critical industrial settings. Key contributions include clarifying how to integrate RL with control theory (e.g., MPC, CMDP, barrier functions), highlighting sample efficiency and safety as central concerns, and illustrating successful or promising applications in batch processes, regulatory control, and supply chain management. The paper thus guides researchers and practitioners toward data-efficient, safe, and domain-informed RL deployments in PSE, and outlines future directions such as offline RL, model-based planning, distributional risk-aware objectives, and goal-conditioned policies for multi-setpoint operations.
Abstract
Sequential decision making under uncertainty is central to many Process Systems Engineering (PSE) challenges, where traditional methods often face limitations in controlling and optimizing complex, stochastic systems. Reinforcement Learning (RL) offers a data-driven approach to deriving control policies for such challenges. This paper presents a survey and tutorial on RL methods, tailored to the PSE community. We deliver a tutorial on RL, covering fundamental concepts and key algorithmic families, including value-based, policy-based, and actor-critic methods. Subsequently, we survey existing applications of these RL techniques across various PSE domains, such as fed-batch and continuous process control, process optimization, and supply chains. We conclude with a PSE-focused discussion of specialized techniques and emerging directions. By synthesizing the current state of RL algorithm development and its implications for PSE, this work identifies successes, challenges, and trends, and outlines avenues for future research at the interface of these fields.
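
As a concrete anchor for the MDP formalism and the value-based family mentioned above, the following is a minimal sketch of exact dynamic programming (tabular value iteration) on a toy two-state problem. It is not taken from the paper: the function name `value_iteration`, the "off-spec"/"on-spec" states, the "hold"/"adjust" actions, and all transition and reward values are hypothetical and chosen only for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration for a finite MDP.

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
    R: array of shape (S, A), expected immediate reward for (state, action)
    Returns the converged state values and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup:
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy process: state 0 = "off-spec", state 1 = "on-spec".
# Action 0 = "hold" (stay in the current state), action 1 = "adjust" (switch state).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # hold
    [[0.0, 1.0], [1.0, 0.0]],  # adjust
])
# Reward of 1 only for holding while on-spec (assumed values).
R = np.array([
    [0.0, 0.0],  # off-spec: hold, adjust
    [1.0, 0.0],  # on-spec:  hold, adjust
])

V, policy = value_iteration(P, R, gamma=0.9)
print("state values:", V)
print("greedy policy (action index per state):", policy)
```

The sweep over every state and action at each iteration is what makes exact DP impractical for large or continuous state spaces, which is the usual motivation for the sample-based RL methods the paper surveys.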
