Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay
Jiaqi Hu, Jie Qi, Jing Zhang
TL;DR
This work addresses the control of an unstable first-order hyperbolic PDE with spatially-varying state delays. It proposes NO-SAC, a hybrid framework that embeds a DeepONet approximation of the backstepping controller into a Soft Actor-Critic (SAC) reinforcement learning agent, so that control no longer depends on the restrictive assumption on the delay function that the backstepping design requires. The key contributions are learning the backstepping prior as a neural operator and using it to warm-start both the actor and critic networks. Compared with vanilla SAC and with the standard backstepping controller designed under the delay assumption, this yields faster convergence and eliminates steady-state error. The results show improved transient performance and generalization for distributed parameter systems with spatially-varying delays, suggesting a practical path toward robust RL-based PDE control informed by theoretical priors. The methodology also highlights the effectiveness of neural operators in extracting high-dimensional features from PDE states to inform control policies.
Abstract
Control of distributed parameter systems affected by delays is a challenging task, particularly when the delays depend on spatial variables. Integrating analytical control theory with learning-based control in a unified scheme is an increasingly promising approach. In this paper, we address the problem of controlling an unstable first-order hyperbolic PDE with spatially-varying delays by combining PDE backstepping control strategies and deep reinforcement learning (RL). To eliminate the assumption on the delay function required for the backstepping design, we propose a soft actor-critic (SAC) architecture incorporating a DeepONet to approximate the backstepping controller. The DeepONet extracts features from the backstepping controller and feeds them into the policy network. In simulations, our algorithm outperforms both the baseline SAC without prior backstepping knowledge and the analytical controller.
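The data flow described in the abstract, in which DeepONet features are fed into the SAC policy network, can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random weights standing in for a pretrained DeepONet, the placeholder trunk input, and all function names (`deeponet_features`, `deeponet_control`, `policy_action`) are assumptions made purely for illustration.

```python
import math
import random


def dense(x, W, b, act=None):
    """Apply y = act(W x + b) to a vector x stored as a plain list."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [act(v) for v in y] if act else y


def init_layer(n_out, n_in, rng):
    """Random weights; in practice these would come from training."""
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def relu(v):
    return max(0.0, v)


rng = random.Random(0)
M, P, H = 16, 8, 8  # sensor points, feature width, hidden width (all assumed)

branch = init_layer(P, M, rng)      # stand-in for a pretrained DeepONet branch net
trunk = init_layer(P, 1, rng)       # stand-in for the DeepONet trunk net
pol_hidden = init_layer(H, P, rng)  # policy layers that SAC would train
pol_mean = init_layer(1, H, rng)
pol_logstd = init_layer(1, H, rng)


def deeponet_features(state):
    """Branch features of the PDE state sampled at M sensor points."""
    return dense(state, *branch, act=math.tanh)


def deeponet_control(state, y=1.0):
    """DeepONet output: inner product of branch and trunk features.
    The query coordinate y is a placeholder assumption."""
    b = deeponet_features(state)
    t = dense([y], *trunk, act=math.tanh)
    return sum(bi * ti for bi, ti in zip(b, t))


def policy_action(state, rng, deterministic=False):
    """SAC-style tanh-squashed Gaussian policy on top of DeepONet features."""
    h = dense(deeponet_features(state), *pol_hidden, act=relu)
    mean = dense(h, *pol_mean)[0]
    log_std = min(2.0, max(-5.0, dense(h, *pol_logstd)[0]))  # clamp for stability
    eps = 0.0 if deterministic else rng.gauss(0.0, 1.0)
    return math.tanh(mean + math.exp(log_std) * eps)


# Example PDE state: a half sine wave sampled on a uniform grid of M points.
state = [math.sin(math.pi * i / (M - 1)) for i in range(M)]
action = policy_action(state, rng)
```

In this sketch the policy reuses the operator's branch features rather than the raw discretized state, which is one plausible reading of "extracts features ... and feeds them into the policy network"; the tanh squashing keeps the sampled action in a bounded range, as is common for SAC.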
