Imitation Learning with Safety and L2 Stability Certificates for Boundary Control of Reaction-Diffusion PDEs
Paulo Henrique Foganholo Biazetto, Mirko Fiacchini, Christophe Prieur, Gustavo Artur de Andrade
TL;DR
This work addresses boundary stabilization of a 1D reaction–diffusion PDE by combining spectral (Sturm–Liouville) truncation with an imitation learning (IL) pipeline that trains a continuous-time neural network (NN) controller to imitate an expert model predictive controller (MPC). Stability is enforced through Lyapunov-based linear matrix inequalities (LMIs) combined with quadratic constraints that bound the NN nonlinearities, guaranteeing an exponential decay rate $\delta>0$ and robustness to truncation spillover. The learning problem is solved with the alternating direction method of multipliers (ADMM), which jointly minimizes the imitation loss and maximizes the size of a certified region of attraction (ROA). The resulting NN controller emulates the expert while carrying formal stability guarantees for the original infinite-dimensional system. It is computationally efficient to evaluate, and the approach is validated on an unstable reaction–diffusion PDE, indicating that IL with stability and safety certificates is practical for distributed-parameter systems.
Abstract
This paper proposes an imitation learning (IL) framework for synthesizing neural network (NN) controllers that achieve boundary stabilization of systems governed by reaction-diffusion partial differential equations (PDEs). The plant is assumed to be actuated through a Dirichlet boundary condition and subject to a Neumann condition on the unactuated side. The design is based on a finite-dimensional truncated model, obtained via spectral decomposition, that captures the unstable dynamics of the original infinite-dimensional system. Convex stability and safety conditions are then derived for this truncated model by combining Lyapunov theory with local quadratic constraints (QCs) that bound the nonlinear activation functions of the NN. These conditions also guarantee robustness to model truncation, thus addressing the spillover problem. They are integrated into the IL process to jointly minimize the imitation loss and maximize the volume of the certified region of attraction (ROA). The proposed framework is validated on an unstable reaction-diffusion PDE, demonstrating that the resulting NN controller efficiently reproduces the expert policy while ensuring formal stability guarantees.
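
To make the spectral truncation step concrete, the following minimal sketch counts how many open-loop modes a truncated model would need to retain. It rests on assumptions that are not taken from the paper: a plant of the form $z_t = z_{xx} + c\,z$ on $(0,1)$ with unit diffusion and a constant reaction coefficient $c$, for which the homogeneous Neumann/Dirichlet eigenvalues are $\lambda_n = c - ((n - 1/2)\pi)^2$ regardless of which end carries which condition. The function names and the numerical values of `c` and `delta` are hypothetical and chosen only for illustration.

```python
import math


def open_loop_eigenvalues(c, n_modes):
    """Eigenvalues lambda_n = c - ((n - 1/2) * pi)^2 for n = 1..n_modes.

    Assumes z_t = z_xx + c z on (0, 1) with one homogeneous Neumann end and
    one homogeneous Dirichlet end (an illustrative plant, not the paper's).
    """
    return [c - ((n - 0.5) * math.pi) ** 2 for n in range(1, n_modes + 1)]


def count_slow_modes(c, delta, n_max=1000):
    """Count modes with lambda_n >= -delta, i.e. modes that do not already
    decay at rate delta and so must be kept in the truncated model."""
    count = 0
    for lam in open_loop_eigenvalues(c, n_max):
        if lam >= -delta:
            count += 1
        else:
            break  # eigenvalues are strictly decreasing in n
    return count


# Hypothetical values for illustration only.
c = 5.0
delta = 0.5
eigs = open_loop_eigenvalues(c, 4)
n0 = count_slow_modes(c, delta)
print("first eigenvalues:", [round(lam, 4) for lam in eigs])
print("modes to retain:", n0)
```

With these assumed values only the first mode is unstable. The remaining modes decay faster than $\delta$ on their own, which is why the certificate only needs to bound their effect (the spillover) rather than control them directly.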
