Multi-fidelity Reinforcement Learning Control for Complex Dynamical Systems
Luning Sun, Xin-Yang Liu, Siyan Zhao, Aditya Grover, Jian-Xun Wang, Jayaraman J. Thiagarajan
TL;DR
This work tackles the problem of controlling instabilities in complex dynamical systems where high-fidelity (HF) simulations are prohibitively expensive. It introduces a multi-fidelity reinforcement learning (MFRL) framework built on a differentiable hybrid environment, in which learnable correction terms bridge low- and high-fidelity models, and it uses a spectrum-based reward to steer learning. Policy optimization is performed online with the actor-critic TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm, enhanced by Stochastic Weight Averaging (SWA) to improve generalization. The framework is validated on plasma instabilities (stimulated Raman scattering, SRS) and on Burgers turbulence. The statistics of the MFRL-controlled outcomes closely match those obtained from many HF evaluations, and the approach outperforms several baselines, offering a data-efficient route to physics-informed control. The work could accelerate robust control design for complex physical systems while reducing reliance on expensive HF simulations.
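Stochastic Weight Averaging keeps an equal-weight running mean of parameter snapshots collected during training, w_SWA <- (n * w_SWA + w) / (n + 1), where n is the number of snapshots already averaged. The sketch below shows only this averaging step on a hypothetical dict of NumPy parameter arrays. The class name `SWAAverager` and the parameter key used in the usage note are illustrative assumptions. This section does not specify when snapshots are taken during TD3 training, or whether the actor, the critics, or both are averaged.

```python
import numpy as np


class SWAAverager:
    """Equal-weight running average of parameter snapshots (Stochastic Weight Averaging).

    Hypothetical helper: parameters are a dict mapping names to NumPy arrays.
    The caller decides when to fold in a snapshot (e.g. every k policy updates),
    which is an assumption, not a schedule stated in the paper.
    """

    def __init__(self):
        self.n_models = 0  # number of snapshots averaged so far
        self.avg = None    # dict name -> averaged array

    def update(self, params):
        if self.avg is None:
            # The first snapshot initializes the average (np.array copies the data).
            self.avg = {k: np.array(v, dtype=float) for k, v in params.items()}
        else:
            for k, v in params.items():
                # w_swa <- (w_swa * n + w) / (n + 1)
                self.avg[k] = (self.avg[k] * self.n_models + v) / (self.n_models + 1)
        self.n_models += 1
```

For example, folding in the snapshots `[1, 4]`, `[3, 0]` and `[5, 2]` for a parameter named `"actor.weight"` yields their arithmetic mean `[3, 2]`. The incremental form avoids storing every snapshot. The averaged weights would then replace the deployed policy weights at evaluation time.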
Abstract
Controlling instabilities in complex dynamical systems is challenging in scientific and engineering applications. Deep reinforcement learning (DRL) has shown promising results across a range of scientific applications. The many-query nature of control tasks requires many interactions with real environments governed by the underlying physics. However, such data are usually sparse when collected from experiments, or expensive to obtain when complex dynamics must be simulated. Alternatively, controlling a surrogate model could mitigate the computational cost. However, even a fast and accurate learning-based model trained offline struggles to reproduce accurate pointwise dynamics when the dynamics are chaotic. To bridge this gap, the current work proposes a multi-fidelity reinforcement learning (MFRL) framework that leverages differentiable hybrid models for control tasks, in which a physics-based hybrid model is corrected by limited high-fidelity data. We also propose a spectrum-based reward function for RL training. The effectiveness of the proposed framework is demonstrated on two complex physical systems. The statistics of the MFRL control results match those computed from many-query evaluations of the high-fidelity environments, and the framework outperforms other state-of-the-art (SOTA) baselines.
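A spectrum-based reward compares the statistics of the controlled state with a target rather than comparing the states pointwise. This matters for chaotic dynamics, where trajectories diverge pointwise even when their statistics agree. The sketch below assumes a 1D periodic field `u` (as in Burgers turbulence), with energy spectrum E(k) = 0.5 |u_hat(k)|^2. The reward is the negative mean absolute difference of log-spectra over wavenumbers 1..k_max. The function names, the log-spectral distance, `k_max` and `eps` are all assumptions for illustration; this section does not give the paper's exact reward formula.

```python
import numpy as np


def energy_spectrum(u):
    """Energy spectrum E(k) = 0.5 * |u_hat(k)|^2, k = 0..N/2, of a 1D periodic field."""
    n = u.shape[-1]
    u_hat = np.fft.rfft(u) / n  # Fourier coefficients normalized by the grid size
    return 0.5 * np.abs(u_hat) ** 2


def spectrum_reward(u, target_spectrum, k_max, eps=1e-12):
    """Hypothetical spectrum-based reward: negative mean log-spectral distance.

    Returns 0 when the spectrum of `u` matches `target_spectrum` on modes 1..k_max,
    and a negative value otherwise. `eps` guards against log(0) for empty modes.
    """
    e = energy_spectrum(u)[1:k_max + 1]        # skip the mean (k = 0) mode
    e_ref = target_spectrum[1:k_max + 1]
    log_gap = np.abs(np.log(e + eps) - np.log(e_ref + eps))
    return -float(np.mean(log_gap))
```

For instance, with `u_ref = sin(x) + 0.1 sin(4x)` on a 64-point periodic grid, the reward is zero for `u_ref` itself against `energy_spectrum(u_ref)`. Adding energy to an unoccupied mode, such as `0.5 sin(2x)`, makes the reward strictly negative. Working in log space keeps low-energy, high-wavenumber modes from being swamped by the energetic large scales.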
