RL-PINNs: Reinforcement Learning-Driven Adaptive Sampling for Efficient Training of PINNs
Zhenao Song
TL;DR
This work tackles the sampling bottleneck in Physics-Informed Neural Networks (PINNs) by recasting adaptive sampling as a reinforcement learning problem. It introduces RL-PINNs, which formulate point selection as a Markov decision process and use a Deep Q-Network to learn a sampling policy. The policy is guided by a gradient-free function-variation reward, and a delayed reward mechanism favors long-term training stability over short-term gains. The method needs only a single round of adaptive sampling and avoids expensive gradient computations. The authors validate it on low-regularity, nonlinear, high-dimensional, and high-order PDEs, where it consistently outperforms residual-based methods in accuracy with negligible sampling overhead. These results suggest that RL-PINNs offer a practical and scalable path to more accurate PINN solutions in complex, high-dimensional settings.
Abstract
Physics-Informed Neural Networks (PINNs) have emerged as a powerful framework for solving partial differential equations (PDEs). However, their performance relies heavily on the strategy used to select training points. Conventional adaptive sampling methods, such as residual-based refinement, often require multi-round sampling and repeated retraining of PINNs, leading to computational inefficiency due to redundant points and costly gradient computations, particularly in high-dimensional or high-order derivative scenarios. To address these limitations, we propose RL-PINNs, a reinforcement learning (RL)-driven adaptive sampling framework that enables efficient training with only a single round of sampling. Our approach formulates adaptive sampling as a Markov decision process, where an RL agent dynamically selects optimal training points by maximizing a long-term utility metric. Critically, we replace gradient-dependent residual metrics with a computationally efficient function variation as the reward signal, eliminating the overhead of derivative calculations. Furthermore, we employ a delayed reward mechanism to prioritize long-term training stability over short-term gains. Extensive experiments across diverse PDE benchmarks, including low-regularity, nonlinear, high-dimensional, and high-order problems, demonstrate that RL-PINNs significantly outperform existing residual-driven adaptive methods in accuracy. Notably, RL-PINNs achieve this with negligible sampling overhead, making them scalable to high-dimensional and high-order problems.
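To make the gradient-free reward concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "function variation" can be read as the absolute change in the current solution estimate between two consecutive agent positions, which needs only forward evaluations and no derivatives. The names `function_variation` and `select_points`, the 1D domain [-1, 1], the step size, the threshold, and the `tanh` stand-in for a trained PINN are all hypothetical choices for illustration. A uniformly random move replaces the learned Deep Q-Network policy, and the delayed reward mechanism is not modeled.

```python
import numpy as np


def function_variation(u, x, x_next):
    """Gradient-free score: |u(x_next) - u(x)|, two forward evaluations only."""
    return np.abs(u(x_next) - u(x))


def select_points(u, n_walkers=200, n_steps=50, step=0.05, threshold=0.05, seed=0):
    """Collect candidate training points where the function variation is large.

    Assumption: a random left/right move stands in for the learned policy,
    so this illustrates the reward signal only, not the RL training loop.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_walkers)  # initial agent states
    kept = []
    for _ in range(n_steps):
        action = rng.choice([-step, step], size=n_walkers)  # hypothetical action set
        x_next = np.clip(x + action, -1.0, 1.0)  # stay inside the domain
        reward = function_variation(u, x, x_next)
        kept.append(x_next[reward > threshold])  # keep only high-variation states
        x = x_next
    return np.concatenate(kept)


# Stand-in for a pretrained PINN: a steep internal layer at x = 0.
u = lambda x: np.tanh(20.0 * x)
pts = select_points(u)
```

In this toy setting the retained points cluster around the steep layer at x = 0, which is where a residual-based criterion would also concentrate points, but no derivative of `u` is ever computed.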
