Physics-Informed Neural Network Policy Iteration: Algorithms, Convergence, and Verification

Yiming Meng; Ruikun Zhou; Amartya Mukherjee; Maxwell Fitzsimmons; Christopher Song; Jun Liu

TL;DR

This work develops physics-informed neural policy iteration (PI) to solve nonlinear optimal control problems by driving policy evaluation through GHJB/HJB formulations. It introduces two neural variants, ELM-PI for low-dimensional problems via linear least-squares PDE solving and PINN-PI for high-dimensional settings via physics-informed networks, both with convergence guarantees to viscosity solutions. To ensure safety, the authors formulate a formal stability verification framework using neural Lyapunov functions and delta-complete SMT solvers, illustrating that naive training can yield unstable controllers without verification. Theoretical results establish convergence of exact-PI and neural-PI to the true optimal solution under appropriate conditions, while numerical experiments demonstrate favorable scalability and stability across synthetic, inverted-pendulum, and RL-like benchmarks, with PINN-PI outperforming several RL baselines in higher dimensions. Overall, the paper provides a principled, verifiable neural policy-iteration paradigm that blends PDE-based control theory with modern neural approximators to tackle high-dimensional nonlinear optimal control problems.

Abstract

Solving nonlinear optimal control problems is a challenging task, particularly for high-dimensional problems. We propose algorithms for model-based policy iteration to solve nonlinear optimal control problems with convergence guarantees. The main component of our approach is an iterative procedure that uses neural approximations to solve linear partial differential equations (PDEs), ensuring convergence. We present two variants of the algorithm. The first variant formulates the optimization problem as a linear least-squares problem, drawing inspiration from extreme learning machines (ELM) for solving PDEs. This variant efficiently handles low-dimensional problems with high accuracy. The second variant is based on a physics-informed neural network (PINN) for solving PDEs and has the potential to address high-dimensional problems. We demonstrate that both algorithms outperform traditional approaches, such as Galerkin methods, by a significant margin. We provide a theoretical analysis of both algorithms in terms of convergence of the neural approximations towards the true optimal solutions in a general setting. Furthermore, we employ formal verification techniques to demonstrate the verifiable stability of the resulting controllers.
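To make the first variant concrete, the following is a minimal sketch of policy iteration in which each policy-evaluation step is a linear least-squares solve over fixed random tanh features, in the spirit of ELM-PI. It is not the authors' implementation. The scalar test problem (dx/dt = a*x + b*u with running cost q*x^2 + r*u^2), the feature ranges, the collocation grid, the iteration count, and all function names are assumptions chosen for illustration; the problem was picked because its optimal gain is known in closed form from the scalar Riccati equation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar test problem (an assumption, not taken from the paper):
# dynamics dx/dt = a*x + b*u, running cost q*x^2 + r*u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

m = 50                               # number of hidden neurons (fixed, random)
w = rng.uniform(-2.0, 2.0, m)        # hidden weights, never trained
c = rng.uniform(-2.0, 2.0, m)        # hidden biases, never trained
xs = np.linspace(-1.0, 1.0, 201)     # collocation points on [-1, 1]


def features(x):
    """Return tanh(w*x + c) and its derivative with respect to x."""
    z = np.outer(np.atleast_1d(x), w) + c
    t = np.tanh(z)
    return t, (1.0 - t**2) * w


def evaluate_policy(u_vals):
    """Policy evaluation: V'(x)*(a*x + b*u(x)) + q*x^2 + r*u(x)^2 = 0, V(0) = 0.

    With V(x) = sum_i beta_i * tanh(w_i*x + c_i), the equation is linear in
    beta, so it is solved by least squares at the collocation points.
    """
    _, dphi = features(xs)
    A = dphi * (a * xs + b * u_vals)[:, None]
    y = -(q * xs**2 + r * u_vals**2)
    phi0, _ = features(0.0)          # extra row enforcing V(0) = 0
    A = np.vstack([A, phi0])
    y = np.append(y, 0.0)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def policy(beta, x):
    """Policy improvement: u(x) = -(b / (2*r)) * V'(x)."""
    _, dphi = features(x)
    return -(b / (2.0 * r)) * (dphi @ beta)


u_vals = -2.0 * xs                   # initial stabilizing controller u0(x) = -2x
for _ in range(8):
    beta = evaluate_policy(u_vals)   # linear least-squares PDE solve
    u_vals = policy(beta, xs)        # improved controller on the grid

# Closed-form optimal gain from the scalar Riccati equation, for comparison.
gain_exact = (a + np.sqrt(a**2 + b**2 * q / r)) / b
print(f"learned gain: {-policy(beta, 1.0)[0]:.4f}, Riccati gain: {gain_exact:.4f}")
```

Because only the output weights are unknown, each iteration reduces to one call to `np.linalg.lstsq`. This is the property that makes a least-squares formulation attractive in low dimensions, while the number of collocation points needed grows quickly with the state dimension.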

Paper Structure (34 sections, 73 equations, 8 figures, 3 tables, 2 algorithms)

Introduction
Problem formulation
Algorithms
Exact policy iteration
ELM-PI via linear least squares
PINN-PI via physics-informed neural network
Loss term to ensure local stability is preserved across iterations
Verification of stability via neural Lyapunov functions
Convergence analysis
Convergence analysis for exact-PI
Convergence analysis for policy iteration using neural approximations
Numerical experiments
Synthetic $n$-dimensional nonlinear control
Inverted pendulum and comparison with successive Galerkin approximations
Comparison with reinforcement learning algorithms
...and 19 more sections

Figures (8)

Figure 1: ELM-PI on inverted pendulum: despite visual similarity and apparent convergence, the controller obtained from $m=50$ fails to stabilize the system, while the one from $m=100$ can be verified to be stabilizing using an SMT solver.
Figure 2: ELM-PI, PINN-PI, and SGA on the inverted pendulum example: the value returned by a high-order SGA achieves the same cost as ELM-PI with a different number of neurons, while the computational time required by ELM-PI is significantly less. In all cases, we are able to verify that the Lyapunov stability conditions outlined in Section \ref{sec:lyap} are met.
Figure 3: Certified regions of attraction by ELM-PI, PINN-PI, and SGA on the inverted pendulum example: it can be seen that for high-order SGA, PINN-PI, and ELM-PI, a region of attraction close to the boundary of the region of interest $\Omega$ can be verified using SMT solvers.
Figure 4: Plots of accumulated costs over time for the four environments
Figure 5: Plots of trajectories starting from different initial conditions under the optimal controller learned using PINN-PI for the four environments. All trajectories converge to the origin.
...and 3 more figures

Theorems & Definitions (1)

proof
