A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem
Filippos Fotiadis, Kyriakos G. Vamvoudakis
TL;DR
The paper solves the infinite-horizon optimal control problem for nonlinear systems by applying physics-informed neural networks (PINNs) to a finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. It proves that the finite-horizon value function $V_T$ converges uniformly to the infinite-horizon value function $V^*$, and that the corresponding policy $\mu_T$ converges to $\mu^*$, as the horizon length $T$ grows. It also provides practical tools to verify that a chosen horizon is long enough and, if it is not, to extend it with reduced computation and robustness to approximation errors. The proposed horizon-aware PINN framework avoids the need for a known stabilizing policy or iterative policy evaluation and supports non-polynomial bases, enabling robust approximation on compact domains. Simulations on a torsional pendulum, a system with a quartic value function, and a third-order system demonstrate substantial gains from horizon extension and highlight advantages over solving the steady-state HJB directly.
Abstract
We propose a physics-informed neural network (PINN) framework to solve the infinite-horizon optimal control problem of nonlinear systems. Since PINNs are generally able to solve a class of partial differential equations (PDEs), they can be employed to learn the value function of the infinite-horizon optimal control problem by solving the associated steady-state Hamilton-Jacobi-Bellman (HJB) equation. However, the steady-state HJB equation generally admits multiple solutions; hence, if PINNs are applied to it directly, they may end up approximating a solution that differs from the optimal value function of the problem. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution and that uniformly approximates the optimal value function as the horizon increases. We also give an algorithm to verify whether the chosen horizon is large enough, as well as a method to extend it, with reduced computations and robustness to approximation errors, in case it is not. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions, does not require prior knowledge of a stabilizing controller, and does not perform iterative policy evaluations. Simulations verify and clarify the theoretical findings.
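The non-uniqueness issue and its finite-horizon remedy can be illustrated on a toy problem. The sketch below is not the authors' method and uses no neural network. It assumes a hypothetical scalar linear-quadratic system $\dot{x} = ax + bu$ with running cost $qx^2 + ru^2$, for which the value function has the form $V(x) = px^2$. The steady-state HJB then reduces to the quadratic $0 = q + 2ap - (b^2/r)p^2$, which has two roots, while the finite-horizon HJB with zero terminal cost reduces to an ODE in the time-to-go $s$ whose unique solution approaches the positive root. All parameter values and function names are assumptions chosen for illustration.

```python
import math


def steady_state_roots(a, b, q, r):
    """Both roots of 0 = q + 2*a*p - (b**2 / r) * p**2 (scalar steady-state HJB)."""
    disc = math.sqrt(a * a + b * b * q / r)
    p_minus = r * (a - disc) / (b * b)
    p_plus = r * (a + disc) / (b * b)
    return p_minus, p_plus


def finite_horizon_coefficient(a, b, q, r, horizon, steps=10000):
    """Integrate dP/ds = q + 2*a*P - (b**2 / r) * P**2 from P(0) = 0 up to s = horizon.

    Here s is the time-to-go, so P(horizon) is the coefficient of the
    finite-horizon value function V_T(x) = P(T) * x**2 with zero terminal cost.
    A fixed-step classical Runge-Kutta (RK4) scheme is used for simplicity.
    """
    def rhs(p):
        return q + 2.0 * a * p - (b * b / r) * p * p

    h = horizon / steps
    p = 0.0
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


# Illustrative parameters (assumed, not taken from the paper).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p_minus, p_plus = steady_state_roots(a, b, q, r)
print(f"steady-state roots: {p_minus:.6f}, {p_plus:.6f}")
for T in (0.5, 1.0, 2.0, 5.0, 10.0):
    p_T = finite_horizon_coefficient(a, b, q, r, T)
    print(f"T = {T:5.1f}  P(T) = {p_T:.6f}  gap to positive root = {p_plus - p_T:.2e}")
```

In this toy case the negative root also satisfies the steady-state equation but cannot be the optimal value function, since the cost is nonnegative. A solver applied directly to the steady-state equation has no built-in reason to prefer one root over the other, whereas the finite-horizon coefficient starts at zero and increases toward the positive root as the horizon grows.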
