A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem

Filippos Fotiadis, Kyriakos G. Vamvoudakis

TL;DR

The paper solves the infinite-horizon optimal control problem for nonlinear systems by applying physics-informed neural networks (PINNs) to a finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. It proves that the finite-horizon value $V_T$ converges uniformly to the infinite-horizon value $V^\star$, and that the corresponding policy $\mu_T$ converges to $\mu^\star$, as the horizon length $T$ grows. It also provides practical tools to verify that a chosen horizon is sufficient and, if it is not, to extend it with reduced computation and robustness to approximation errors. The proposed horizon-aware PINN framework does not need a known stabilizing policy or iterative policy evaluation, and it supports non-polynomial bases, enabling robust approximation on compact domains. Simulations on a torsional pendulum, a quartic-value-function system, and a third-order system demonstrate substantial gains with horizon extension and highlight advantages over directly solving the steady-state HJB.

Abstract

We propose a physics-informed neural networks (PINNs) framework to solve the infinite-horizon optimal control problem of nonlinear systems. In particular, since PINNs are generally able to solve a class of partial differential equations (PDEs), they can be employed to learn the value function of the infinite-horizon optimal control problem by solving the associated steady-state Hamilton-Jacobi-Bellman (HJB) equation. However, the steady-state HJB equation generally admits multiple solutions; hence, if PINNs are applied to it directly, they may end up approximating a solution that differs from the optimal value function of the problem. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution, and which uniformly approximates the optimal value function as the horizon increases. An algorithm to verify whether the chosen horizon is large enough is also given, as well as a method to extend it -- with reduced computations and robustness to approximation errors -- in case it is not. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions, does not require prior knowledge of a stabilizing controller, and does not perform iterative policy evaluations. Simulations are performed, which verify and clarify the theoretical findings.
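
The non-uniqueness issue described in the abstract can already be seen in a scalar linear-quadratic problem. The sketch below is only an illustration under assumptions that are not taken from the paper: the system is $\dot{x}=ax+bu$ with running cost $qx^2+ru^2$ and zero terminal cost, the value function is sought in the form $p\,x^2$, and the helper names `steady_state_roots` and `finite_horizon_p` are hypothetical. No PINN is involved; the finite-horizon equation reduces to a scalar Riccati ODE, which is integrated here with a plain Runge-Kutta scheme.

```python
import math


def steady_state_roots(a, b, q, r):
    """Both roots p of 2*a*p - (b**2 / r) * p**2 + q = 0.

    Each root gives a quadratic solution V(x) = p * x**2 of the steady-state
    HJB equation for dx/dt = a*x + b*u with running cost q*x**2 + r*u**2.
    """
    disc = math.sqrt(a**2 + q * b**2 / r)
    scale = r / b**2
    return scale * (a - disc), scale * (a + disc)


def finite_horizon_p(T, a, b, q, r, steps=2000):
    """Coefficient p with V_T(x, 0) = p * x**2, assuming zero terminal cost.

    Integrates the Riccati ODE dp/dtau = 2*a*p - (b**2 / r) * p**2 + q in
    time-to-go tau, from p = 0 at tau = 0 up to tau = T, with classical RK4.
    """
    def rhs(p):
        return 2 * a * p - (b**2 / r) * p**2 + q

    h = T / steps
    p = 0.0
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p


if __name__ == "__main__":
    # Illustrative values only; with a > 0 the open-loop system is unstable.
    a, b, q, r = 1.0, 1.0, 1.0, 1.0
    print("steady-state roots:", steady_state_roots(a, b, q, r))
    for T in (0.5, 1.0, 2.0, 4.0, 8.0):
        print("T =", T, " p(T) =", finite_horizon_p(T, a, b, q, r))
```

In this toy setting the steady-state equation has one negative and one positive root, and only the positive one corresponds to the optimal value function. The finite-horizon coefficient has a single solution for each $T$ and approaches the positive root as $T$ grows, which is the behaviour the finite-horizon formulation relies on.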

Paper Structure

This paper contains 16 sections, 8 theorems, 54 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

lemma 1

Assume that $\phi\equiv0$. Then, for all $x\in\mathbb{R}^n$, the sequence $V_{T}(x,0)$ is increasing with respect to $T$ and upper bounded by $V^\star(x)$, i.e., for every real $T_2\ge T_1 >0$, it holds that $V_{T_1}(x,0)\le V_{T_2}(x,0)\le V^\star(x)$. In addition, $\lim_{T\rightarrow\infty}V_T(x,0)=L(x)$ for some Lebesgue measurable function $L:\mathbb{R}^n\rightarrow\mathbb{R}$.
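
As a worked illustration of the lemma's statement, consider an assumed toy problem that is not one of the paper's examples: the scalar system $\dot{x}=u$ with running cost $x^2+u^2$ and zero terminal cost, $\phi\equiv0$. Substituting the ansatz $V_T(x,t)=p(t)\,x^2$ into the finite-horizon HJB equation gives a scalar Riccati equation with a closed-form solution:

$$
\begin{aligned}
-\partial_t V_T(x,t) &= \min_{u}\left\{x^2+u^2+\partial_x V_T(x,t)\,u\right\} = x^2-\tfrac{1}{4}\left(\partial_x V_T(x,t)\right)^2, \qquad V_T(x,T)=0,\\
-\dot{p}(t) &= 1-p(t)^2,\quad p(T)=0 \;\Longrightarrow\; p(t)=\tanh(T-t),\\
V_T(x,0) &= \tanh(T)\,x^2 \;\le\; x^2 = V^\star(x).
\end{aligned}
$$

Since $\tanh$ is increasing and bounded above by $1$, $V_T(x,0)$ increases with $T$ and stays below $V^\star(x)=x^2$, and the limit function in this example is $L(x)=x^2$. The steady-state equation $0=x^2-\tfrac{1}{4}\left(V'(x)\right)^2$ for the same toy problem is also satisfied by $V(x)=-x^2$, which is not the optimal value function.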

Figures (7)

  • Figure 1: The learnt value function $\hat{V}_{1}(\cdot,0)$ (left), the learnt control policy $\hat{\mu}_{1}(\cdot,0)$ (middle), and the squared flow residual $E_e(x;\hat{V}_{1}(x,0))^2$ (right), for the pendulum with $T=1$.
  • Figure 2: The learnt value function $\hat{V}_{4}(\cdot,0)$ (left), the learnt control policy $\hat{\mu}_{4}(\cdot,0)$ (middle), and the squared flow residual $E_e(x;\hat{V}_{4}(x,0))^2$ (right), for the pendulum.
  • Figure 3: The learnt value function $\hat{V}(\cdot)$ (left), the learnt control policy $\hat{\mu}(\cdot)$ (middle), and the squared flow residual $E_e(x;\hat{V}(x))^2$ (right), for the pendulum.
  • Figure 4: The learnt value function $\hat{V}_{0.5}(\cdot,0)$ (left) and the error from optimality $\hat{V}_{0.5}(\cdot,0)-V^\star(\cdot)$ (right), for $T=0.5$.
  • Figure 5: The learnt control policy $\hat{\mu}_{0.5}(\cdot,0)$ (left) and the error from optimality $\hat{\mu}_{0.5}(\cdot,0)-\mu^\star(\cdot)$ (right), for $T=0.5$.
  • ...and 2 more figures

Theorems & Definitions (22)

  • definition 1
  • remark 1
  • remark 2
  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • proof
  • lemma 4
  • ...and 12 more