
End-to-End Training of High-Dimensional Optimal Control with Implicit Hamiltonians via Jacobian-Free Backpropagation

Eric Gelphman, Deepanshu Verma, Nicole Tianjiao Yang, Stanley Osher, Samy Wu Fung

TL;DR

This work tackles high-dimensional optimal control with implicit Hamiltonians by proposing an end-to-end implicit deep learning approach that parameterizes the value function and derives the optimal controller from the Hamiltonian's optimality condition. An implicit neural network computes the control as the fixed point of this optimality condition, while Jacobian-Free Backpropagation (JFB) enables efficient differentiation through temporally coupled trajectories, reducing computational cost from cubic to quadratic in the control dimension. The authors extend JFB theory to the optimal control setting, establishing descent guarantees for the JFB gradient under contractivity and smoothness assumptions. They demonstrate scalability on quadrotor and high-dimensional bicycle dynamics; in the 20-bicycle problem, automatic differentiation is impractical because of its memory requirements, and CVXPYLayers cannot handle the non-convex dynamics. The results show that JFB achieves comparable or better performance with substantially lower memory and compute than automatic differentiation or CVXPY-based approaches, enabling reliable learning of high-dimensional feedback laws for problems with implicit Hamiltonians. This framework broadens the applicability of value-function-based control to a wider class of practical systems.
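The gradient approximation behind JFB can be illustrated on a toy problem. The sketch below is our own illustration, not the authors' code. It assumes a hypothetical linear contractive map $T_\theta(u) = Au + \theta$ with $\|A\|_2 < 1$ and a quadratic loss $\ell(u^\star) = \tfrac12\|u^\star - \text{target}\|^2$ at the fixed point $u^\star$. The exact gradient from the implicit function theorem needs a linear solve with $(I - A)^\top$; JFB drops that factor and backpropagates through a single application of $T_\theta$ at the fixed point. For this toy, the two gradients always have a positive inner product, so the JFB direction is a descent direction. The names `fixed_point`, `gradients`, and `target` are hypothetical.

```python
import numpy as np

def fixed_point(A, theta, tol=1e-12, max_iter=10_000):
    """Iterate u <- T(u) = A @ u + theta until the step is below tol (A must be contractive)."""
    u = np.zeros_like(theta)
    for _ in range(max_iter):
        u_next = A @ u + theta
        if np.linalg.norm(u_next - u) < tol:
            return u_next
        u = u_next
    return u

def gradients(A, theta, target):
    """Return (exact implicit gradient, JFB gradient) of l(u*) = 0.5 * ||u* - target||^2."""
    u_star = fixed_point(A, theta)
    r = u_star - target                          # dl/du evaluated at the fixed point
    n = len(theta)
    # Implicit function theorem: du*/dtheta = (I - dT/du)^{-1} dT/dtheta = (I - A)^{-1},
    # so the exact gradient is (I - A)^{-T} r, which requires a linear solve.
    g_exact = np.linalg.solve((np.eye(n) - A).T, r)
    # JFB: drop (I - dT/du)^{-1}, i.e. backpropagate through one application of T.
    g_jfb = r                                    # (dT/dtheta)^T r with dT/dtheta = I
    return g_exact, g_jfb

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)                  # spectral norm 0.5, so T is a contraction
theta = rng.standard_normal(n)
target = rng.standard_normal(n)

g_exact, g_jfb = gradients(A, theta, target)
print(g_exact @ g_jfb > 0)                       # prints: True
```

In this toy the exact gradient costs a dense solve with $I - A$, while the JFB gradient skips it entirely; the paper's own cost analysis concerns the control dimension of its operator, which this sketch does not reproduce.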

Abstract

Neural network approaches that parameterize value functions have succeeded in approximating high-dimensional optimal feedback controllers when the Hamiltonian admits explicit formulas. However, many practical problems, such as the space shuttle reentry problem and bicycle dynamics, may involve implicit Hamiltonians that do not admit explicit formulas, which limits the applicability of existing methods. Rather than directly parameterizing controls, which does not leverage the Hamiltonian's underlying structure, we propose an end-to-end implicit deep learning approach that directly parameterizes the value function to learn optimal control laws. Our method enforces physical principles by exploiting the fundamental relationship between the optimal control and the gradient of the value function, which ensures that trained networks adhere to the control laws; this relationship is a direct consequence of the connection between Pontryagin's Maximum Principle and dynamic programming. Using Jacobian-Free Backpropagation (JFB), we achieve efficient training despite temporal coupling in trajectory optimization. We show that JFB produces descent directions for the optimal control objective and experimentally demonstrate that our approach effectively learns high-dimensional feedback controllers across multiple scenarios involving implicit Hamiltonians, which existing methods cannot address.
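To make the implicit-Hamiltonian setting concrete, the sketch below uses a hypothetical Hamiltonian of our own choosing, not one from the paper, with a minimization convention: running cost $\tfrac12\|u\|^2$ and a control term $\sin u$ in the dynamics, so $H(z, p, u) = \tfrac12\|u\|^2 + p^\top(f_0(z) + \sin u)$ with $p = \nabla_z \Phi$. The term $f_0(z)$ does not affect the minimizer. The optimality condition $\nabla_u H = u + p \odot \cos u = 0$ has no closed-form solution, but it can be rewritten as the fixed-point equation $u = T(u) = -p \odot \cos u$. Each coordinate of $T$ has Lipschitz constant $|p_i|$, so $T$ is a contraction whenever $\max_i |p_i| < 1$; in that regime $H$ is also strictly convex in $u$, so the fixed point is the optimal control. The quadratic value function $\Phi(z) = \tfrac12 z^\top Q z$ that produces $p$, and the helper names `value_gradient` and `optimal_control`, are assumptions for illustration.

```python
import numpy as np

def value_gradient(z, Q):
    """Gradient p = Q z of the hypothetical value function Phi(z) = 0.5 * z^T Q z (Q symmetric)."""
    return Q @ z

def optimal_control(p, tol=1e-12, max_iter=1000):
    """Solve u + p * cos(u) = 0 by iterating u <- -p * cos(u); contractive if max|p_i| < 1."""
    if np.max(np.abs(p)) >= 1:
        raise ValueError("fixed-point map is not guaranteed to contract for max|p_i| >= 1")
    u = np.zeros_like(p)
    for _ in range(max_iter):
        u_next = -p * np.cos(u)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next
        u = u_next
    return u

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
Q = 0.1 * (B @ B.T) / np.linalg.norm(B @ B.T, 2)   # symmetric PSD with spectral norm 0.1
z = rng.uniform(-1.0, 1.0, n)                       # so ||p||_2 <= 0.1 * ||z||_2 <= 0.2
p = value_gradient(z, Q)
u_star = optimal_control(p)
residual = u_star + p * np.cos(u_star)              # gradient of H with respect to u
print(np.max(np.abs(residual)))
```

The point of the sketch is that the control is only available as the solution of an equation in $u$, which is the situation the abstract describes; in a learned setting $p$ would come from differentiating a value-function network rather than a fixed quadratic.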

Paper Structure

This paper contains 14 sections, 5 theorems, 32 equations, and 4 figures.

Key Result

Lemma 1

The eigenvalues of $\left( M_\theta M_\theta^\top \right)^{-1}$ are uniformly bounded above and below for all $t, z, u, \theta$. That is, there exist constants $0 < \lambda_- < \lambda_+$ such that $\lambda_- I \preceq \left( M_\theta M_\theta^\top \right)^{-1} \preceq \lambda_+ I$ for all $t, z, u, \theta$.
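One way to read Lemma 1: if $M_\theta$ has full row rank with singular values $\sigma_i > 0$, the eigenvalues of $(M_\theta M_\theta^\top)^{-1}$ are exactly $1/\sigma_i^2$. The lemma's bounds therefore hold precisely when every singular value of $M_\theta$ stays in $[1/\sqrt{\lambda_+},\, 1/\sqrt{\lambda_-}]$, and for a single matrix the tightest constants are $1/\sigma_{\max}^2$ and $1/\sigma_{\min}^2$. The snippet below checks this identity numerically on a random matrix standing in for $M_\theta$; it does not reproduce the paper's $M_\theta$, which is defined in the full text.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 3, 6                                # M is m x k with m <= k (full row rank almost surely)
M = rng.standard_normal((m, k))            # random stand-in for M_theta

inv_gram = np.linalg.inv(M @ M.T)          # (M M^T)^{-1}, symmetric positive definite
eigs = np.sort(np.linalg.eigvalsh(inv_gram))
sigma = np.linalg.svd(M, compute_uv=False)
predicted = np.sort(1.0 / sigma**2)        # eigenvalues should equal 1 / sigma_i^2

lam_minus, lam_plus = eigs[0], eigs[-1]    # pointwise lower and upper eigenvalue bounds
print(np.allclose(eigs, predicted), lam_minus, lam_plus)
```

Uniformity over $t, z, u, \theta$ is the substantive part of the lemma; a numerical check like this only confirms the pointwise relationship between the two sets of bounds.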

Figures (4)

  • Figure 1: Comparison of JFB, automatic differentiation (AD), and CVXPYLayers [agrawal2019differentiable] (implicit differentiation) for training the value function (and hence the feedback controller) for a quadrotor across three metrics. (Top Left) Loss versus training epochs. (Top Right) Loss plotted against cumulative runtime in minutes. (Bottom Left) Loss plotted against cumulative work units, where one work unit is one evaluation of $\frac{\partial T_{\theta}}{\partial \theta}$, equivalent to backpropagating through one application of $T_\theta$. (Bottom Right) Maximum GPU memory usage per training epoch.
  • Figure 2: Comparison of JFB and automatic differentiation (AD) for training a feedback controller for 5 bicycles across four metrics. (Top Left) Loss versus training epochs. (Top Right) Loss plotted against cumulative runtime in minutes. (Bottom Left) Loss plotted against cumulative work units. (Bottom Right) Maximum GPU memory usage per training epoch.
  • Figure 3: Results for the high-dimensional 20-bicycle problem using JFB. (Top Left) Loss vs. training epochs. (Top Right) Loss vs. runtime in minutes. (Bottom Left) Loss vs. cumulative work units. (Bottom Right) Maximum GPU memory usage per training epoch. AD cannot be employed because of the high memory cost of backpropagating through every application of $T_\theta$.
  • Figure 4: Trajectories for an instance of the 20-bicycle problem. This high-dimensional problem causes memory issues with automatic differentiation (AD), while CVXPYLayers cannot be applied due to non-convex dynamics.
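The captions compare AD, which backpropagates through every application of $T_\theta$ used to reach the fixed point, with JFB, which backpropagates through a single application. The sketch below illustrates that difference on a hypothetical contractive map $T_\theta(u) = \tanh(Au + \theta)$ with $\|A\|_2 < 1$, our choice rather than the paper's operator: unrolled AD must store all $K$ iterates and performs $K$ backward steps, whereas JFB needs only the fixed point and performs one. Counting one backward step through $T_\theta$ as one work unit mirrors the caption's definition; the helper names are hypothetical and the numbers say nothing about the paper's experiments.

```python
import numpy as np

def forward(A, theta, K):
    """Run K fixed-point iterations u <- tanh(A u + theta), keeping every iterate."""
    us = [np.zeros_like(theta)]
    for _ in range(K):
        us.append(np.tanh(A @ us[-1] + theta))
    return us                                  # K + 1 stored vectors

def unrolled_ad_grad(A, us, r):
    """Reverse-mode gradient w.r.t. theta through all K iterations (K work units)."""
    v, g, work = r.copy(), np.zeros_like(r), 0
    for k in range(len(us) - 2, -1, -1):
        s = 1.0 - us[k + 1] ** 2               # tanh' at the pre-activation of iteration k
        g += s * v                             # theta enters every iteration directly
        v = A.T @ (s * v)                      # propagate the adjoint back to u_k
        work += 1
    return g, work

def jfb_grad(u_star, r):
    """JFB: backpropagate through one application of T at the fixed point (1 work unit)."""
    s = 1.0 - u_star ** 2                      # valid because u* = tanh(A u* + theta)
    return s * r, 1

rng = np.random.default_rng(3)
n, K = 6, 60
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)                # ||A||_2 = 0.5, so T is a contraction in u
theta = rng.standard_normal(n)
target = rng.standard_normal(n)

us = forward(A, theta, K)
r = us[-1] - target                            # gradient of 0.5 * ||u - target||^2 at u_K
g_ad, work_ad = unrolled_ad_grad(A, us, r)
g_jfb, work_jfb = jfb_grad(us[-1], r)
print(work_ad, work_jfb, len(us))              # prints: 60 1 61
```

For a contractive map, the unrolled AD gradient converges to the implicit-function-theorem gradient as $K$ grows, so its extra memory and work buy accuracy of the Jacobian factor that JFB deliberately drops.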

Theorems & Definitions (10)

  • Lemma 1
  • proof
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • proof
  • proof