End-to-End Training of High-Dimensional Optimal Control with Implicit Hamiltonians via Jacobian-Free Backpropagation
Eric Gelphman, Deepanshu Verma, Nicole Tianjiao Yang, Stanley Osher, Samy Wu Fung
TL;DR
This work tackles high-dimensional optimal control with implicit Hamiltonians by proposing an end-to-end implicit deep learning approach that parameterizes the value function and derives the optimal controller from the Hamiltonian's optimality condition. An implicit neural network layer encodes that optimality condition as a fixed-point problem, while Jacobian-Free Backpropagation (JFB) enables efficient differentiation through temporally coupled trajectories, reducing the computational cost from cubic to quadratic in the control dimension. The authors extend JFB theory to the optimal control setting, establishing descent guarantees for the JFB gradient under contractivity and smoothness assumptions, and demonstrate scalability on quadrotor and high-dimensional bicycle dynamics where traditional methods struggle. The results show that JFB achieves comparable or better performance with substantially lower memory and compute than automatic differentiation or CVXPY-based approaches, enabling reliable learning of high-dimensional feedback laws for problems with implicit Hamiltonians. The framework thereby extends value-function-based control to a wider class of practical systems.
Abstract
Neural network approaches that parameterize value functions have succeeded in approximating high-dimensional optimal feedback controllers when the Hamiltonian admits explicit formulas. However, many practical problems, such as the space shuttle reentry problem and bicycle dynamics, can involve implicit Hamiltonians that do not admit explicit formulas, which limits the applicability of existing methods. Rather than directly parameterizing controls, which does not leverage the Hamiltonian's underlying structure, we propose an end-to-end implicit deep learning approach that directly parameterizes the value function to learn optimal control laws. Our method enforces physical principles: trained networks adhere to the control laws because we exploit the fundamental relationship between the optimal control and the value function's gradient, which is a direct consequence of the connection between Pontryagin's Maximum Principle and dynamic programming. Using Jacobian-Free Backpropagation (JFB), we achieve efficient training despite temporal coupling in trajectory optimization. We show that JFB produces descent directions for the optimal control objective and experimentally demonstrate that our approach effectively learns high-dimensional feedback controllers across multiple scenarios involving implicit Hamiltonians, which existing methods cannot address.
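To make the JFB idea concrete, the following is a minimal NumPy sketch on a toy fixed-point problem, not the authors' implementation. It assumes a hypothetical contractive map T(u; theta) = tanh(W u + theta) standing in for the Hamiltonian optimality condition and a hypothetical quadratic loss; the names W, theta, and target are illustrative only. The exact implicit gradient requires a linear solve with (I - dT/du), whereas the JFB gradient replaces that inverse with the identity. The sketch compares the two directions.

```python
import numpy as np

def fixed_point(T, u0, tol=1e-10, max_iter=500):
    """Iterate u <- T(u) until successive iterates differ by less than tol."""
    u = u0
    for _ in range(max_iter):
        u_next = T(u)
        if np.linalg.norm(u_next - u) < tol:
            return u_next
        u = u_next
    return u

rng = np.random.default_rng(0)
m = 5  # toy "control dimension" (assumption)

# Hypothetical contractive map: rescale W to spectral norm 0.5, so
# T(u) = tanh(W u + theta) is a contraction because |tanh'| <= 1.
W = rng.standard_normal((m, m))
W *= 0.5 / np.linalg.norm(W, 2)
theta = rng.standard_normal(m)   # stands in for the trainable parameters
target = rng.standard_normal(m)  # hypothetical target for the toy loss

T = lambda u: np.tanh(W @ u + theta)
u_star = fixed_point(T, np.zeros(m))

# Jacobians of T at the fixed point; tanh'(z) = 1 - tanh(z)^2 = 1 - u_star^2.
s = 1.0 - u_star**2
dT_du = s[:, None] * W      # diag(s) @ W
dT_dtheta = np.diag(s)

# Toy loss L(u) = 0.5 * ||u - target||^2, so dL/du = u - target.
grad_u = u_star - target

# Exact implicit gradient: dT_dtheta^T (I - dT_du)^{-T} dL/du (needs a solve).
grad_exact = dT_dtheta.T @ np.linalg.solve((np.eye(m) - dT_du).T, grad_u)

# JFB gradient: drop the inverse, i.e. backpropagate through one application of T.
grad_jfb = dT_dtheta.T @ grad_u

cosine = grad_jfb @ grad_exact / (
    np.linalg.norm(grad_jfb) * np.linalg.norm(grad_exact)
)
print("cosine between JFB and exact gradient:", cosine)
```

In this toy setting the inner product between the two gradients is positive, so the negative JFB gradient is a descent direction for the loss, while the linear solve (cubic cost in m for a dense system) is replaced by a matrix-vector product (quadratic cost in m).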
