Zero-Shot Transferable Solution Method for Parametric Optimal Control Problems

Xingjian Li, Kelvin Kan, Deepanshu Verma, Krishna Kumar, Stanley Osher, Ján Drgoňa

TL;DR

Numerical experiments show the proposed transferable solution method delivers near-optimal performance with minimal overhead when generalizing across tasks, enabling semi-global feedback policies suitable for real-time deployment.

Abstract

This paper presents a transferable solution method for optimal control problems with varying objectives using function encoder (FE) policies. Traditional optimization-based approaches must be re-solved whenever objectives change, resulting in prohibitive computational costs for applications requiring frequent evaluation and adaptation. The proposed method learns a reusable set of neural basis functions that spans the control policy space, enabling efficient zero-shot adaptation to new tasks through either projection from data or direct mapping from problem specifications. The key idea is an offline-online decomposition: basis functions are learned once during offline imitation learning, while online adaptation requires only lightweight coefficient estimation. Numerical experiments across diverse dynamics, dimensions, and cost structures show our method delivers near-optimal performance with minimal overhead when generalizing across tasks, enabling semi-global feedback policies suitable for real-time deployment.
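The offline-online decomposition described above can be illustrated with a minimal sketch. It rests on several assumptions not taken from the paper: frozen random-feature networks stand in for the basis functions that would be learned offline, the policy is taken to be a linear combination of those bases, and the "projection from data" step is implemented as an ordinary least-squares fit. The names `basis`, `fit_coefficients`, and `policy`, along with all dimensions, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
STATE_DIM, CONTROL_DIM, NUM_BASIS, HIDDEN = 2, 2, 8, 16

# Assumption: frozen random-feature networks stand in for the neural basis
# functions that the offline imitation-learning stage would produce.
W1 = rng.normal(size=(NUM_BASIS, HIDDEN, STATE_DIM))
b1 = rng.normal(size=(NUM_BASIS, HIDDEN))
W2 = rng.normal(size=(NUM_BASIS, CONTROL_DIM, HIDDEN)) / np.sqrt(HIDDEN)


def basis(x):
    """Evaluate every basis function at states x.

    x: (N, STATE_DIM) -> array of shape (N, NUM_BASIS, CONTROL_DIM).
    """
    hidden = np.tanh(np.einsum("khd,nd->nkh", W1, x) + b1)
    return np.einsum("kmh,nkh->nkm", W2, hidden)


def policy(x, coeffs):
    """Feedback policy u(x) = sum_k coeffs[k] * basis_k(x)."""
    return np.einsum("k,nkm->nm", coeffs, basis(x))


def fit_coefficients(x, u):
    """Online step: least-squares fit of observed (state, control) pairs.

    Only NUM_BASIS scalars are estimated; the basis itself stays frozen.
    """
    phi = basis(x)                                   # (N, K, M)
    design = phi.transpose(0, 2, 1).reshape(-1, NUM_BASIS)
    target = u.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


# Usage: a synthetic "new task" whose policy lies in the span of the basis.
task_coeffs = rng.normal(size=NUM_BASIS)
x_obs = rng.uniform(-1.0, 1.0, size=(64, STATE_DIM))
u_obs = policy(x_obs, task_coeffs)

estimated = fit_coefficients(x_obs, u_obs)
x_query = rng.uniform(-1.0, 1.0, size=(5, STATE_DIM))
print(np.abs(policy(x_query, estimated) - policy(x_query, task_coeffs)).max())
```

In this sketch the online cost is one small linear solve over `NUM_BASIS` unknowns, which is the sense in which coefficient estimation is lightweight. The alternative route mentioned in the abstract, mapping problem specifications directly to coefficients, is not shown here.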

Paper Structure

This paper contains 11 sections, 2 theorems, 36 equations, 6 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Let $K \subset \mathbb{R}^n$ be compact and let $\mathcal{H} = \{ h : K \to \mathbb{R}^n \mid \|h\|_{\mathcal{H}} < \infty \}$ be a separable Hilbert space. For any continuous $h \in \mathcal{H}$ and any $\epsilon > 0$, there exist neural network basis functions $\{\phi_1, \phi_2, \dots\}$, some po

Figures (6)

  • Figure 1: Function encoder policies: the online–offline decoupling enables efficient and accurate policy adaptation to different optimal control objectives.
  • Figure 2: Generalization results for 2D trajectory planning. All cases test on new initial states, demonstrating semi-global policies that work on both seen and unseen target scenarios.
  • Figure 3: Visualization of the learned control policy. We draw $256$ initial states from the specified distribution. The plot shows how these states evolve over time under the learned control, demonstrating consistent performance across the state space.
  • Figure 4: Generalization results for the quadcopter path planning problem. Visualization for a new target $\mathbf{y} = [1.5, 3.5, 1.5, 0, \dots, 0]^\top$ not seen during training. We compare model predictions to corresponding true solutions.
  • Figure 5: The three worst-performing scenarios of the learned controller, tested on new problem settings for the single-obstacle example. Top: Predicted solutions for different initial states. Middle: Ground-truth solutions computed for the same initial states. Bottom: The corresponding controls; for clarity, only a few instances are shown.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Theorem 1: Universal Function Space Approximation (Ingebrand et al., 2025)
  • Theorem 2