Temporal Parallelisation of the HJB Equation and Continuous-Time Linear Quadratic Control

Simo Särkkä; Ángel F. García-Fernández

TL;DR

This paper introduces a temporal parallelisation framework for continuous-time optimal control problems: the time horizon is partitioned into sub-intervals, and a conditional value function is solved on each one. An associative operator combines these conditional solutions in parallel, giving an overall $O(\log T)$ span in the number of sub-intervals $T$ for the HJB solution and enabling parallel recovery of optimal trajectories. The approach is specialised to the continuous-time linear quadratic tracker (LQT) problem, where closed-form backward and forward conditional HJB equations are derived and used with block-wise scans on multi-core CPUs and GPUs. Numerical experiments demonstrate substantial speedups over sequential methods, and the paper also discusses storage requirements and extensions to stochastic settings. The framework is aimed at scalable offline computation of optimal controls in high-dimensional, time-critical applications.

Abstract

This paper presents a mathematical formulation for temporal parallelisation of continuous-time optimal control problems that can be solved via the Hamilton--Jacobi--Bellman (HJB) equation. We divide the time interval of the control problem into sub-intervals and define a control problem on each sub-interval, conditioned on the start and end states, which leads to conditional value functions for the sub-intervals. By defining an associative operator as the minimisation of the sum of conditional value functions, we obtain the elements and associative operators for a parallel associative scan operation. This allows the optimal control problem on the whole time interval to be solved in parallel, with time complexity logarithmic in the number of sub-intervals. We derive HJB-type backward and forward equations for the conditional value functions and solve them in closed form for linear quadratic problems. We also discuss numerical methods for computing the conditional value functions. The computational advantages of the proposed parallel methods are demonstrated via simulations run on a multi-core central processing unit and a graphics processing unit.
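
The combination step described in the abstract (minimising the sum of two conditional value functions over the shared intermediate state) can be illustrated with a small sketch. It is not the paper's implementation: it assumes the state is restricted to a finite grid, so each conditional value function becomes a matrix, and it fills those matrices with random costs purely for illustration. The names `combine`, `sequential_fold`, and `tree_reduce` are hypothetical.

```python
import numpy as np

def combine(V_ab, V_bc):
    """Min-plus product: C[i, k] = min_j (V_ab[i, j] + V_bc[j, k]).

    V_ab[i, j] plays the role of a conditional value function from grid
    state i at the start of a sub-interval to grid state j at its end.
    """
    return np.min(V_ab[:, :, None] + V_bc[None, :, :], axis=1)

def sequential_fold(elements):
    """Combine the sub-intervals one after another (sequential baseline)."""
    out = elements[0]
    for V in elements[1:]:
        out = combine(out, V)
    return out

def tree_reduce(elements):
    """Pairwise reduction that keeps the time order of the elements.

    The combines inside one level do not depend on each other, so they
    could run in parallel; the number of levels is about log2(len(elements)).
    """
    level = list(elements)
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # carry an unpaired last element upward
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Assumed toy data: 8 sub-intervals, 5 grid states, random non-negative costs.
rng = np.random.default_rng(0)
n_grid, n_intervals = 5, 8
elements = [rng.uniform(0.0, 10.0, size=(n_grid, n_grid))
            for _ in range(n_intervals)]

seq = sequential_fold(elements)
par = tree_reduce(elements)
print("tree reduction matches sequential fold:", np.allclose(seq, par))
```

Because the min-plus product is associative, the grouping of the combines does not change the result, which is what allows a tree-shaped (logarithmic-depth) evaluation. A full associative scan would additionally return the intermediate prefixes, not only the value for the whole horizon.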

Paper Structure (35 sections, 6 theorems, 90 equations, 10 figures)

Introduction
Background
Continuous-time optimal control problem
Continuous-time LQT problem
Associative operators and parallel computing
Parallel formulation of continuous-time control problems
Conditional value functions and combination rule
Initialization of conditional value functions
Backward and forward conditional HJB equations
Parallel solution of optimal control
Optimal trajectory recovery
Method 1
Method 2
Parallelisation of continuous-time LQT
Conditional value functions and combination rule
...and 20 more sections

Key Result

Theorem 4

Given $s<\tau<t$, and for any $x, y \in \mathbb{R}^{n_{x}}$, the conditional value function from time $s$ to time $t$ satisfies […]. Furthermore, we have […].
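
The equations of Theorem 4 are not reproduced in this summary. Going by the abstract, which defines the associative operator as the minimisation of the sum of conditional value functions, one plausible form of the combination rule is sketched below. The notation $V(x, s; y, t)$ for the conditional value function from state $x$ at time $s$ to state $y$ at time $t$, and $V(x, s)$ for the ordinary value function, is an assumption made here and may differ from the paper's.

$$
V(x, s; y, t) = \min_{z} \left\{ V(x, s; z, \tau) + V(z, \tau; y, t) \right\}
$$

A natural companion statement, again a sketch under the same assumed notation and not a quotation of the theorem, recovers the ordinary value function at time $s$ from the conditional one:

$$
V(x, s) = \min_{y} \left\{ V(x, s; y, t) + V(y, t) \right\}
$$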

Figures (10)

Figure 1: Illustration of the basic idea. Instead of obtaining the value function from the HJB equation on a (sequential) backward pass, we first solve the conditional value functions on sub-intervals and then combine them in parallel using a parallel associative scan.
Figure 2: Illustration of the up-sweep of the parallel scan algorithm to compute the all-prefix-sums of $a_{1:T} = [ 1, 2, 3, 4, 5, 6, 7, 8 ]$.
Figure 3: Illustration of the down-sweep of the parallel scan algorithm to compute the all-prefix-sums of $a_{1:T} = [ 1, 2, 3, 4, 5, 6, 7, 8 ]$. The results of the up-sweep algorithm are shown in parentheses, see Figure 2. The bottom row shows the computation of the all-prefix-sums.
Figure 4: The reference trajectory of the parallel LQT experiment along with the optimal trajectory. The deviation of the optimal path from the reference trajectory in the middle is due to the starting point being slightly off the reference trajectory.
Figure 5: Results of the parallel continuous-time LQT experiment run on a CPU. The parallel algorithms (prefix 'par') are consistently faster than the sequential ones (prefix 'seq').
...and 5 more figures
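
The up-sweep and down-sweep passes named in the captions of Figures 2 and 3 can be sketched in a few lines for the same input $[1, 2, \ldots, 8]$. This follows one common formulation (Blelloch's work-efficient scan, which produces exclusive prefixes that are then turned into inclusive ones); the paper's figures may organise the sweeps differently. The function name `blelloch_scan` is hypothetical, and the sketch assumes the input length is a power of two. The inner loops are written sequentially, but their iterations are independent within each level and are the part that a parallel machine would run concurrently.

```python
import operator

def blelloch_scan(values, op=operator.add, identity=0):
    """Inclusive all-prefix-sums via an up-sweep and a down-sweep.

    Assumes len(values) is a power of two and that `op` is associative
    with the given identity element.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    tree = list(values)

    # Up-sweep: build partial sums over blocks of size 2, 4, 8, ...
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            tree[i] = op(tree[i - stride], tree[i])
        stride *= 2

    # Down-sweep: push exclusive prefixes back down the implicit tree.
    tree[n - 1] = identity
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left_sum = tree[i - stride]
            tree[i - stride] = tree[i]          # left child inherits the prefix
            tree[i] = op(tree[i], left_sum)     # right child: prefix, then left block
        stride //= 2

    # tree now holds exclusive prefixes; append each element to make them inclusive.
    return [op(prefix, v) for prefix, v in zip(tree, values)]

print(blelloch_scan([1, 2, 3, 4, 5, 6, 7, 8]))
# prints [1, 3, 6, 10, 15, 21, 28, 36]
```

The operand order is kept as "earlier block, then later block" throughout, so the same routine also works for associative operators that are not commutative, which is the situation for combining conditional value functions over consecutive sub-intervals.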

Theorems & Definitions (11)

Example 1
Definition 2: Conditional value function
Definition 3
Theorem 4
Remark 5
Theorem 6
Lemma 7
Theorem 8
Theorem 9
Lemma 10
...and 1 more
