cuNRTO: GPU-Accelerated Nonlinear Robust Trajectory Optimization

Jiawei Wang; Arshiya Taj Abdul; Evangelos A. Theodorou

TL;DR

This work proposes the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework, which introduces two dynamic optimization architectures, implemented on CUDA, with direct application to robust decision-making.

Abstract

Robust trajectory optimization enables autonomous systems to operate safely under uncertainty by computing control policies that satisfy the constraints for all bounded disturbances. However, these problems often lead to large Second-Order Cone Programming (SOCP) constraints, which are computationally expensive to handle. In this work, we propose the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework by introducing two dynamic optimization architectures that have direct application to robust decision-making and are implemented on CUDA. The first architecture, NRTO-DR, leverages the Douglas-Rachford (DR) splitting method to solve the SOCP inner subproblems of NRTO, thereby significantly reducing the computational burden through parallel SOCP projections and sparse direct solves. The second architecture, NRTO-FullADMM, is a novel variant that further exploits the problem structure to improve scalability using the Alternating Direction Method of Multipliers (ADMM). Finally, we provide a GPU implementation of the proposed methodologies using custom CUDA kernels for SOC projection steps and cuBLAS GEMM chains for feedback gain updates. We validate the performance of cuNRTO through simulated experiments on unicycle, quadcopter, and Franka manipulator models, demonstrating speedups of up to 139.6$\times$.

Paper Structure (51 sections, 55 equations, 10 figures, 8 tables, 1 algorithm)

Introduction
Notations
Organization of the paper
Nonlinear Robust Trajectory Optimization Framework
Problem Statement
NRTO Algorithm
NRTO-LE Algorithm
NRTO-DR Framework
Framework
GPU implementation
NRTO-FullADMM Framework
Framework
Block-1 update
Block-2 update
Dual update
...and 36 more sections

Figures (10)

Figure 1: cuNRTO on a 7-DoF Franka manipulator: cuNRTO involves an outer successive linearization (SL) loop run on the host CPU, with an inner loop executed on the GPU. Compared to NRTO, cuNRTO achieves a 25.9$\times$ wall-clock speedup in this setting with 100% constraint satisfaction. The three small boxes show the final state under Monte Carlo rollouts.
Figure 2: Overview of the NRTO framework: A bi-level structure involving an outer successive linearization (SL) loop that generates a tractable linearized problem, and an inner ADMM loop that solves the resulting linearized problem.
Figure 3: Three cases of projecting a point $(\hat{t}, \hat{{\bm y}})$ (red) onto a SOC, illustrated in the $(t,\lVert {\bm y} \rVert_2)$ plane. Left: if $\lVert \hat{{\bm y}} \rVert_2 \le \hat{t}$, the point is already feasible and remains unchanged after projection (green). Middle: if $\lVert \hat{{\bm y}} \rVert_2 > |\hat{t}|$, the point lies outside the cone and is projected onto the cone boundary, preserving the direction of $\hat{{\bm y}}$. Right: if $\lVert \hat{{\bm y}}\rVert_2 \le -\hat{t}$, the point lies in the opposite cone and the projection collapses to the origin.
Figure 4: GPU execution of one relaxed DR iteration: Each iteration involves an affine-set projection (DR update step 1), a reflection step (DR update step 2), and massively parallel second-order cone projections (DR update step 3) that are separable across constraints. The right panel illustrates the GPU mapping: each SOCP constraint block is handled by one warp, the warp scheduler dispatches these warps across streaming multiprocessors, and the grey blocks depict the execution lanes (CUDA cores) that run the warp instructions, with shared memory cache supporting fast projection and vector updates.
Figure 5: cuNRTO pipeline for NRTO-FullADMM: Each outer SL iteration on the host CPU linearizes the problem, packs the SOCP and QP data, and uploads constants to the GPU once. The FullADMM inner loop runs entirely on-device: (i) batched affine evaluation forms per-constraint inputs, (ii) SOC projections are computed in parallel over $j$, (iii) Block-2 updates solve the QP and update ${\bm k}_v$ using prepacked operators, and (iv) dual updates and residual checks determine termination.
...and 5 more figures
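
The three projection cases in the Figure 3 caption, and the three-step iteration in the Figure 4 caption, can be sketched in a few lines of plain Python. This is only an illustrative CPU sketch, not the paper's CUDA kernels. The function names (`project_soc`, `project_affine`, `relaxed_dr_step`, `solve_toy`) are hypothetical, the affine set is assumed to be a single hyperplane as a toy stand-in, and the relaxation is assumed to take the textbook form $z \leftarrow z + \lambda\,(w - x)$, which may differ from the exact update used in the paper.

```python
import math


def project_soc(point):
    """Project [t, y_1, ..., y_n] onto the cone {(t, y): ||y||_2 <= t}."""
    t, y = point[0], point[1:]
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_y <= t:
        # Left case of Figure 3: already feasible, unchanged.
        return list(point)
    if norm_y <= -t:
        # Right case: point lies in the opposite cone, collapse to the origin.
        return [0.0] * len(point)
    # Middle case: norm_y > |t| (so norm_y > 0); land on the cone boundary
    # while preserving the direction of y.
    alpha = 0.5 * (t + norm_y)
    return [alpha] + [alpha * v / norm_y for v in y]


def project_affine(point, a, b):
    """Project onto the hyperplane {x: a . x = b} (assumed toy affine set)."""
    s = (sum(ai * xi for ai, xi in zip(a, point)) - b) / sum(ai * ai for ai in a)
    return [xi - s * ai for ai, xi in zip(a, point)]


def relaxed_dr_step(z, a, b, relax=1.0):
    """One relaxed DR iteration: affine projection, reflection, cone projection."""
    x = project_affine(z, a, b)                       # step 1: affine-set projection
    r = [2.0 * xi - zi for xi, zi in zip(x, z)]       # step 2: reflection
    w = project_soc(r)                                # step 3: SOC projection
    z_next = [zi + relax * (wi - xi) for zi, wi, xi in zip(z, w, x)]
    return x, z_next


def solve_toy(z0, a, b, iters=100):
    """Run a fixed number of DR iterations and return the last affine iterate."""
    z = list(z0)
    x = list(z0)
    for _ in range(iters):
        x, z = relaxed_dr_step(z, a, b)
    return x


if __name__ == "__main__":
    # Toy problem: find (t, y) with t + y = 2 and |y| <= t.
    print(solve_toy([0.0, 3.0], [1.0, 1.0], 2.0))
```

In this sketch each call to `project_soc` touches only its own constraint block, which is the separability that allows one block per warp in the GPU mapping described in the Figure 4 caption.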

Theorems & Definitions (1)

Remark 1
