Coordinating Planning and Tracking in Layered Control Policies via Actor-Critic Learning

Fengjun Yang; Nikolai Matni

TL;DR

Coordinating planning and tracking in layered control architectures is challenging because a learned planner can produce references that the low-level tracker cannot follow exactly. The authors derive a layered reformulation of a finite-horizon optimal control problem and introduce a coordinating dual network that steers the planner toward references the tracker can reliably follow. All components are trained with actor-critic methods. They prove convergence of the dual map in the unconstrained LQR setting and demonstrate robust performance on constrained LQR and on a nonlinear unicycle model. The approach preserves the interpretability and modularity of the layered architecture, and explicitly accounting for the tracker's behavior during planning improves tracking accuracy and constraint satisfaction.

Abstract

We propose a reinforcement learning (RL)-based algorithm to jointly train (1) a trajectory planner and (2) a tracking controller in a layered control architecture. Our algorithm arises naturally from a rewrite of the underlying optimal control problem that lends itself to an actor-critic learning approach. By explicitly learning a dual network to coordinate the interaction between the planning and tracking layers, we show that the two components reach an effective consensus, which yields an interpretable policy. We theoretically prove that our algorithm converges to the optimal dual network in the Linear Quadratic Regulator (LQR) setting and empirically validate its applicability to nonlinear systems through simulation experiments on a unicycle model.

Paper Structure (33 sections, 9 theorems, 105 equations, 3 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Layered control architectures
Hierarchical reinforcement learning
Actor-critic methods
Statement of Contributions
Problem Formulation
Layered Approach to Optimal Control
Actor-Critic Learning in the Layered Control Architecture
Parameterization of the Layered Policy
Learning the Tracking Controller via Actor-Critic Method
Learning the Dual Network
Summary of the Algorithm
Analysis for Linear Quadratic Regulator
With Optimal Tracking
...and 18 more sections

Key Result

Lemma 1

Given the update rules (eq:opt-traj-opt-update) and (eq:opt-tracking-update), the difference between the iterates $r^{(k)}_i$ and $x^{(k)}_i$ can be written as a linear map of the initial condition $\xi$ as […], where $H$ and $G$ are matrices of appropriate dimensions that depend on $A$, $B$, $Q$, and $R$, and $H$ is symmetric negative definite. See Lemma lem:rx-expression in Appendix sec:appendix-proof-opt for definit…

Figures (3)

Figure 1: Comparison of trajectory planning and tracking approaches. (a) Previous approaches integrate a trajectory planner and a low-level controller by feeding the reference trajectory $r$ generated by the planner directly into the low-level controller. The low-level tracking controller minimizes the tracking cost, while the planner minimizes both the tracking cost and the nominal cost $\mathcal{C}(r)$. However, because the tracking controller is imperfect, the executed trajectory often deviates from the reference, resulting in suboptimal performance. (b) Our proposed method introduces an additional dual network that learns to preemptively perturb the reference trajectory $r$ to $\tilde{r}$, accounting for the low-level controller's inaccuracies. Because the tracker receives the perturbed reference $\tilde{r}$, the executed trajectory $x$ ends up closer to the intended reference $r$, which improves overall performance. We show that this module can be trained in the fashion of a dual update (hence the name) by observing the discrepancy between the reference and the executed trajectory.
Figure 2: Training progress for the dual map parameter $\Theta$. Here, the solid lines are the median over $15$ random LQR instances, and the shaded regions represent the $25^{th}$ to $75^{th}$ percentile.
Figure 3: A Representative Sample Trajectory for Constrained LQR.
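Figure 1 describes the dual module as perturbing $r$ to $\tilde{r}$ and being trained like a dual update from the discrepancy $r - x$. The toy sketch below illustrates only that idea and is not the authors' algorithm. It assumes a scalar linear plant, a fixed planner $r_t = \xi \rho^t$, a lagging proportional tracker, and a hypothetical dual map $\tilde{r} = r + \Theta \xi$ that is linear in $\xi$ (loosely echoing the linear-in-$\xi$ structure of Lemma 1). It omits the planner's cost and the actor-critic training of the tracker. All names (`plan`, `track`, `train_dual_map`, `theta`) and constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy layered loop (not the paper's setup):
#   plant    x_{t+1} = A x_t + B u_t, scalar state, x_0 = xi
#   planner  r_t = xi * RHO**t for t = 1..T (desired trajectory)
#   tracker  u_t = K (r_tilde_{t+1} - x_t), a lagging proportional controller
#   dual map r_tilde = r + theta * xi, a perturbation linear in xi
A, B, K, RHO, T = 1.0, 1.0, 0.5, 0.8, 10


def plan(xi):
    """Reference r_1..r_T produced by the fixed toy planner."""
    return xi * RHO ** np.arange(1, T + 1)


def track(xi, r_tilde):
    """Roll out the tracker on the perturbed reference; return x_1..x_T."""
    x, traj = xi, np.empty(T)
    for t in range(T):
        u = K * (r_tilde[t] - x)  # r_tilde[t] is the waypoint for x_{t+1}
        x = A * x + B * u
        traj[t] = x
    return traj


def train_dual_map(xis, step=2.0, iters=300):
    """Dual-ascent-style update: theta += step * mean over xi of (r - x) * xi."""
    theta = np.zeros(T)
    for _ in range(iters):
        grad = np.zeros(T)
        for xi in xis:
            r = plan(xi)
            x = track(xi, r + theta * xi)
            grad += (r - x) * xi  # discrepancy weighted by the initial condition
        theta += step * grad / len(xis)
    return theta


def max_discrepancy(theta, xis):
    """Largest |r_t - x_t| over the sampled initial conditions."""
    return max(
        np.max(np.abs(plan(xi) - track(xi, plan(xi) + theta * xi))) for xi in xis
    )


xis = np.linspace(-1.0, 1.0, 21)
theta = train_dual_map(xis)
print(max_discrepancy(np.zeros(T), xis))  # no perturbation: the tracker lags r
print(max_discrepancy(theta, xis))        # after training: near zero
```

Because every quantity in this toy is linear in $\xi$, a single $\Theta$ can cancel the tracking lag for all sampled initial conditions; the step size must be small enough for the update to contract, which is the toy analogue of the sign condition on $H$ in Lemma 1.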

Theorems & Definitions (15)

Lemma 1
Theorem 1
proof
Theorem 2
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
...and 5 more
