Coordinating Planning and Tracking in Layered Control Policies via Actor-Critic Learning
Fengjun Yang, Nikolai Matni
TL;DR
Layered control architectures pair a trajectory planner with a low-level tracking controller, and performance degrades when the learned planner generates references that the tracker cannot follow. The authors derive a layered reformulation of a finite-horizon optimal control problem and introduce a dual network that coordinates the two layers, steering the planner toward references the tracker can reliably follow; all components are trained with actor-critic methods. They prove convergence of the dual map in the unconstrained LQR setting and demonstrate the method in simulation on constrained LQR and a nonlinear unicycle. The resulting policy remains interpretable and modular, and explicitly accounting for the tracker’s behavior during planning improves tracking accuracy and constraint satisfaction.
Abstract
We propose a reinforcement learning (RL)-based algorithm to jointly train (1) a trajectory planner and (2) a tracking controller in a layered control architecture. Our algorithm arises naturally from a rewrite of the underlying optimal control problem that lends itself to an actor-critic learning approach. By explicitly learning a dual network to coordinate the interaction between the planning and tracking layers, we demonstrate the ability to achieve an effective consensus between the two components, leading to an interpretable policy. We theoretically prove that our algorithm converges to the optimal dual network in the Linear Quadratic Regulator (LQR) setting and empirically validate its applicability to nonlinear systems through simulation experiments on a unicycle model.
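To make the idea of a layered rewrite coordinated by a dual variable concrete, the following is a minimal sketch of one standard way such a problem can be split, using a redundant reference variable and scaled-form ADMM-style updates. It is an illustration under stated assumptions, not necessarily the formulation used in the paper: the symbols $C$ (state cost), $D$ (input cost), $f$ (dynamics), $\xi$ (initial condition), $r$ (reference), $v$ (scaled dual variable), and $\rho$ (penalty weight) are all introduced here for illustration.

```latex
% Assumed original problem: finite-horizon optimal control.
\[
\begin{aligned}
\min_{x,\,u}\quad & \sum_{t=0}^{T} C(x_t) + \sum_{t=0}^{T-1} D(u_t) \\
\text{s.t.}\quad & x_{t+1} = f(x_t, u_t), \qquad x_0 = \xi .
\end{aligned}
\]

% Layered rewrite: a redundant reference r carries the state cost,
% and a consensus constraint ties it to the true state trajectory.
\[
\begin{aligned}
\min_{r,\,x,\,u}\quad & \sum_{t=0}^{T} C(r_t) + \sum_{t=0}^{T-1} D(u_t) \\
\text{s.t.}\quad & x_{t+1} = f(x_t, u_t), \qquad x_0 = \xi, \\
& r_t = x_t, \qquad t = 0, \dots, T .
\end{aligned}
\]

% Scaled augmented Lagrangian for the consensus constraint r_t = x_t.
\[
\mathcal{L}_\rho(r, x, u, v)
  = \sum_{t=0}^{T} C(r_t) + \sum_{t=0}^{T-1} D(u_t)
  + \frac{\rho}{2} \sum_{t=0}^{T}
    \left( \lVert r_t - x_t + v_t \rVert^2 - \lVert v_t \rVert^2 \right).
\]

% ADMM-style alternation (the k-th iterate is marked by a superscript).
\[
\begin{aligned}
\text{tracking:}\quad
  (x^{k+1}, u^{k+1}) &\in \operatorname*{arg\,min}_{\substack{x,\,u:\ x_{t+1} = f(x_t, u_t),\\ x_0 = \xi}}
    \ \sum_{t=0}^{T-1} D(u_t)
    + \frac{\rho}{2} \sum_{t=0}^{T} \lVert r^{k}_t - x_t + v^{k}_t \rVert^2, \\
\text{planning:}\quad
  r^{k+1} &\in \operatorname*{arg\,min}_{r}
    \ \sum_{t=0}^{T} C(r_t)
    + \frac{\rho}{2} \sum_{t=0}^{T} \lVert r_t - x^{k+1}_t + v^{k}_t \rVert^2, \\
\text{dual:}\quad
  v^{k+1}_t &= v^{k}_t + r^{k+1}_t - x^{k+1}_t .
\end{aligned}
\]
```

In this sketch the tracking step only sees the input cost and a quadratic penalty on deviation from the shifted reference $r + v$, while the planning step only sees the state cost and the same penalty, so the dual variable is the sole channel through which the two layers negotiate. A learned variant could replace the iterated $v$ with a network that outputs the dual variable directly; how that network is parameterized and trained is left to the paper itself.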
