Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence
Feiran Zhao, Keyou You
TL;DR
The paper studies constrained Markov decision processes (CMDPs) in continuous control by formulating a cost-constrained linear quadratic regulator (LQR), in which multiple LQR costs act as safety-like constraints, and solving it with a policy gradient (PG) primal-dual method. It proves strong duality and shows that the dual function is differentiable with a Lipschitz-continuous gradient, which yields provable convergence guarantees for the primal-dual updates. The analysis establishes sublinear convergence of the dual regret, up to a bias that depends on the accuracy of the primal update, and simulations on a 2D UAV double-integrator model validate the approach and confirm constraint satisfaction. The work extends rigorous PG analysis to continuous control with multiple unbounded costs and lays the groundwork for data-driven, sample-based extensions.
Abstract
In safety-critical applications, reinforcement learning (RL) needs to consider safety constraints. However, a theoretical understanding of constrained RL for continuous control is largely absent. As a case study, this paper presents a cost-constrained LQR formulation, where a number of LQR costs with user-defined penalty matrices are subject to constraints. To solve it, we propose a policy gradient (PG) primal-dual method to find an optimal state feedback gain. Despite the non-convexity of the cost-constrained LQR problem, we provide a constructive proof for strong duality and a geometric interpretation of an optimal multiplier set. By proving that the concave dual function is Lipschitz smooth, we further provide convergence guarantees for the PG primal-dual method. Finally, we perform simulations to validate our theoretical findings.
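The primal-dual idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes discrete-time dynamics x_{t+1} = A x_t + B u_t with feedback u_t = -K x_t, a random initial state with covariance Sigma0, a known model with exact gradients, and one constraint on control energy. The system matrices, the bound c, the step sizes, and all function names are hypothetical choices made for illustration. The primal step runs gradient descent on the Lagrangian, which is itself an LQR cost with multiplier-weighted penalty matrices, and the dual step is projected gradient ascent on the multipliers.

```python
import numpy as np

def dlyap(F, M):
    """Solve X = M + F.T @ X @ F (requires spectral radius of F below 1)."""
    n = F.shape[0]
    lhs = np.eye(n * n) - np.kron(F.T, F.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def spectral_radius(F):
    return np.max(np.abs(np.linalg.eigvals(F)))

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Return J(K) = trace(P_K Sigma0) and its gradient for u = -K x."""
    F = A - B @ K
    if spectral_radius(F) >= 1.0:
        return np.inf, None  # K is not stabilizing, so the cost is infinite
    P = dlyap(F, Q + K.T @ R @ K)   # value matrix of the closed loop
    Sigma = dlyap(F.T, Sigma0)      # accumulated state covariance
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad

def primal_steps(K, A, B, Q, R, Sigma0, steps, step):
    """Gradient descent on K; backtracking keeps K stabilizing."""
    for _ in range(steps):
        J, G = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
        t = step
        while t > 1e-12:
            K_try = K - t * G
            J_try, _ = lqr_cost_and_grad(K_try, A, B, Q, R, Sigma0)
            if J_try <= J:  # an infinite cost (unstable K_try) is rejected
                K = K_try
                break
            t *= 0.5
    return K

def primal_dual(A, B, Qs, Rs, c, K0, Sigma0,
                outer=50, inner=10, step_K=1e-2, step_lam=0.05):
    """Index 0 of Qs/Rs is the objective; indices 1.. are the constraints."""
    K, lam = K0.copy(), np.zeros(len(c))
    for _ in range(outer):
        # The Lagrangian is an LQR cost with multiplier-weighted penalties.
        Q_lam = Qs[0] + sum(l * Q for l, Q in zip(lam, Qs[1:]))
        R_lam = Rs[0] + sum(l * R for l, R in zip(lam, Rs[1:]))
        K = primal_steps(K, A, B, Q_lam, R_lam, Sigma0, inner, step_K)
        J_con = np.array([lqr_cost_and_grad(K, A, B, Q, R, Sigma0)[0]
                          for Q, R in zip(Qs[1:], Rs[1:])])
        # Projected dual ascent on the constraint violation J_i(K) - c_i.
        lam = np.maximum(0.0, lam + step_lam * (J_con - c))
    return K, lam

# Hypothetical one-axis double integrator with unit sampling time.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Sigma0 = np.eye(2)
Qs = [np.eye(2), np.zeros((2, 2))]       # objective, control-energy constraint
Rs = [0.1 * np.eye(1), np.eye(1)]
c = np.array([0.5])                      # assumed bound on the constraint cost
K0 = np.array([[0.5, 1.0]])              # stabilizing initial gain
K, lam = primal_dual(A, B, Qs, Rs, c, K0, Sigma0)
print("gain:", K, "multiplier:", lam)
```

The backtracking line search is a design choice of this sketch: it rejects any step that destabilizes the closed loop, so every iterate has finite cost. A sample-based variant would replace the exact gradient with an estimate from rollouts, which this sketch does not attempt.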
