Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence

Feiran Zhao; Keyou You

TL;DR

The paper studies constrained Markov decision processes (CMDPs) in continuous control by formulating a cost-constrained LQR with multiple safety-like constraints and solving it with a policy gradient (PG) primal-dual method. It proves strong duality and shows that the dual function is differentiable with a Lipschitz-smooth gradient, which yields convergence guarantees for the primal-dual updates. The theoretical results establish sublinear convergence of the dual regret, with a bias that depends on the accuracy of the primal step, and simulations on a 2D UAV double-integrator model validate convergence and constraint satisfaction. The work extends rigorous PG analysis to continuous control with multiple unbounded costs and lays groundwork for data-driven, sample-based extensions.

Abstract

In safety-critical applications, reinforcement learning (RL) needs to consider safety constraints. However, a theoretical understanding of constrained RL for continuous control is largely absent. As a case study, this paper presents a cost-constrained LQR formulation, where a number of LQR costs with user-defined penalty matrices are subject to constraints. To solve it, we propose a policy gradient (PG) primal-dual method to find an optimal state feedback gain. Despite the non-convexity of the cost-constrained LQR problem, we provide a constructive proof for strong duality and a geometric interpretation of an optimal multiplier set. By proving that the concave dual function is Lipschitz smooth, we further provide convergence guarantees for the PG primal-dual method. Finally, we perform simulations to validate our theoretical findings.
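As a rough illustration of the primal-dual idea described in the abstract, the following is a minimal model-based sketch, not the authors' implementation. It assumes a discrete-time system $x_{t+1} = Ax_t + Bu_t$ with $u_t = -Kx_t$, infinite-horizon costs $J_i(K) = \mathrm{tr}(P_i \Sigma_0)$, exact gradients with a backtracking line search for the primal step, and projected gradient ascent for the dual step. The function names, the 1D double-integrator instance, the budget `c`, and all step sizes are hypothetical choices made for illustration.

```python
import numpy as np

def dlyap(F, W):
    """Solve X = F X F^T + W (F Schur stable) by vectorization."""
    n = F.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(F, F), W.reshape(-1))
    return vec.reshape(n, n)

def is_stabilizing(A, B, K):
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

def lqr_cost(A, B, K, Q, R, Sigma0):
    """J(K) = tr(P_K Sigma0) for u_t = -K x_t over an infinite horizon."""
    F = A - B @ K
    P = dlyap(F.T, Q + K.T @ R @ K)
    return float(np.trace(P @ Sigma0)), P

def lqr_grad(A, B, K, Q, R, Sigma0):
    """Exact policy gradient: 2 [(R + B^T P B) K - B^T P A] Sigma_K."""
    _, P = lqr_cost(A, B, K, Q, R, Sigma0)
    Sigma_K = dlyap(A - B @ K, Sigma0)
    return 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma_K

def primal_pg(A, B, K, Q, R, Sigma0, iters=200, step=0.1, tol=1e-6):
    """Gradient descent on K with backtracking; K stays stabilizing."""
    cost, _ = lqr_cost(A, B, K, Q, R, Sigma0)
    for _ in range(iters):
        G = lqr_grad(A, B, K, Q, R, Sigma0)
        g2 = float(np.sum(G * G))
        if g2 < tol ** 2:
            break
        t = step
        while t > 1e-12:
            K_new = K - t * G
            if is_stabilizing(A, B, K_new):
                c_new, _ = lqr_cost(A, B, K_new, Q, R, Sigma0)
                if c_new <= cost - 1e-4 * t * g2:  # sufficient decrease
                    K, cost = K_new, c_new
                    break
            t *= 0.5
        else:
            break  # no acceptable step found
    return K

def pg_primal_dual(A, B, K0, Qs, Rs, c, Sigma0, dual_iters=30, eta=0.1):
    """Qs[0], Rs[0] define the objective; the rest define the constraints."""
    lam = np.zeros(len(c))
    K = K0.copy()
    history = []
    for _ in range(dual_iters):
        # The Lagrangian is itself an LQR cost with lambda-weighted penalties.
        Q_lam = Qs[0] + sum(l * Q for l, Q in zip(lam, Qs[1:]))
        R_lam = Rs[0] + sum(l * R for l, R in zip(lam, Rs[1:]))
        K = primal_pg(A, B, K, Q_lam, R_lam, Sigma0)  # warm-started primal step
        J = np.array([lqr_cost(A, B, K, Q, R, Sigma0)[0]
                      for Q, R in zip(Qs[1:], Rs[1:])])
        history.append((lam.copy(), J - c))
        lam = np.maximum(0.0, lam + eta * (J - c))  # projected dual ascent
    return K, lam, history

# Hypothetical toy instance: 1D double integrator with a control-energy budget.
dt = 0.5
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2], [dt]])
Sigma0 = np.eye(2)
Qs = [np.eye(2), np.zeros((2, 2))]   # objective penalty, constraint penalty
Rs = [0.1 * np.eye(1), np.eye(1)]
c = np.array([1.0])                  # assumed budget on the control energy
K0 = np.array([[1.0, 1.5]])          # an initial stabilizing gain
K, lam, history = pg_primal_dual(A, B, K0, Qs, Rs, c, Sigma0)
print("gain:", K, "multiplier:", lam, "last violation:", history[-1][1])
```

Each entry of `history` stores the multiplier used in that dual iteration and the resulting constraint violation $J_i(K) - c_i$, so the trajectory can be inspected directly. The step sizes are untuned, and a sample-based variant would replace the exact gradient with an estimate.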

Paper Structure (17 sections, 11 theorems, 67 equations, 3 figures)

Introduction
Problem formulation
Policy gradient primal-dual methods for the cost-constrained LQR
The policy gradient primal-dual method to solve (\ref{prob:clqr})
Strong duality between the primal problem (\ref{prob:clqr}) and the dual problem (\ref{prob:dual})
Convergence of the PG primal-dual method for the cost-constrained LQR
Properties of the dual function
Convergence of the PG primal-dual method
Simulations
Simulation model
Convergence of the PG primal-dual method
Conclusion
Proof of Lemma \ref{lem:continuity}
Proof of Lemma \ref{lem:sublipschitz}
Proof in Section \ref{subsec:conver}
...and 2 more sections

Key Result

Lemma 1

The unique minimizer of the Lagrangian $K_{\lambda}^*$ and the constrained costs $J_i(K_{\lambda}^*), \forall i \in \{1,2,\cdots, N\}$ are continuous in $\lambda$ over $\mathbb{R}^N_+$.
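For context, the objects in Lemma 1 can be written out as follows. This is a sketch in assumed notation, since the summary does not state the definitions: $J_0$ denotes the objective cost, $c_i$ the constraint levels, and $(Q_i, R_i)$ the penalty matrices of $J_i$. The gradient expression in the last line is what one would expect from the uniqueness of $K_{\lambda}^*$ and the differentiability result in Lemma 3, not a quotation from the paper.

```latex
% Assumed notation: J_0 objective, c_i constraint levels, (Q_i, R_i) penalties.
\begin{align*}
\mathcal{L}(K,\lambda) &= J_0(K) + \sum_{i=1}^{N} \lambda_i \bigl(J_i(K) - c_i\bigr), \\
D(\lambda) &= \min_{K}\, \mathcal{L}(K,\lambda) = \mathcal{L}(K_{\lambda}^*, \lambda), \\
\frac{\partial D}{\partial \lambda_i}(\lambda) &= J_i(K_{\lambda}^*) - c_i,
\qquad i \in \{1,2,\cdots,N\}.
\end{align*}
```

Because an LQR cost is linear in its penalty matrices, $\mathcal{L}(\cdot,\lambda)$ is, up to the constant $-\sum_i \lambda_i c_i$, an LQR cost with penalties $Q_0 + \sum_i \lambda_i Q_i$ and $R_0 + \sum_i \lambda_i R_i$. Under this reading, continuity of $J_i(K_{\lambda}^*)$ in $\lambda$ is what makes the dual gradient continuous.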

Figures (3)

Figure 1: A geometric interpretation of optimal multipliers in a two-constraint example, where $y$ is some positive constant.
Figure 2: Convergence of the dual iteration in the PG primal-dual method.
Figure 3: Optimality gap and constraint violation of the PG primal-dual method.

Theorems & Definitions (11)

Lemma 1
Lemma 2
Theorem 1: Strong duality
Lemma 3: Differentiability of the dual function
Lemma 4: Local Lipschitz smoothness
Lemma 5
Theorem 2: Global convergence
Lemma 6
Lemma 7
Lemma 8
...and 1 more
