Distributionally Robust Constrained Reinforcement Learning under Strong Duality

Zhengfei Zhang; Kishan Panaganti; Laixi Shi; Yanan Sui; Adam Wierman; Yisong Yue

TL;DR

This work develops a strong-duality-based algorithmic framework that yields the first efficient, provably convergent solution to DRC-RL for a class of environmental uncertainties. It also exposes an inherent structure of DRC-RL that arises from combining distributional robustness with constraints.

Abstract

We study the problem of Distributionally Robust Constrained RL (DRC-RL), where the goal is to maximize the expected reward subject to environmental distribution shifts and constraints. This setting captures situations where training and testing environments differ, and policies must satisfy constraints motivated by safety or limited budgets. Despite significant progress toward algorithm design for the separate problems of distributionally robust RL and constrained RL, there do not yet exist algorithms with end-to-end convergence guarantees for DRC-RL. We develop an algorithmic framework based on strong duality that enables the first efficient and provable solution in a class of environmental uncertainties. Further, our framework exposes an inherent structure of DRC-RL that arises from the combination of distributional robustness and constraints, which prevents a popular class of iterative methods from tractably solving DRC-RL, despite such frameworks being applicable for each of distributionally robust RL and constrained RL individually. Finally, we conduct experiments on a car racing benchmark to evaluate the effectiveness of the proposed algorithm.
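The abstract describes the problem only in words. One common way to write such a problem down is sketched here; the notation is an assumption for illustration and is not taken from this page: $\mathcal{P}$ is an uncertainty set of transition kernels, $V_{r}^{\pi,P}$ and $V_{c_i}^{\pi,P}$ are the value functions of the reward and of the $i$-th constraint signal under policy $\pi$ and kernel $P$, and $b_i$ are thresholds.

$$
\max_{\pi \in \Pi} \; \inf_{P \in \mathcal{P}} V_{r}^{\pi, P}
\quad \text{s.t.} \quad
\inf_{P \in \mathcal{P}} V_{c_i}^{\pi, P} \ge b_i, \qquad i = 1, \dots, m.
$$

In this reading, both the objective and every constraint are evaluated under their own worst-case environment, so a feasible policy satisfies the constraints for every kernel in $\mathcal{P}$.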

Paper Structure (37 sections, 14 theorems, 68 equations, 4 figures, 2 algorithms)

Introduction
Preliminaries and Problem Formulation
Robust Markov Decision Process.
Distributionally Robust Constrained RL (DRC-RL).
DRC-RL with General Uncertainty Sets
A Meta Algorithm for DRC-RL
The Online-Algorithm and Best-response Subroutines
DRC-RL with R-Contamination Uncertainty Sets
On the Intractability of Greedy Policies for DRC-RL
Experiments and Evaluation
Related Works
Conclusion
Acknowledgment
Proof for Section 3: DRC-RL with General Uncertainty Sets
Proof of Proposition \ref{policy_convexify}
...and 22 more sections

Key Result

Proposition 3.1

When $\Pi$ is replaced by its convex hull $\mathrm{Conv}(\Pi)$ in the conservative form of the DRC-RL problem, strong duality holds provided Slater's condition holds.
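A toy numerical sketch may help show why the convex hull matters for strong duality. It is not the paper's algorithm. It assumes a finite set of base policies, each summarized by a hypothetical pair (objective value, constraint value), a single constraint of the form "constraint value >= threshold", and that a mixture's values are the weighted average of its components' values. All names and numbers are made up for illustration.

```python
from itertools import combinations

# Hypothetical (objective value, constraint value) pairs for three base policies.
POLICIES = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
THRESHOLD = 0.5  # feasibility means: constraint value >= THRESHOLD


def primal_pure(policies, b):
    """Best objective among base policies that satisfy the constraint."""
    feasible = [r for r, c in policies if c >= b]
    return max(feasible) if feasible else float("-inf")


def primal_mixed(policies, b):
    """Best objective over mixtures of base policies.

    The problem is a linear program over the simplex with one extra
    inequality, so some optimum mixes at most two policies: either a
    feasible base policy or a pair with the constraint tight.
    """
    best = primal_pure(policies, b)
    for (r1, c1), (r2, c2) in combinations(policies, 2):
        if c1 == c2:
            continue
        w = (b - c2) / (c1 - c2)  # weight on the first policy at tightness
        if 0.0 <= w <= 1.0:
            best = max(best, w * r1 + (1.0 - w) * r2)
    return best


def dual_value(policies, b):
    """Minimize g(lam) = max_i [r_i + lam * (c_i - b)] over lam >= 0.

    g is convex and piecewise linear, so its minimum sits at lam = 0 or at
    an intersection of two lines. This assumes Slater's condition (some
    policy has c_i > b), which guarantees the minimum is attained.
    """
    def g(lam):
        return max(r + lam * (c - b) for r, c in policies)

    candidates = [0.0]
    for (r1, c1), (r2, c2) in combinations(policies, 2):
        if c1 != c2:
            lam = (r2 - r1) / (c1 - c2)
            if lam >= 0.0:
                candidates.append(lam)
    return min(g(lam) for lam in candidates)


if __name__ == "__main__":
    print("primal over base policies:", primal_pure(POLICIES, THRESHOLD))
    print("primal over mixtures:     ", primal_mixed(POLICIES, THRESHOLD))
    print("dual value:               ", dual_value(POLICIES, THRESHOLD))
```

In this toy instance the dual value matches the primal value over mixtures, while the best feasible base policy is strictly worse, so a duality gap appears only when mixing is disallowed.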

Figures (4)

Figure 1: The four bar graphs show constraint satisfaction (green means satisfied) under shifts in power, inertia, braking magnitude, and steering angle. The lower-right panel shows the value of the objective (higher is better) when the steering angle is shifted. All evaluations are based on the value function (accumulated rewards) of the mixture policy $\hat{\pi}$.
Figure 2: A two-state, two-action Markov decision process used in Example \ref{example}: the left and right panels show the transition probabilities for actions $a_0$ and $a_1$, respectively.
Figure 3: Car Racing environment
Figure 4: Full results under four different shifts (higher is better). The left two panels show the constraints and the right panel shows the objective. The constraint-satisfaction bar graphs are produced directly from these results.

Theorems & Definitions (24)

Proposition 3.1
Proposition 3.2
Proposition 3.3
Theorem 3.5
Definition 5.1: Greedy Policy Enabling
Definition 5.2: Operator Linearity
Lemma 5.3
Theorem 5.4
Corollary 5.5
Proposition A.1 (Lemma 2 of scherrer2015approximate)
...and 14 more
