Efficient Duple Perturbation Robustness in Low-rank MDPs

Yang Hu; Haitong Ma; Bo Dai; Na Li

TL;DR

Robust RL is challenged by distributional shifts in high-dimensional state-action spaces. The authors tackle this by introducing a duple perturbation framework using $(\xi,\eta)$-rectangular ambiguity, which couples perturbations in both feature maps and dynamics factors within low-rank MDPs. They propose R2PG, a representation-robust policy gradient method that solves robust policy evaluation via an SDP-reduced optimization and updates policies with a Natural Policy Gradient, accompanied by a convergence guarantee and a bounded suboptimality gap. Theoretical results establish a quasi-contraction property and a robust performance-difference bound, while experiments on toy models and an inverted pendulum demonstrate improved worst-case performance under perturbations. This approach yields scalable, theory-backed robust RL that leverages low-rank structure and function approximation for large or continuous spaces.

Abstract

The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from efficiency issues that obstruct their real-world implementation. In this paper, we introduce duple perturbation robustness, i.e., perturbation of both the feature and factor vectors of low-rank Markov decision processes (MDPs), via a novel characterization of $(\xi,\eta)$-ambiguity sets. The resulting robust MDP formulation is compatible with the function representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence-rate guarantee. Examples are designed to justify the new robustness concept, and algorithmic efficiency is supported by both theoretical bounds and numerical simulations.
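To make the feature/factor terminology concrete, the following is a minimal sketch of a tabular low-rank MDP in which the transition kernel factorizes as $P(s' \mid s,a) = \langle \phi(s,a), \mu(s') \rangle$, together with a one-step Bellman backup under a perturbation of the feature vector only. It is not the paper's algorithm: the $\ell_2$-ball perturbation model, the function names, and the random problem instance are all assumptions made for illustration, and the joint feature-and-factor case (which the summary above says is handled through an SDP reduction) is deliberately left out.

```python
import numpy as np


def nominal_backup(r, Phi, Mu, V, gamma):
    """One-step backup Q = r + gamma * Phi @ (Mu @ V) for a low-rank kernel P = Phi @ Mu."""
    w = Mu @ V  # d-dimensional summary of the next-state values
    return r + gamma * (Phi @ w)


def feature_robust_backup(r, Phi, Mu, V, gamma, xi):
    """Worst-case backup when each feature row may move within an l2 ball of radius xi.

    Assumption (not from the source): perturbations are l2-bounded and are not
    required to keep the perturbed kernel a valid probability distribution.
    For a linear objective, min_{||delta|| <= xi} (phi + delta) @ w
    equals phi @ w - xi * ||w||_2.
    """
    w = Mu @ V
    return r + gamma * (Phi @ w - xi * np.linalg.norm(w))


# Hypothetical random instance: 6 state-action pairs, rank 3, 4 next states.
rng = np.random.default_rng(0)
n_sa, d, n_s = 6, 3, 4
Phi = rng.dirichlet(np.ones(d), size=n_sa)  # each row lies on the simplex
Mu = rng.dirichlet(np.ones(n_s), size=d)    # each row is a next-state distribution
P = Phi @ Mu                                # rows of P are valid distributions
r = rng.uniform(size=n_sa)
V = rng.uniform(size=n_s)

q_nom = nominal_backup(r, Phi, Mu, V, gamma=0.9)
q_rob = feature_robust_backup(r, Phi, Mu, V, gamma=0.9, xi=0.1)
```

In this sketch the robust value is never larger than the nominal one, and the gap grows linearly with the assumed radius `xi`. When the factor side is perturbed as well, the objective becomes bilinear in the two perturbations and this closed form no longer applies.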

Paper Structure (47 sections, 16 theorems, 74 equations, 8 figures, 1 algorithm)

Introduction
Related Work
Robust MDPs and robust RL.
MDPs with linear/low-rank representations.
Robust RL with function approximation.
Preliminaries
Notations.
Markov Decision Processes (MDPs).
Low-rank MDPs.
Robust Low-Rank MDP with Dual Perturbation and $(\xi, \eta)$-Rectangularity
The Challenges of Robustness in Low-Rank MDPs
The Proposed Robust Low-Rank MDP
Rationale of the Proposed Low-rank Robustness with $(\xi, \eta)$-Rectangularity
Relationship with Nominal and Standard Robust Updates.
Robustness Induced by Low-rank Robust MDPs.
...and 32 more sections

Key Result

Theorem 3.2

Suppose the $(\xi, \eta)$-ambiguity set induced by $\mathcal{M}$ is a subset of $\widehat{\mathcal{M}}$. Then for any step $h \in [H]$ we have:

Figures (8)

Figure 1: MDP diagram for the string guessing game.
Figure 2: MDP diagram for the gamble-or-guarantee game.
Figure 3: Numerical simulations in a toy model.
Figure 4: MDP diagram for the string guessing game.
Figure 5: MDP diagram for the gamble-or-guarantee game.
...and 3 more figures

Theorems & Definitions (31)

Remark 3.1
Theorem 3.2
Example 3.1: string guessing
Example 3.2: gamble-or-guarantee
Theorem 4.1: Reduction
Theorem 5.1: Convergence
Lemma 5.2
Lemma 5.3
Remark A.1
Lemma A.2
...and 21 more
