Efficient Duple Perturbation Robustness in Low-rank MDPs
Yang Hu, Haitong Ma, Bo Dai, Na Li
TL;DR
Robust RL is challenged by distributional shifts in high-dimensional state-action spaces. The authors tackle this by introducing a duple perturbation framework based on $(\xi,\eta)$-rectangular ambiguity sets, which couple perturbations of both the feature maps and the dynamics factors of low-rank MDPs. They propose R2PG, a representation-robust policy gradient method that solves robust policy evaluation via an SDP-reduced optimization and updates the policy with a natural policy gradient (NPG) step; the method comes with a convergence guarantee and a bounded suboptimality gap. The theoretical results establish a quasi-contraction property and a robust performance-difference bound, while experiments on toy models and an inverted pendulum demonstrate improved worst-case performance under perturbations. The approach yields scalable, theory-backed robust RL that leverages low-rank structure and function approximation for large or continuous spaces.
Abstract
Robustness has recently become a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from efficiency issues that obstruct their real-world implementation. In this paper, we introduce duple perturbation robustness, i.e., perturbation of both the feature and factor vectors of low-rank Markov decision processes (MDPs), via a novel characterization of $(\xi,\eta)$-ambiguity sets. The resulting robust MDP formulation is compatible with the function representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence-rate guarantee. Examples are designed to justify the new robustness concept, and algorithmic efficiency is supported by both theoretical bounds and numerical simulations.
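
To make the idea of perturbing both features and factors concrete, the following is a minimal NumPy sketch, not the authors' R2PG algorithm. It assumes the standard low-rank factorization $P(s'|s,a) = \langle \phi(s,a), \mu(s') \rangle$ and stands in for the $(\xi,\eta)$-ambiguity set with a simple, hypothetical simplex-mixing perturbation; the worst case is estimated by random sampling rather than by the SDP-based robust evaluation described in the paper. All function names, sizes, and radii are illustrative assumptions.

```python
import numpy as np

def make_low_rank_mdp(n_states, n_actions, rank, rng):
    """Random tabular low-rank MDP with P(s'|s,a) = <phi(s,a), mu(s')>."""
    # Features on the simplex and factors that are distributions over next
    # states guarantee that every row of phi @ mu is a valid distribution.
    phi = rng.dirichlet(np.ones(rank), size=n_states * n_actions)  # (S*A, d)
    mu = rng.dirichlet(np.ones(n_states), size=rank)               # (d, S)
    reward = rng.uniform(size=n_states * n_actions)                # (S*A,)
    return phi, mu, reward

def policy_value(phi, mu, reward, policy, gamma):
    """Exact policy evaluation; returns the state-value vector of shape (S,)."""
    n_states, n_actions = policy.shape
    P = (phi @ mu).reshape(n_states, n_actions, n_states)
    r = reward.reshape(n_states, n_actions)
    P_pi = np.einsum("sa,sat->st", policy, P)   # state-to-state kernel under pi
    r_pi = (policy * r).sum(axis=1)             # expected one-step reward
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def perturb(x, radius, rng):
    """Hypothetical perturbation: mix each row with a random simplex point.

    Each row moves by at most 2 * radius in L1 norm and stays on the simplex.
    """
    noise = rng.dirichlet(np.ones(x.shape[1]), size=x.shape[0])
    return (1.0 - radius) * x + radius * noise

def sampled_worst_case(phi, mu, reward, policy, gamma, xi, eta, n_samples, rng):
    """Nominal average value and the lowest value seen over sampled perturbations."""
    nominal = policy_value(phi, mu, reward, policy, gamma).mean()
    worst = nominal
    for _ in range(n_samples):
        phi_p = perturb(phi, xi, rng)   # feature perturbation, radius xi
        mu_p = perturb(mu, eta, rng)    # factor perturbation, radius eta
        worst = min(worst, policy_value(phi_p, mu_p, reward, policy, gamma).mean())
    return nominal, worst

rng = np.random.default_rng(0)
phi, mu, reward = make_low_rank_mdp(n_states=5, n_actions=2, rank=3, rng=rng)
policy = np.full((5, 2), 0.5)  # uniform random policy
nominal, worst = sampled_worst_case(
    phi, mu, reward, policy, gamma=0.9, xi=0.1, eta=0.1, n_samples=200, rng=rng
)
print(f"nominal value: {nominal:.3f}, sampled worst case: {worst:.3f}")
```

Because the minimum is taken over a finite random sample, the reported worst case only upper-bounds the true robust value over this toy perturbation set; an exact robust evaluation would require solving the inner minimization, which is the step the paper handles through its own optimization-based reduction.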
