Wasserstein Robust Reinforcement Learning

Mohammed Amin Abdullah; Hang Ren; Haitham Bou Ammar; Vladimir Milenkovic; Rui Luo; Mingtian Zhang; Jun Wang

TL;DR

This work tackles overfitting and poor generalisation in reinforcement learning by introducing WR$^2$L, a Wasserstein-robust RL framework that seeks the best policy under worst-case but bounded transition dynamics. The core idea is to restrict the admissible dynamics to an $\epsilon$-Wasserstein ball around a reference model $\mathcal{P}_0$ and to solve a min-max objective over policy parameters $\theta$ and dynamics parameters $\phi$. The key algorithmic contribution is an alternating descent-ascent scheme. A second-order Taylor approximation of the Wasserstein distance yields a closed-form update for $\phi$, and $\theta$ is updated by gradient steps. When the dynamics are available only as black-box simulators, a zero-order method estimates the required gradients and Hessians. Empirically, WR$^2$L is more robust than standard and prior robust RL approaches on MuJoCo benchmarks, including high-dimensional variants, which points to its practical value in real-world, uncertain environments. The scalable zero-order solver and the analytic results for the Hessian-based constraint also offer a general tool for robust optimisation in dynamical systems.
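The closed-form $\phi$ update can be illustrated with a small sketch. Assume, as in trust-region methods, that the expected return is linearised around $\phi_0$ with gradient $g_0$. Assume also that the Wasserstein constraint is replaced by its second-order approximation $\frac{1}{2}(\phi-\phi_0)^\top H_0(\phi-\phi_0)\le\epsilon$ with positive-definite $H_0$. The first-order term drops out if the distance is differentiable and minimised (equal to zero) at $\phi_0$. Under these assumptions, the worst-case step is $\phi^\star = \phi_0 - \sqrt{2\epsilon / (g_0^\top H_0^{-1} g_0)}\,H_0^{-1} g_0$. The function name `worst_case_phi_step` and the toy numbers are hypothetical. This is a sketch of the standard linear-objective, quadratic-constraint solution, not the authors' code.

```python
import numpy as np


def worst_case_phi_step(phi0, g0, H0, epsilon):
    """Minimise g0^T (phi - phi0) s.t. 0.5 (phi - phi0)^T H0 (phi - phi0) <= epsilon.

    g0: gradient of the linearised expected return w.r.t. phi at phi0 (assumption).
    H0: positive-definite Hessian of the Wasserstein constraint at phi0 (assumption).
    Returns adversarial dynamics parameters on the boundary of the ellipsoid.
    """
    phi0 = np.asarray(phi0, dtype=float)
    g0 = np.asarray(g0, dtype=float)
    h_inv_g = np.linalg.solve(H0, g0)        # H0^{-1} g0 without forming the inverse
    quad_form = float(g0 @ h_inv_g)          # g0^T H0^{-1} g0
    if quad_form <= 0.0:                     # zero gradient (H0 is PD): nothing to exploit
        return phi0.copy()
    scale = np.sqrt(2.0 * epsilon / quad_form)
    return phi0 - scale * h_inv_g            # step against the return gradient


# Toy example with hypothetical numbers.
phi0 = np.zeros(2)
g0 = np.array([1.0, 2.0])
H0 = np.array([[2.0, 0.0], [0.0, 1.0]])
epsilon = 0.5
phi_star = worst_case_phi_step(phi0, g0, H0, epsilon)
delta = phi_star - phi0
print(phi_star)                   # approx [-0.2357, -0.9428]
print(0.5 * delta @ H0 @ delta)   # approx 0.5, so the constraint is active
```

Because the linearised objective decreases without bound along $-H_0^{-1}g_0$, the optimum always sits on the boundary of the ellipsoid. That is why the step size is fixed by $\epsilon$ rather than by a learning rate.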

Abstract

Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes $\text{WR}^{2}\text{L}$, a robust reinforcement learning algorithm that performs robustly on both low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint, which enables a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver based on a novel zero-order optimisation method that we believe can be useful for numerical optimisation in general. We empirically demonstrate significant gains over standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
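The zero-order idea can be sketched with Gaussian smoothing, which estimates gradients and Hessians from function evaluations alone. This generic estimator is chosen for illustration and is not necessarily the exact scheme used in the paper. For $u\sim\mathcal{N}(0,I)$, the antithetic estimators $\hat g = \mathbb{E}[(f(x+\sigma u)-f(x-\sigma u))\,u]/(2\sigma)$ and $\hat H = \mathbb{E}[(f(x+\sigma u)+f(x-\sigma u)-2f(x))(uu^\top - I)]/(2\sigma^2)$ are unbiased for the gradient and Hessian of the Gaussian-smoothed $f$. The name `zero_order_grad_hess` is hypothetical. The sketch also assumes `f` accepts a batch of points so that all samples are evaluated in one vectorised call.

```python
import numpy as np


def zero_order_grad_hess(f, x, sigma=0.05, n_samples=10_000, seed=0):
    """Estimate grad and Hessian of a black-box f at x via Gaussian smoothing.

    f (assumed batched) maps points of shape (n, d) to values of shape (n,).
    Uses antithetic pairs x + sigma*u and x - sigma*u with u ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    u = rng.standard_normal((n_samples, d))
    f_plus = f(x + sigma * u)          # shape (n,)
    f_minus = f(x - sigma * u)         # shape (n,)
    f_0 = f(x[None, :])[0]             # baseline value f(x)

    # Gradient: mean of (f(x+su) - f(x-su)) u, divided by 2*sigma.
    grad = ((f_plus - f_minus)[:, None] * u).mean(axis=0) / (2.0 * sigma)

    # Hessian: mean of (f(x+su) + f(x-su) - 2 f(x)) (u u^T - I), divided by 2*sigma^2.
    second_diff = f_plus + f_minus - 2.0 * f_0
    outer = np.einsum("ni,nj->nij", u, u) - np.eye(d)
    hess = (second_diff[:, None, None] * outer).mean(axis=0) / (2.0 * sigma**2)
    hess = 0.5 * (hess + hess.T)       # symmetrise the Monte Carlo estimate
    return grad, hess


# Usage on a quadratic test function with known derivatives (hypothetical numbers).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])


def quad(X):
    """f(x) = 0.5 x^T A x + b^T x, evaluated row-wise on a batch X."""
    return 0.5 * np.einsum("ni,ij,nj->n", X, A, X) + X @ b


x = np.array([0.3, -0.2])
g, H = zero_order_grad_hess(quad, x, n_samples=200_000)
```

For a quadratic, the smoothed function differs from $f$ only by a constant. The estimates should therefore approach the exact gradient $Ax+b$ and Hessian $A$ as the number of samples grows, with Monte Carlo error shrinking roughly as $1/\sqrt{n}$.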
