RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Jia Lin Hau; Marek Petrik; Mohammad Ghavamzadeh; Reazul Russel

TL;DR

The paper addresses risk-averse decision-making in MDPs under both epistemic uncertainty (uncertainty about the model) and aleatory uncertainty (randomness in the dynamics). It introduces RASR, a framework that combines risk-averse objectives with soft-robustness, and analyzes it under two risk measures: the entropic risk measure (ERM) and the entropic value-at-risk (EVaR). For RASR-ERM, the authors derive a new dynamic-programming formulation with a time-dependent risk level, prove that deterministic, time-dependent optimal policies exist, and show that under certain conditions the problem reduces to a risk-averse problem on the mean-posterior model. For EVaR, they show how to reduce the problem to multiple ERM problems and provide a grid-search algorithm with performance guarantees. Empirical results across multiple domains show that RASR-EVaR mitigates risk better than the baselines, as measured by EVaR and other standard risk measures, while remaining computationally practical.
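As a rough illustration of the reduction from EVaR to multiple ERM problems, the sketch below evaluates both risk measures for a discrete reward distribution. It assumes the reward-based convention $\mathrm{ERM}^{\beta}[X] = -\beta^{-1}\log \mathbb{E}[e^{-\beta X}]$ and $\mathrm{EVaR}^{\alpha}[X] = \sup_{\beta>0}\,(\mathrm{ERM}^{\beta}[X] + \beta^{-1}\log(1-\alpha))$, which may differ from the paper's exact parameterization. The function names `erm` and `evar_grid` and the choice of grid are hypothetical, and the plain grid used here carries none of the paper's performance guarantees.

```python
import numpy as np

def erm(values, probs, beta):
    """Entropic risk of a discrete reward distribution (assumed convention:
    -1/beta * log E[exp(-beta * X)]; beta = 0 gives the plain expectation)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(probs @ values)
    z = -beta * values
    m = z.max()  # shift exponents for numerical stability (log-sum-exp)
    return float(-(m + np.log(probs @ np.exp(z - m))) / beta)

def evar_grid(values, probs, level, betas):
    """Approximate EVaR at confidence `level` in [0, 1) by maximizing
    ERM^beta[X] + log(1 - level) / beta over a finite grid of positive betas.
    Because the grid is finite, the result is a lower bound on the supremum."""
    return float(max(erm(values, probs, b) + np.log(1.0 - level) / b
                     for b in betas))

# Illustrative data (not from the paper): a fair coin between rewards 0 and 10.
x, p = [0.0, 10.0], [0.5, 0.5]
betas = np.geomspace(1e-3, 1e2, 200)
print(erm(x, p, 0.5), evar_grid(x, p, 0.9, betas))
```

Each grid point requires one ERM evaluation, which is what makes the reduction convenient: in the MDP setting, each of those evaluations would be replaced by solving one ERM problem.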

Abstract

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
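The time dependence of the risk level can be motivated by a short derivation. The following is a sketch for a single known model with discount factor $\gamma$, under the assumed reward-based convention $\mathrm{ERM}^{\alpha}[X] = -\alpha^{-1}\log\mathbb{E}[e^{-\alpha X}]$; the symbols $\alpha_t$, $v_t$, and $r(s,a,S')$ are notation chosen here for illustration and may not match the paper's.

```latex
\begin{align*}
  % Positive quasi-homogeneity: scaling the reward rescales the risk level.
  \mathrm{ERM}^{\alpha}[c\,X]
    &= -\tfrac{1}{\alpha}\log \mathbb{E}\!\left[e^{-\alpha c X}\right]
     = c\cdot\Bigl(-\tfrac{1}{\alpha c}\log \mathbb{E}\!\left[e^{-(\alpha c) X}\right]\Bigr)
     = c\,\mathrm{ERM}^{\alpha c}[X], \qquad c > 0, \\[4pt]
  % Combined with the tower property, discounting by \gamma at each step
  % shrinks the risk level geometrically:
  \alpha_t &= \alpha\,\gamma^{t}, \\[4pt]
  v_t(s) &= \max_{a}\; \mathrm{ERM}^{\alpha_t}\!\bigl[\, r(s,a,S') + \gamma\, v_{t+1}(S') \,\bigr].
\end{align*}
```

Because $\alpha_t$ changes with $t$, the maximizing action in a given state can also change with $t$, which is consistent with the optimal policy being deterministic but time-dependent even in the discounted infinite-horizon setting.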

Paper Structure (27 sections, 16 theorems, 88 equations, 2 figures, 6 tables, 3 algorithms)

Introduction
Preliminaries
Risk-averse MDP
Soft-robust MDP
RASR
Risk Measures
RASR-ERM Framework
Dynamic Program Formulation for RASR-ERM
Algorithms for Optimizing RASR-ERM
RASR-EVaR Framework
Empirical Evaluation
Related Work
Conclusion and Future Work
Proofs for the Preliminaries section
Proofs for the RASR-ERM Framework section
...and 12 more sections

Key Result

Theorem 1

Any two random variables $X_1,X_2\in\mathbb X$ satisfy that

Figures (2)

Figure 1: $\psi^{0.99}[\mathfrak{R}^{\pi}_{\infty}]$ in the river-swim (left) and population (right) problems.
Figure 2: Plots of $h(\alpha)$ (left) and $h(\zeta^{-1})$ (right) for $\beta=0.5$, which are used in the proof of the non-concavity proposition (label prop:non-concave).

Theorems & Definitions (34)

Theorem 1: Tower Property
Theorem 2: Positive Quasi-homogeneity
Theorem 3: Bellman Equations
Corollary 4
Theorem 5
Theorem 6
Theorem 7
Corollary 8
Theorem 9
Proposition 10: Tower Property (see, e.g., Proposition 3.4 in Ross, 2007)
...and 24 more
