
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

TL;DR

The paper addresses the challenge of risk-averse decision-making in MDPs under both epistemic (model) and aleatory (random dynamics) uncertainty. It introduces RASR, a framework that combines risk-averse objectives with soft-robustness, and analyzes two risk measures: the entropic risk measure (ERM) and the entropic value-at-risk (EVaR). The authors derive a novel dynamic-programming formulation for RASR-ERM with a time-varying risk level, prove the existence of deterministic time-dependent optimal policies, and show a reduction to the mean-posterior model under certain conditions. For EVaR, they show how to reduce the problem to multiple ERM problems and provide a grid-search algorithm with performance guarantees. Empirical results across multiple domains show that RASR-EVaR mitigates risk more effectively than the baselines while remaining computationally practical, highlighting the framework’s potential for safer, more reliable RL in real-world settings.
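To make the two risk measures concrete, here is a minimal sketch for a discrete reward distribution. It uses the standard reward-based definitions ERM_β[X] = -(1/β) log E[exp(-βX)] and EVaR_α[X] = sup_{β>0} ERM_β[X] + log(α)/β. We assume this convention, in which smaller α is more risk-averse; the paper's own parameterization of α may differ. The supremum is approximated by a plain geometric grid over β. This is only an illustration of the EVaR-to-ERM reduction, not the paper's grid or its error guarantee. The names `erm`, `evar_grid`, and the example data are hypothetical.

```python
import math


def erm(values, probs, beta):
    """Entropic risk measure of a discrete reward X (higher reward is better):
    ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)], for beta > 0.
    Assumes probs sum to 1. Shifting by the smallest supported value and
    using expm1/log1p keeps the computation stable for large and small beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    support = [(x, p) for x, p in zip(values, probs) if p > 0]
    x_min = min(x for x, _ in support)
    # E[exp(-beta (X - x_min))] = 1 + d, since the probabilities sum to 1
    d = sum(p * math.expm1(-beta * (x - x_min)) for x, p in support)
    return x_min - math.log1p(d) / beta


def evar_grid(values, probs, alpha, betas):
    """Approximate EVaR_alpha[X] = sup_{beta > 0} ERM_beta[X] + log(alpha) / beta
    by the best value over a finite grid of beta values.
    Returns the approximate EVaR and the maximizing beta."""
    best_val, best_beta = -math.inf, None
    for beta in betas:
        val = erm(values, probs, beta) + math.log(alpha) / beta
        if val > best_val:
            best_val, best_beta = val, beta
    return best_val, best_beta


# Hypothetical reward distribution and beta grid (both are assumptions).
values, probs = [0.0, 5.0, 10.0], [0.05, 0.25, 0.7]
betas = [1e-3 * 1.25 ** k for k in range(60)]
evar_10, beta_10 = evar_grid(values, probs, 0.1, betas)
evar_50, beta_50 = evar_grid(values, probs, 0.5, betas)
print(f"EVaR_0.1 ~ {evar_10:.3f} (beta = {beta_10:.3g})")
print(f"EVaR_0.5 ~ {evar_50:.3f} (beta = {beta_50:.3g})")
```

Each grid point costs one ERM evaluation. This is why EVaR can be handled by solving several ERM problems, one per β, and keeping the best.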

Abstract

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework, which combines Risk-Averse and Soft-Robust methods, RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
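The time-dependent risk level comes from positive quasi-homogeneity, ERM_β[γX] = γ ERM_{γβ}[X]. Each discounting step therefore multiplies the risk level by γ, giving β_t = βγ^t at step t.

The sketch below applies this idea to a single known transition model, for example the mean posterior model mentioned above, with the return truncated at a finite horizon. It runs backward induction with v_t(s) = max_a ERM_{β_t}[r(s,a,s') + γ v_{t+1}(s')]. It does not model a posterior over models and is not the paper's full RASR algorithm. The function names and the toy MDP are hypothetical.

```python
import math


def erm(values, probs, beta):
    """ERM_beta[X] = -(1/beta) log E[exp(-beta X)] for discrete X; probs sum to 1."""
    support = [(x, p) for x, p in zip(values, probs) if p > 0]
    x_min = min(x for x, _ in support)
    d = sum(p * math.expm1(-beta * (x - x_min)) for x, p in support)
    return x_min - math.log1p(d) / beta


def erm_value_iteration(P, R, gamma, beta, horizon):
    """Backward induction for ERM_beta of the discounted return, truncated at `horizon`.

    P[s][a][s2] is the transition probability and R[s][a][s2] the reward of s -a-> s2.
    At step t the risk level is beta * gamma**t, so the greedy policy policy[t][s]
    is deterministic but depends on t.
    """
    n = len(P)
    v = [0.0] * n  # value beyond the horizon is truncated to zero
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        beta_t = beta * gamma ** t  # time-dependent risk level
        new_v, pi_t = [0.0] * n, [0] * n
        for s in range(n):
            best_q, best_a = -math.inf, 0
            for a in range(len(P[s])):
                targets = [R[s][a][s2] + gamma * v[s2] for s2 in range(n)]
                q = erm(targets, P[s][a], beta_t)
                if q > best_q:
                    best_q, best_a = q, a
            new_v[s], pi_t[s] = best_q, best_a
        v, policy[t] = new_v, pi_t
    return v, policy


# Hypothetical 2-state MDP: action 0 is safe (reward 1); action 1 is risky
# (reward 4 or -1 with equal probability, mean 1.5). Both states behave identically.
P = [[[1.0, 0.0], [0.5, 0.5]] for _ in range(2)]
R = [[[1.0, 1.0], [4.0, -1.0]] for _ in range(2)]
v0, policy = erm_value_iteration(P, R, gamma=0.9, beta=1.0, horizon=30)
print([pi[0] for pi in policy])  # safe while beta_t is large, risky once it decays
```

In this toy example the optimal action switches from safe to risky as β_t decays. This mirrors the claim that the optimal policies are deterministic but time-dependent.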

Paper Structure (27 sections, 16 theorems, 88 equations, 2 figures, 6 tables, 3 algorithms)

Key Result

Theorem 1 (Tower Property)

Any two random variables $X_1,X_2\in\mathbb X$ satisfy that

Figures (2)

  • Figure 1: $\psi^{0.99}[\mathfrak{R}^{\pi}_{\infty}]$ in the river-swim (left) and population (right) problems.
  • Figure 2: Plots of $h(\alpha)$ (left) and $h(\zeta^{-1})$ (right) for $\beta=0.5$, which are used in the proof of the non-concavity proposition.

Theorems & Definitions (34)

  • Theorem 1: Tower Property
  • Theorem 2: Positive Quasi-homogeneity
  • Theorem 3: Bellman Equations
  • Corollary 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Corollary 8
  • Theorem 9
  • Proposition 10: Tower Property (see, e.g., Proposition 3.4 in Ross, 2007)
  • ...and 24 more
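Theorems 1 and 2 appear here only by name, and the statement of Theorem 1 is cut off. As a hedged illustration, the sketch below numerically checks two standard ERM identities that we assume correspond to them:

- Positive quasi-homogeneity: ERM_β[cX] = c ERM_{cβ}[X] for c > 0.
- Tower property: ERM_β[X] = ERM_β[ERM_β[X | Y]].

The joint distribution, β, and c are hypothetical.

```python
import math


def erm(values, probs, beta):
    """ERM_beta[X] = -(1/beta) log E[exp(-beta X)] for discrete X; probs sum to 1."""
    support = [(x, p) for x, p in zip(values, probs) if p > 0]
    x_min = min(x for x, _ in support)
    d = sum(p * math.expm1(-beta * (x - x_min)) for x, p in support)
    return x_min - math.log1p(d) / beta


beta, c = 0.7, 2.5

# Hypothetical joint distribution: Y in {0, 1}, and X given Y is discrete.
p_y = [0.3, 0.7]
cond = [([1.0, 5.0], [0.5, 0.5]), ([2.0, 3.0, 10.0], [0.2, 0.5, 0.3])]

# Marginal of X: concatenate the conditional outcomes, weighting by P(Y).
xs = [x for vals, _ in cond for x in vals]
ps = [py * p for py, (_, probs) in zip(p_y, cond) for p in probs]

# Positive quasi-homogeneity: ERM_beta[c X] = c * ERM_{c beta}[X].
lhs_qh = erm([c * x for x in xs], ps, beta)
rhs_qh = c * erm(xs, ps, c * beta)

# Tower property: ERM_beta[X] = ERM_beta[ERM_beta[X | Y]].
inner = [erm(vals, probs, beta) for vals, probs in cond]
lhs_tower = erm(xs, ps, beta)
rhs_tower = erm(inner, p_y, beta)

print(lhs_qh, rhs_qh)
print(lhs_tower, rhs_tower)
```

Together, the tower property and quasi-homogeneity are what let a dynamic program carry a single ERM risk level forward, scaled by γ at each step.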