RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel
TL;DR
The paper addresses risk-averse decision-making in MDPs under both epistemic (model) and aleatory (random dynamics) uncertainty. It introduces RASR, a framework that combines risk-averse objectives with soft-robustness, and analyzes it under two risk measures: the entropic risk measure (ERM) and the entropic value-at-risk (EVaR). For RASR-ERM, the authors derive a new dynamic-programming formulation with a time-dependent risk level, prove that deterministic, time-dependent optimal policies exist, and show that, under certain conditions, the problem reduces to one with the mean posterior transition model. For EVaR, they reduce the problem to multiple ERM problems and give a grid-search algorithm with performance guarantees. Experiments across multiple domains indicate that RASR-EVaR mitigates risk more effectively than the baselines while remaining computationally practical, which suggests the framework is suited to safer, more reliable RL in real-world settings.
Abstract
Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
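To make the time-dependent risk level concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the reward-based convention ERM_β[X] = -(1/β) log E[exp(-βX)] and the parameterization EVaR_α[X] = sup over β > 0 of ERM_β[X] + log(1-α)/β with α in [0, 1); the paper's exact notation may differ. It also simplifies to a single known tabular model (no posterior over models) and truncates the horizon with a zero terminal value. The function names `erm`, `evar`, and `erm_value_iteration` are hypothetical. The schedule β·γ^t follows from the identity ERM_β[γX] = γ·ERM_{βγ}[X].

```python
import numpy as np


def erm(values, probs, beta):
    """Entropic risk of a discrete reward distribution (larger is better).

    Assumed convention: ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)],
    with ERM_0[X] = E[X].
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(probs @ values)
    # Shift by the largest exponent before exponentiating (log-sum-exp trick).
    exponents = -beta * values
    shift = exponents.max()
    log_mgf = shift + np.log(probs @ np.exp(exponents - shift))
    return float(-log_mgf / beta)


def evar(values, probs, alpha, betas):
    """Grid approximation (a lower bound) of
    EVaR_alpha[X] = sup_{beta > 0} ERM_beta[X] + log(1 - alpha) / beta."""
    return max(erm(values, probs, b) + np.log(1.0 - alpha) / b for b in betas)


def erm_value_iteration(P, R, gamma, beta, horizon):
    """Truncated ERM dynamic program with risk level beta * gamma**t at step t.

    P[a, s, s2] are transition probabilities and R[a, s, s2] are rewards.
    Returns the time-0 values and a time-dependent greedy policy
    of shape (horizon, n_states).
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)  # assumption: zero value after the truncation point
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        beta_t = beta * gamma**t  # risk aversion decays with the discount
        q = np.array([
            [erm(R[a, s] + gamma * v, P[a, s], beta_t) for s in range(n_states)]
            for a in range(n_actions)
        ])
        policy[t] = q.argmax(axis=0)
        v = q.max(axis=0)
    return v, policy


# Toy example: action 0 pays 1 for sure; action 1 pays 0 or 3 with equal odds.
P = np.full((2, 2, 2), 0.5)
R = np.array([[[1.0, 1.0], [1.0, 1.0]],
              [[0.0, 3.0], [0.0, 3.0]]])
values, policy = erm_value_iteration(P, R, gamma=0.5, beta=2.0, horizon=5)
```

In this toy problem the greedy policy picks the safe action at the first step and the risky action at the last step, because the effective risk level β·γ^t shrinks over time. This illustrates why the optimal policies are deterministic but time-dependent. For an EVaR objective, one would solve such an ERM program for each β on a grid and keep the best candidate; the `evar` helper only shows the grid idea for a single random variable.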
