Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning
Juan Sebastian Rojas, Chi-Guhn Lee
TL;DR
The paper formalizes risk-aware decision-making for continual reinforcement learning by identifying incompatibilities between classical risk measures and lifelong learning. It introduces ergodic risk measures, defined via finite-history computability and non-time-consistency, and proves that they are compatible with continual learning under mild ergodicity-like assumptions. It proposes an ergodic RL objective based on average-reward MDPs, together with two axioms (Feasibility and Plasticity) that ensure practical risk assessment on streaming data. A case study based on conditional value-at-risk (CVaR) demonstrates adaptive, risk-sensitive behavior under changing risk attitudes and environments, illustrating the framework's practical value for risk-aware lifelong agents.
Abstract
Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the behaviour of the agent is directed towards optimizing a measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures that are compatible with continual learning. Finally, we provide a case study of risk-aware continual learning, along with empirical results, which show the intuitive appeal of ergodic risk measures in continual settings.
