Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning

Juan Sebastian Rojas, Chi-Guhn Lee

TL;DR

The paper formalizes risk-aware decision-making for continual reinforcement learning by identifying incompatibilities between classical risk measures and lifelong learning. It introduces ergodic risk measures, defined via finite-history computability and non-time-consistency, and proves their compatibility with continual learning under mild ergodicity-like assumptions. An ergodic RL objective based on average-reward MDPs is proposed, with two axioms (Feasibility and Plasticity) ensuring practical risk assessment in streaming data. A CVaR-based case study demonstrates adaptive, risk-sensitive behavior under changing risk attitudes and environments, illustrating the practical value of the framework for risk-aware lifelong agents.

Abstract

Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the behaviour of the agent is directed towards optimizing a measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures that are compatible with continual learning. Finally, we provide a case study of risk-aware continual learning, along with empirical results, which show the intuitive appeal of ergodic risk measures in continual settings.

Paper Structure

This paper contains 20 sections, 7 theorems, 10 equations, 4 figures, 1 algorithm.

Key Result

Proposition 4.1

Under the continual RL framework proposed in Abel2025-ez, the agent's goal (or objective) can be interpreted as wanting to leverage past observations from the environment, $O_{a:b}$, to find and output a sequence of actions, $A_{c:d}$, that will result in a future sequence of observations from the e

Figures (4)

  • Figure 1: A comparison of: a) the continual RL framework proposed in Abel2025-ez, and b) the risk-aware generalization of the framework proposed in this work.
  • Figure 2: A comparison between: a) static, b) nested, and c) ergodic risk measures in terms of which observations are needed to accurately compute the risk at a given time step. A shaded box indicates that the observation (as per the y-axis value) is needed to accurately compute the risk at that time step (x-axis value). Note that the nested risk measure depicted in this figure is assumed to be a Markov risk measure (see Definition \ref{defn_markov}).
  • Figure 3: Rolling percent of time that the agent stays in the blue world state as learning progresses in the $\tau$-RPBP task. A solid line denotes the mean percent of time spent in the blue world state, and the shaded region denotes a 95% confidence interval over 50 runs. As shown in the figure, the agent correctly learns to stay in the blue world state in the beginning, and then correctly changes its preference to the red world state once its risk attitude changes from risk-neutral to risk-averse.
  • Figure 4: Rolling reward CVaR as learning progresses in the $\mathcal{S}$-RPBP task. A solid line denotes the mean CVaR, and the shaded region denotes a 95% confidence interval over 10 runs. The blue and red dashed lines denote the reward CVaR of the blue and red world states, respectively. As shown in the figure, the agent is able to continually adapt and find the state with the better CVaR.
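
To make the "rolling reward CVaR" plotted in Figure 4 concrete, the following is a minimal sketch of how such a quantity could be estimated from a reward stream using only a finite window of recent observations. It is not the paper's algorithm. The names `empirical_cvar` and `RollingCVaR`, the window size, and the lower-tail convention (CVaR at level tau taken as the mean of the worst tau-fraction of rewards) are all assumptions made for illustration.

```python
import math
from collections import deque


def empirical_cvar(rewards, tau):
    """Empirical lower-tail CVaR: mean of the worst ceil(tau * n) rewards.

    Assumption: higher reward is better, so the "bad" tail is the low end.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(rewards)
    if not ordered:
        raise ValueError("need at least one reward")
    # Always keep at least one sample in the tail.
    k = max(1, math.ceil(tau * len(ordered)))
    return sum(ordered[:k]) / k


class RollingCVaR:
    """Hypothetical helper: CVaR over the most recent `window` rewards."""

    def __init__(self, window, tau):
        self.buffer = deque(maxlen=window)  # old rewards fall out automatically
        self.tau = tau

    def update(self, reward):
        """Add one reward and return the CVaR of the current window."""
        self.buffer.append(reward)
        return empirical_cvar(self.buffer, self.tau)


if __name__ == "__main__":
    tracker = RollingCVaR(window=4, tau=0.5)
    for r in [10.0, 0.0, 5.0, 7.0, 20.0, -4.0]:
        print(r, tracker.update(r))
```

Because the estimate depends only on a bounded window, it can be recomputed indefinitely on streaming data and follows the stream when the reward distribution changes. With `tau=1.0` the estimate reduces to the windowed mean, which corresponds to the risk-neutral case.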

Theorems & Definitions (21)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.4
  • Lemma 4.5
  • proof
  • Lemma 4.6
  • proof
  • Definition 4.7
  • Lemma 4.8
  • proof
  • ...and 11 more