Risk-Averse Reinforcement Learning with Itakura-Saito Loss

Igor Udovichenko; Olivier Croissant; Anita Toleutaeva; Evgeny Burnaev; Alexander Korotin

TL;DR

The paper tackles risk-averse reinforcement learning by leveraging exponential utility and introduces a numerically stable Itakura-Saito loss for learning risk-sensitive value functions. Grounded in Bregman divergence, the loss yields a risk-averse Bellman update and a stochastic approximation rule, enabling stable training where prior exponential-style losses struggle. Empirical results across analytically tractable portfolio problems, Deep Hedging, and robust combinatorial optimization show that Itakura-Saito loss often outperforms alternatives in stability and accuracy, especially at larger risk aversion levels. The work also offers a theoretical perspective linking the loss to conformal invariance and improved optimization conditioning, suggesting broad implications for robust learning under uncertainty.

Abstract

Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where one can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. However, existing loss functions for this setting can be numerically unstable. To address this, we introduce to the broad machine learning community a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate the Itakura-Saito loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple scenarios, some with known analytical solutions, and show that the considered loss function outperforms the alternatives.
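
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Itakura-Saito divergence D(p, q) = p/q - log(p/q) - 1 applied to p = exp(-a*y) and q = exp(-a*v), where y is a sampled return, v is the value estimate, and a > 0 is the risk-aversion coefficient. The exact parametrization used in the paper is not shown in this excerpt, and the function names below are hypothetical. Under this assumption the loss depends only on the difference y - v, and its minimizer over v is the exponential-utility certainty equivalent -(1/a) log E[exp(-a*y)].

```python
import math

def itakura_saito_loss(value, target, risk_aversion):
    """Assumed form: IS divergence between exp(-a*target) and exp(-a*value)."""
    # Form the log-ratio first, so exp(-a*target) and exp(-a*value)
    # are never evaluated separately.
    z = -risk_aversion * (target - value)
    return math.expm1(z) - z  # exp(z) - z - 1, non-negative, zero when target == value

def entropic_value(targets, risk_aversion):
    """Closed-form minimizer -(1/a) * log(mean(exp(-a*y))), via log-sum-exp."""
    scaled = [-risk_aversion * y for y in targets]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -log_mean_exp / risk_aversion

def mean_loss(value, targets, risk_aversion):
    """Average assumed IS loss of a single value estimate over sampled returns."""
    return sum(itakura_saito_loss(value, y, risk_aversion) for y in targets) / len(targets)

# Illustrative data only; these numbers are not from the paper.
returns = [1.0, 0.5, -2.0, 3.0]
a = 2.0
v_star = entropic_value(returns, a)
```

In this sketch `v_star` lies below the plain average of `returns`, which reflects the risk-averse preference described in the abstract, and `mean_loss` is convex in `value` with its minimum at `v_star`.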
