On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks

Joar Skalse; Alessandro Abate

TL;DR

This work analyzes the expressivity limits of scalar, Markovian rewards in reinforcement learning across three broad task classes: multi-objective RL (MORL), risk-sensitive RL, and modal RL. It proves that a scalar reward can reproduce MORL preferences only when it is a linear weighting of the component rewards, and that risk-sensitive objectives and most modal tasks cannot be captured by Markovian rewards, the former because they involve non-affine transformations and the latter because their modalities depend on the environment. The authors also outline bespoke algorithms for solving problems within each class, and discuss practical implications for reward design, reward learning, and the need for alternative formulations like MORL. Overall, the paper clarifies when standard reward structures suffice and when extended frameworks are needed to express a wider range of useful, real-world objectives.

Abstract

In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks: multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed using a scalar, Markovian reward. Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes. We thereby contribute to a more complete understanding of what standard reward functions can and cannot express. In addition, we call attention to modal problems as a new class of problems, since they have so far not been given any systematic treatment in the RL literature. We also briefly outline some approaches for solving some of the problems we discuss, by means of bespoke RL algorithms.

Paper Structure (12 sections, 22 theorems, 11 equations)

Introduction
Preliminaries
Multi-Objective Problems
Risk-Sensitive Problems
Modal Problems
Solving Tasks That Are Inexpressible by Markovian Rewards
Discussion
Related Work
Proofs
Tasks as Optimal Policies
More MORL Objectives
A Method for Solving Modal Tasks

Key Result

Theorem 1

If a MOMDP $\mathcal{M}$ with objective $\mathcal{O}$ is scalarizable, then there exist $w_1, \dots, w_k \in \mathbb{R}$ such that $\mathcal{M}$ with $\mathcal{O}$ is scalarized by the reward $R(s,a) = \sum_{i=1}^k w_i \cdot R_i(s,a)$.
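The linear form in Theorem 1 can be illustrated with a minimal Python sketch. This is not code from the paper: the helper `scalarize`, the two component rewards `r_speed` and `r_safety`, and the weights are all hypothetical, chosen only to show what a reward of the form $R(s,a) = \sum_{i=1}^k w_i \cdot R_i(s,a)$ looks like in practice.

```python
from typing import Callable, Sequence

State = int
Action = int
RewardFn = Callable[[State, Action], float]


def scalarize(rewards: Sequence[RewardFn], weights: Sequence[float]) -> RewardFn:
    """Build R(s, a) = sum_i w_i * R_i(s, a) from component rewards.

    Hypothetical helper for illustration; the weights may be any reals.
    """
    if len(rewards) != len(weights):
        raise ValueError("need exactly one weight per component reward")

    def scalar_reward(s: State, a: Action) -> float:
        # Weighted sum of the k component rewards at (s, a).
        return sum(w * r(s, a) for w, r in zip(weights, rewards))

    return scalar_reward


# Two made-up component rewards on a toy state/action space.
def r_speed(s: State, a: Action) -> float:
    # Reward 1 for taking the "fast" action (a == 1), else 0.
    return 1.0 if a == 1 else 0.0


def r_safety(s: State, a: Action) -> float:
    # Penalty for taking the fast action in the "risky" state (s == 0).
    return -2.0 if (s == 0 and a == 1) else 0.0


# Assumed weights w_1 = 0.5, w_2 = 1.0.
R = scalarize([r_speed, r_safety], [0.5, 1.0])
```

Here `R(0, 1)` combines the speed bonus and the safety penalty into a single number. The theorem says that whenever a scalar Markovian reward captures a multi-objective task at all, one of this weighted-sum form will do so; the sketch does not show how to find suitable weights.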

Theorems & Definitions (44)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Theorem 1
Corollary 1
Corollary 2
Corollary 3
...and 34 more
