Reinforcement Learning with Exogenous States and Rewards

George Trimponias; Thomas G. Dietterich

TL;DR

This work addresses the slow learning caused by exogenous variability in reinforcement learning. It formalizes a decomposition of an MDP into exogenous and endogenous components when the reward is additive, and shows that the problem then splits into an exogenous Markov reward process and an endogenous MDP, so that any optimal policy for the endogenous MDP is also optimal for the original MDP; in practice, the exogenous reward is estimated by regression and subtracted. The authors formalize exo/endo decompositions, prove that a maximal exogenous subspace exists and is unique, and develop two practical algorithms, GRDS and SRAS, that discover the exogenous subspace when the state variables are mixed linearly; they also propose CCC as a surrogate for conditional mutual information. Empirical results on high-dimensional linear, nonlinear, discrete, and combinatorial-action MDPs show substantial speedups and, in many cases, end-to-end performance that matches or exceeds an oracle given the true endogenous reward, with Simplified-GRDS often the most robust option. By providing a principled online method to isolate and remove exogenous reward noise, the work accelerates learning in control tasks with uncontrollable exogenous factors.
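The reward-removal step can be illustrated with a minimal sketch. It assumes a linear exogenous reward, a batch of logged (exogenous state, reward) pairs, and ordinary least squares; the paper fits its regression online and also treats nonlinear exogenous rewards, so this is not its procedure. The function names and the synthetic data are hypothetical. Because the exogenous reward depends only on exogenous state, whose dynamics the actions cannot affect, subtracting it shifts every policy's value by the same amount while removing most of the reward variance.

```python
import numpy as np

def fit_exo_reward(exo_states, rewards):
    """Least-squares fit of the observed reward on exogenous features plus an intercept."""
    X = np.column_stack([exo_states, np.ones(len(exo_states))])
    coef, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    return coef

def endogenous_reward(exo_states, rewards, coef):
    """Subtract the predicted exogenous reward, leaving an estimate of R_endo."""
    X = np.column_stack([exo_states, np.ones(len(exo_states))])
    return rewards - X @ coef

# Hypothetical synthetic data: R = R_endo + R_exo with a large exogenous component.
rng = np.random.default_rng(0)
n = 5000
exo = rng.normal(size=(n, 3))                # exogenous state features
endo = rng.normal(size=n)                    # signal the agent actually controls
r_exo = exo @ np.array([2.0, -1.5, 3.0])     # uncontrollable reward component
r_endo = 0.5 * endo
rewards = r_endo + r_exo

coef = fit_exo_reward(exo, rewards)
r_hat_endo = endogenous_reward(exo, rewards, coef)
print(np.var(rewards), np.var(r_hat_endo))   # variance before vs. after removal
```

In this toy setting the reward variance drops from roughly 15.5 to roughly 0.25, which is the kind of reduction that makes the endogenous MDP easier to learn.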

Abstract

Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. We study settings where the decomposition of the state space into exogenous and endogenous state spaces is not given but must be discovered. The paper introduces and proves correctness of algorithms for discovering the exogenous and endogenous subspaces of the state space when they are mixed through linear combination. These algorithms can be applied during reinforcement learning to discover the exogenous subspace, remove the exogenous reward, and focus reinforcement learning on the endogenous MDP. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning.
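To make "mixed through linear combination" concrete, here is a model-based sketch that is not GRDS or SRAS (those learn the subspace from data). It assumes known linear dynamics s' = M s + B a + noise, with noise independent of state and action. A projection X = W^T s is then exogenous when W^T B = 0 and W^T M = C W^T for some matrix C, so that X' = C X + W^T noise. The largest such subspace is the orthogonal complement of the controllable subspace span{B, MB, ..., M^(d-1)B}, a standard control-theory fact. The function `exogenous_subspace` and the toy system are hypothetical.

```python
import numpy as np

def exogenous_subspace(M, B, tol=1e-9):
    """Orthonormal basis W of the orthogonal complement of span{B, MB, ..., M^(d-1) B}.

    Under the assumed linear dynamics, span(W) is the largest subspace with
    W.T @ B = 0 that is invariant under M.T, i.e. the maximal exogenous subspace.
    """
    d = M.shape[0]
    blocks = [B]
    for _ in range(d - 1):
        blocks.append(M @ blocks[-1])
    K = np.hstack(blocks)                          # controllability matrix, shape (d, d*m)
    U, s, _ = np.linalg.svd(K)                     # U is d x d (full_matrices=True)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return U[:, rank:]                             # left null space of K

# Hypothetical system: z = [x (2 exo dims), e (2 endo dims)], observed through a rotation Q.
rng = np.random.default_rng(1)
A = np.zeros((4, 4))
A[:2, :2] = rng.normal(size=(2, 2))                # exo block: x' = A_xx x
A[2:, :2] = rng.normal(size=(2, 2))                # exo may influence endo
A[2:, 2:] = rng.normal(size=(2, 2))                # endo block
b = np.zeros((4, 1))
b[2:, 0] = rng.normal(size=2)                      # the action enters only the endo block
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))       # random orthogonal mixing, s = Q z
M, B = Q @ A @ Q.T, Q @ b                          # dynamics in observed coordinates

W = exogenous_subspace(M, B)
C = W.T @ M @ W                                    # exogenous dynamics: X' = C X + noise
print(W.shape)
print(np.allclose(W.T @ M, C @ W.T))               # X' depends only on X
```

The recovered basis spans the same subspace as the first two columns of Q, i.e. the hidden exogenous coordinates, even though every observed state variable mixes exogenous and endogenous components.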

Paper Structure (39 sections, 16 theorems, 49 equations, 17 figures, 7 tables, 4 algorithms)

Introduction
Prior Work
Definitions and Structural Properties of Exogenous-State MDPs
Causally-Exogenous State Variables
Probabilistically-Exogenous State Variables
Additive Reward Decomposition
Decomposing a 2-Exogenous State MDP: Optimization Formulations
Variable Selection Formulation
Unmixing Formulation
Algorithms for Decomposing a 2-Exogenous State MDP into Exogenous and Endogenous Components
GRDS: Global Rank Descending Scheme
Analysis of the Global Rank-Descending Scheme
Stepwise Algorithm SRAS
Experimental Study
Experimental Details
...and 24 more sections

Key Result

Theorem 1

A state variable $S$ is causally exogenous if and only if it is action-disconnected.
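Theorem 1 turns a causal property into a graph property that can be checked mechanically. The sketch below assumes a simplified reading of Definition 2 (not reproduced here): a state variable is action-disconnected when no directed path leads from the action node to it in the unrolled DBN. Because every time slice repeats the same structure, that question reduces to reachability on the collapsed parent graph of a 2-time-step DBN. The variable names and the example graph are hypothetical.

```python
from collections import deque

def action_reachable(parents, action="A"):
    """State variables reachable by a directed path from the action node.

    parents[v] lists the parents of v at time t+1 in a 2-time-step DBN;
    a parent may be the action or any state variable.
    """
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    reached, queue = set(), deque([action])
    while queue:                                   # breadth-first search from the action
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached

def action_disconnected(parents, action="A"):
    """State variables with no directed path from the action."""
    return set(parents) - action_reachable(parents, action)

# Hypothetical DBN: X evolves on its own, E1 is driven by X and the action, E2 by E1.
parents = {"X": ["X"], "E1": ["E1", "X", "A"], "E2": ["E2", "E1"]}
print(action_disconnected(parents))                # {'X'}
```

By Theorem 1, X in this example is causally exogenous, while E1 and E2 are not, even though E2 never receives the action directly.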

Figures (17)

Figure 1: Restricted 2-Time Step Dynamic Bayesian Network sufficient to establish that $X$ is exogenous.
Figure 2: Unrolled state transition diagram for the full exo DBN.
Figure 3: State transition diagram of MDP with 3 state variables.
Figure 4: Comparison of various methods in high-dimensional linear MDPs.
Figure 5: RL performance for MDPs with nonlinear exo reward functions.
...and 12 more figures

Theorems & Definitions (39)

Definition 1: Causally-Exogenous Variables
Definition 2: Action-Disconnected State Variable
Theorem 1
Corollary 1
Theorem 2: Causally Exogenous DBN
proof
Definition 3: 2-Exogenous State MDP
Definition 4: Valid Exo/Endo Decomposition
Theorem 3: Union of Exo/Endo Decompositions
proof
...and 29 more