Reinforcement Learning with Exogenous States and Rewards

George Trimponias, Thomas G. Dietterich

TL;DR

This work addresses the slow learning caused by exogenous variability in reinforcement learning by introducing a formal exogenous–endogenous decomposition of MDPs with additive reward structure. It shows that the endogenous MDP, coupled with a regression of the exogenous reward, yields a decomposed Bellman structure in which optimizing the endogenous MDP suffices to obtain an optimal policy for the original MDP. The authors formalize the exo/endo decomposition, prove that the maximal exogenous subspace exists and is unique, and develop two practical algorithms, GRDS and SRAS, that discover the exogenous subspace when the exogenous and endogenous states are mixed linearly; they also propose the CCC surrogate for conditional mutual information. Empirical results across high-dimensional linear, nonlinear, discrete, and combinatorial-action MDPs demonstrate substantial speedups and, in many cases, end-to-end performance matching or exceeding an endogenous-reward oracle, with Simplified-GRDS often the most robust option. The work advances RL in environments with uncontrollable exogenous factors by providing a principled, online method to isolate and remove exogenous reward noise, accelerating learning in complex control tasks.
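
The decomposed Bellman structure mentioned above can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper's: the state is written $s = (x, e)$ with exogenous part $x$ and endogenous part $e$, the exogenous part evolves as $P(x' \mid x)$ regardless of the action, and the reward splits additively.

$$
R(x, e, a) = R_{exo}(x) + R_{end}(x, e, a)
$$

$$
V_{exo}(x) = R_{exo}(x) + \gamma \, \mathbb{E}_{x' \sim P(\cdot \mid x)}\big[ V_{exo}(x') \big]
$$

$$
V^{\pi}_{end}(x, e) = \mathbb{E}_{a \sim \pi(\cdot \mid x, e)}\Big[ R_{end}(x, e, a) + \gamma \, \mathbb{E}_{(x', e')}\big[ V^{\pi}_{end}(x', e') \big] \Big]
$$

$$
V^{\pi}(x, e) = V_{exo}(x) + V^{\pi}_{end}(x, e)
$$

Under these assumptions $V_{exo}$ does not depend on the policy $\pi$, so maximizing $V^{\pi}$ over $\pi$ is the same as maximizing $V^{\pi}_{end}$, which is why solving the endogenous MDP is enough.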

Abstract

Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. We study settings where the decomposition of the state space into exogenous and endogenous state spaces is not given but must be discovered. The paper introduces and proves correctness of algorithms for discovering the exogenous and endogenous subspaces of the state space when they are mixed through linear combination. These algorithms can be applied during reinforcement learning to discover the exogenous subspace, remove the exogenous reward, and focus reinforcement learning on the endogenous MDP. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning.
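
To illustrate why removing the exogenous reward reduces variance, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it assumes the exogenous state is already known, that the exogenous reward is linear in it, and that the endogenous reward is independent of it. The names `remove_exogenous_reward` and `make_toy_data` are hypothetical.

```python
import numpy as np

def remove_exogenous_reward(x_exo, rewards):
    """Regress rewards on the exogenous state (with intercept); return the residual."""
    X = np.column_stack([x_exo, np.ones(len(x_exo))])
    coef, *_ = np.linalg.lstsq(X, rewards, rcond=None)
    return rewards - X @ coef

def make_toy_data(n=5000, d_exo=5, seed=0):
    """Toy data (assumption): AR(1) exogenous state plus a small endogenous reward."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, d_exo))
    for t in range(1, n):
        # The exogenous state evolves on its own; no action influences it.
        x[t] = 0.9 * x[t - 1] + rng.normal(size=d_exo)
    r_end = rng.normal(scale=0.1, size=n)  # stand-in for the endogenous reward
    w = rng.normal(size=d_exo)             # hypothetical exogenous reward weights
    r = r_end + x @ w                      # additive reward decomposition
    return x, r, r_end

x, r, r_end = make_toy_data()
r_resid = remove_exogenous_reward(x, r)
print("variance of raw reward:     ", np.var(r))
print("variance of residual reward:", np.var(r_resid))
print("variance of endogenous part:", np.var(r_end))
```

In this toy setting the residual reward has a variance close to that of the endogenous part alone, which is the signal a learner would then optimize. If the endogenous reward were correlated with the exogenous state, a plain regression like this one would also absorb part of it.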

Paper Structure

This paper contains 39 sections, 16 theorems, 49 equations, 17 figures, 7 tables, 4 algorithms.

Key Result

Theorem 1

A state variable $S$ is causally exogenous if and only if it is action-disconnected.
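
The theorem suggests a graph test for exogeneity. The sketch below assumes that "action-disconnected" means there is no directed path from the action node to the variable in the causal graph; the paper's Definition 2 is not reproduced in this summary, so treat that reading as an assumption. The graph is a hypothetical time-collapsed view of the DBN, with an edge from each parent variable to each variable it influences at the same or the next time step.

```python
from collections import deque

def action_disconnected(edges, action="A"):
    """Return the set of variables with no directed path from `action`."""
    reachable = set()
    queue = deque([action])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, ()):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)
    # Every node mentioned in the graph, as a parent or as a child.
    variables = set(edges) | {c for children in edges.values() for c in children}
    return variables - reachable - {action}

# Hypothetical example: the action drives E1, E1 drives E2,
# and X1 and X2 evolve without any influence from the action.
edges = {
    "A": ["E1"],
    "E1": ["E1", "E2"],
    "X1": ["X1", "E2"],
    "X2": ["X2", "X1"],
}
print(sorted(action_disconnected(edges)))  # ['X1', 'X2']
```

Under this reading, `X1` and `X2` would be exogenous candidates, while `E1` and `E2` are reachable from the action and therefore endogenous.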

Figures (17)

  • Figure 1: Restricted 2-Time Step Dynamic Bayesian Network sufficient to establish that $X$ is exogenous.
  • Figure 2: Unrolled state transition diagram for the full exo DBN.
  • Figure 3: State transition diagram of MDP with 3 state variables.
  • Figure 4: Comparison of various methods in high-dimensional linear MDPs.
  • Figure 5: RL performance for MDPs with nonlinear exo reward functions.
  • ...and 12 more figures

Theorems & Definitions (39)

  • Definition 1: Causally-Exogenous Variables
  • Definition 2: Action-Disconnected State Variable
  • Theorem 1
  • Corollary 1
  • Theorem 2: Causally Exogenous DBN
  • proof
  • Definition 3: 2-Exogenous State MDP
  • Definition 4: Valid Exo/Endo Decomposition
  • Theorem 3: Union of Exo/Endo Decompositions
  • proof
  • ...and 29 more