On Value Functions and the Agent-Environment Boundary
Nan Jiang
TL;DR
The paper investigates how the choice of agent-environment boundary affects whether value functions are well defined and what guarantees RL algorithms with function approximation can offer. It proposes a boundary-invariant analysis framework, demonstrated through a boundary-invariant rendition of Fitted Q-Iteration and supported by a boundary-invariant treatment of contextual bandits. The work shows that, under boundary-invariant assumptions, near-optimality guarantees hold regardless of where the boundary is drawn, and it discusses implications for state resetting, MCTS, imitation learning, and the verifiability of assumptions from data. It encourages rethinking states and value functions in RL and highlights practical considerations when boundaries are ambiguous or unavailable.
Abstract
When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined, as they depend on where we draw this agent-environment boundary. This causes problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, where the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues on state resetting and Monte-Carlo Tree Search, deterministic vs. stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
