Reinforcing the World's Edge: A Continual Learning Problem in the Multi-Agent-World Boundary

Dane Malenfant

TL;DR

The view that a continual RL problem arises from instability of the agent--world boundary (rather than exogenous task switches) in decentralized MARL suggests future work on preserving, predicting, or otherwise managing boundary drift.

Abstract

Reusable decision structure survives across episodes in reinforcement learning, but whether it does depends on how the agent--world boundary is drawn. In stationary, finite-horizon MDPs, one can construct an invariant core: the (not necessarily contiguous) subsequences of state--action pairs shared by all successful trajectories, optionally under a simple abstraction. Under mild goal-conditioned assumptions, its existence can be proven and explained by the way the core captures prototypes that transfer across episodes. When the same task is embedded in a decentralized Markov game and the peer agent is folded into the world, each peer-policy update induces a new MDP; the per-episode invariant core can then shrink or vanish, even under small changes to the induced world dynamics, sometimes leaving only the individual task core or nothing at all. This policy-induced non-stationarity can be quantified with a variation budget over the induced kernels and rewards, linking boundary drift to loss of invariants. The view that a continual RL problem arises from instability of the agent--world boundary (rather than exogenous task switches) in decentralized MARL suggests future work on preserving, predicting, or otherwise managing boundary drift.
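
As a rough illustration of the invariant core described in the abstract, the sketch below intersects the state--action pairs of a handful of successful trajectories. It is a deliberate simplification and not the paper's construction: it treats the core as the set of pairs present in every success (reported in the order of the first trajectory), and it ignores ordering constraints and abstraction. The function name `invariant_core`, the toy key-and-door task, and all state and action labels are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Step = Tuple[Hashable, Hashable]  # (state, action)
Trajectory = Sequence[Step]


def invariant_core(successes: Sequence[Trajectory]) -> List[Step]:
    """State-action pairs present in every successful trajectory.

    Simplifying assumption: set intersection, reported in the order of
    the first trajectory. Cross-trajectory ordering is not checked.
    """
    if not successes:
        return []
    shared: Set[Step] = set(successes[0])
    for traj in successes[1:]:
        shared &= set(traj)
    core: List[Step] = []
    seen: Set[Step] = set()
    for step in successes[0]:
        if step in shared and step not in seen:
            seen.add(step)
            core.append(step)
    return core


# Hypothetical single-agent successes: both must pick up the key and open the door.
t1 = [("start", "right"), ("key", "pickup"), ("hall", "up"), ("door", "open")]
t2 = [("start", "up"), ("attic", "right"), ("key", "pickup"), ("door", "open")]
solo_core = invariant_core([t1, t2])

# Hypothetical episode after a peer-policy update: the peer has already
# opened the door, so this success needs neither the key nor "open".
t3 = [("start", "right"), ("hall", "up"), ("door", "walk_through")]
drifted_core = invariant_core([t1, t2, t3])

print(solo_core)
print(drifted_core)
```

In this toy setting the core over the two solo successes holds the key pickup and the door opening, and adding the single peer-affected success empties it, which mirrors the shrink-or-vanish behaviour the abstract describes.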

Paper Structure (9 sections, 2 theorems, 7 equations)

Introduction
Contributions.
The agent--world boundary drifts as policies update over time
The boundary is stable in single-agent tasks
Trajectory trie representation
Invariant core
The agent--world boundary shifts with another agent
A variation budget from shifting MDPs can measure this change
Conclusion

Key Result

Theorem 2.1

If $G=\{g\}$ is a unique absorbing goal and episodes terminate on first visit to $g$, then $\mathrm{Core}(\mathcal{S})\neq\emptyset$. More generally, if there exists an abstraction $\phi$ such that every $\tau\in\mathcal{S}$ contains a common abstract symbol (e.g., an option such as open_door), then

Theorems & Definitions (3)

Theorem 2.1: Existence
Proof: Sketch
Proposition 2.1: Episode-to-episode core drift
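
The variation budget mentioned in the abstract can be sketched numerically. The summary does not give the paper's exact definition, so the code below assumes a form that is common in the non-stationary RL literature: for each pair of consecutive episodes, take the largest L1 distance between induced transition kernels over state--action pairs, take the largest absolute reward difference, and sum each over episodes. The names `kernel_gap`, `reward_gap`, and `variation_budget`, and the toy numbers, are hypothetical.

```python
from typing import Dict, List, Tuple

StateAction = Tuple[str, str]
Kernel = Dict[StateAction, Dict[str, float]]  # (s, a) -> {next_state: prob}
Reward = Dict[StateAction, float]             # (s, a) -> expected reward


def kernel_gap(p: Kernel, q: Kernel) -> float:
    """Max over (s, a) of the L1 distance between next-state distributions."""
    gap = 0.0
    for sa in set(p) | set(q):
        ps, qs = p.get(sa, {}), q.get(sa, {})
        l1 = sum(abs(ps.get(s2, 0.0) - qs.get(s2, 0.0)) for s2 in set(ps) | set(qs))
        gap = max(gap, l1)
    return gap


def reward_gap(r: Reward, q: Reward) -> float:
    """Max over (s, a) of the absolute reward difference."""
    return max((abs(r.get(sa, 0.0) - q.get(sa, 0.0)) for sa in set(r) | set(q)),
               default=0.0)


def variation_budget(kernels: List[Kernel], rewards: List[Reward]) -> Tuple[float, float]:
    """Sum of consecutive-episode gaps (assumed form, not the paper's definition)."""
    b_p = sum(kernel_gap(a, b) for a, b in zip(kernels, kernels[1:]))
    b_r = sum(reward_gap(a, b) for a, b in zip(rewards, rewards[1:]))
    return b_p, b_r


# Hypothetical induced MDPs over three episodes: as the peer's policy
# updates, opening the door succeeds less reliably from the agent's view.
sa = ("door", "open")
kernels = [
    {sa: {"goal": 1.0}},
    {sa: {"goal": 0.9, "door": 0.1}},
    {sa: {"goal": 0.5, "door": 0.5}},
]
rewards = [{sa: 1.0}, {sa: 1.0}, {sa: 1.0}]

b_p, b_r = variation_budget(kernels, rewards)
print(b_p, b_r)
```

Under these assumptions the kernel budget accumulates the per-episode drift (0.2 and then 0.8 in L1 distance here), while the reward budget stays at zero because only the induced dynamics change.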
