Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu
TL;DR
Deep RL with function approximation can exhibit ray interference: negative interference between the components of an objective, combined with the coupling between learning and future data generation, creates long learning plateaus. The authors analyze a minimal 2x2 bandit to derive exact continuous-time dynamics, identify saddle points, plateaus, and basins of attraction, and then generalize to factored objectives and RL contexts. They show that plateaus arise near saddle points when interference is negative and learning is coupled to performance, that removing either factor (interference or coupling) eliminates the plateaus, and that plateaus become more severe as more components are added. The work offers a potential explanation for slow convergence in deep RL and suggests remedies such as decoupling representations, using off-policy data, or modular architectures, with implications for multi-task and continual learning.
Abstract
Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problematic type of 'ray interference', characterized by learning dynamics that sequentially traverse a number of performance plateaus, effectively constraining the agent to learn one thing at a time even when learning in parallel is better. We establish the conditions under which ray interference occurs, show its relation to saddle points and obtain the exact learning dynamics in a restricted setting. We characterize a number of its properties and discuss possible remedies.
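The mechanism described above can be illustrated with a small numerical sketch. This is a toy of our own construction, not the authors' 2x2 bandit derivation. It assumes an objective J = J_1 + J_2 with J_k = sigmoid(u_k . theta), where u_1 and u_2 are unit vectors whose inner product rho plays the role of interference, and it uses the sigmoid's gradient factor J_k (1 - J_k) as a stand-in for the coupling between performance and learning signal. The names `simulate`, `rho`, `lr`, and `target` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(rho, j1_init=0.2, j2_init=0.05, lr=0.5, target=0.9,
             max_steps=200_000):
    """Euler steps of gradient ascent on J = J_1 + J_2 (assumed toy model).

    z_k = u_k . theta, with unit vectors u_1, u_2 and u_1 . u_2 = rho.
    A gradient step on theta therefore moves z_1 by lr * (g_1 + rho * g_2)
    and z_2 by lr * (rho * g_1 + g_2), where g_k = J_k * (1 - J_k).
    Returns the list of (J_1, J_2) pairs, stopping once both reach target.
    """
    z1, z2 = logit(j1_init), logit(j2_init)
    traj = [(sigmoid(z1), sigmoid(z2))]
    for _ in range(max_steps):
        j1, j2 = traj[-1]
        if j1 >= target and j2 >= target:
            break
        # Learning signal of each component shrinks when its performance
        # is low: this is the assumed stand-in for the coupling.
        g1, g2 = j1 * (1.0 - j1), j2 * (1.0 - j2)
        # rho < 0 means progress on one component pushes the other back.
        z1, z2 = z1 + lr * (g1 + rho * g2), z2 + lr * (rho * g1 + g2)
        traj.append((sigmoid(z1), sigmoid(z2)))
    return traj


if __name__ == "__main__":
    for rho in (0.0, -0.5):
        traj = simulate(rho)
        print(f"rho={rho:+.1f}: stopped after {len(traj) - 1} steps, "
              f"final (J_1, J_2) = ({traj[-1][0]:.3f}, {traj[-1][1]:.3f})")
```

In this toy, setting `rho` to zero lets the two components improve independently, whereas a negative `rho` makes the initially weaker component lose ground while the stronger one is learned, so the run needs more steps before both reach the target. The sketch only illustrates the qualitative effect under these assumptions and does not reproduce the exact dynamics obtained in the paper.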
