Momentum Boosted Episodic Memory for Improving Learning in Long-Tailed RL Environments
Dolton Fernandes, Pramod Kaushik, Harsh Shukla, Bapi Raju Surampudi
TL;DR
The paper addresses reinforcement learning under Zipfian (long-tailed) data distributions by combining fast and slow learning systems. It introduces a modular Momentum Boosted Episodic Memory (MEM) architecture that uses a familiarity buffer and momentum-based contrastive learning to identify and prioritize rare trajectories, then reinstates their hidden activations through an episodic memory module to improve policy decisions. The approach augments IMPALA with a contrastive learning branch and a memory retrieval mechanism; it outperforms strong baselines and several ablations on the Zipfian tasks and improves results on most of the Atari environments tested. Because the module can be attached to other RL architectures, it offers a practical route to better sample efficiency and long-term credit assignment in non-uniform environments, with potential applicability to more realistic 3D settings and real-world decision-making problems.
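As a rough illustration of what "momentum-based" usually means in contrastive learning, the sketch below shows an exponential-moving-average update of a slowly changing key encoder toward a faster query encoder. It assumes a MoCo-style update rule, which this summary does not spell out; the function name `momentum_update`, the flat parameter lists, and the momentum value are all hypothetical choices made for illustration.

```python
def momentum_update(key_params, query_params, momentum=0.999):
    """Return key-encoder parameters moved a small step toward the query encoder.

    Assumed rule (MoCo-style, not confirmed by the source):
        key <- momentum * key + (1 - momentum) * query
    """
    if len(key_params) != len(query_params):
        raise ValueError("encoders must have the same number of parameters")
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [
        momentum * k + (1.0 - momentum) * q
        for k, q in zip(key_params, query_params)
    ]


# Toy usage: the key encoder drifts slowly toward the query encoder.
key = [0.0, 0.0]
query = [1.0, -1.0]
for _ in range(3):
    key = momentum_update(key, query, momentum=0.9)
print(key)
```

A momentum close to 1 keeps the key encoder nearly fixed between steps, which is the usual reason for using such an update: the representations being compared change slowly and stay consistent over time.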
Abstract
Traditional Reinforcement Learning (RL) algorithms assume the distribution of the data to be uniform or mostly uniform. However, this is not the case in most real-world settings, such as autonomous driving or animals roaming in nature. Some experiences are encountered frequently, and most of the remaining experiences occur rarely; the resulting distribution is called Zipfian. Taking inspiration from the theory of complementary learning systems, an architecture for learning from Zipfian distributions is proposed in which important long-tail trajectories are discovered in an unsupervised manner. The proposal comprises an episodic memory buffer containing a prioritised memory module that keeps important rare trajectories longer. This addresses the Zipfian problem, which requires credit assignment to happen in a sample-efficient manner. The experiences are then reinstated from episodic memory and weighted by importance to form the trajectory to be executed. Notably, the proposed architecture is modular, can be incorporated into any RL architecture, and yields improved performance in multiple Zipfian tasks over traditional architectures. Our method outperforms IMPALA by a significant margin on all three tasks and all three evaluation metrics (Zipfian, Uniform, and Rare Accuracy) and also gives improvements on most Atari environments that are considered challenging.
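To make the setting concrete, the following minimal sketch draws experiences from a Zipfian distribution and writes them into a bounded memory that evicts the most familiar entry first, so that rare entries stay longer. It is only a toy under stated assumptions: familiarity here is a plain visit count rather than anything learned, and the names `zipf_probabilities`, `RarityPrioritisedMemory`, and `run_stream`, as well as the exponent and capacity values, are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def zipf_probabilities(num_items, exponent=1.0):
    """Return p(k) proportional to 1 / k**exponent for ranks k = 1..num_items."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


class RarityPrioritisedMemory:
    """Bounded store that evicts the most frequently seen key first.

    Count-based familiarity is a stand-in assumption for illustration only.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.seen = Counter()  # how often each key has been observed
        self.slots = {}        # key -> stored payload

    def write(self, key, payload):
        self.seen[key] += 1
        self.slots[key] = payload
        if len(self.slots) > self.capacity:
            # Drop the most familiar entry so that rare entries are kept longer.
            most_familiar = max(self.slots, key=lambda k: self.seen[k])
            del self.slots[most_familiar]


def run_stream(num_items=20, num_steps=2000, capacity=5, seed=0):
    """Feed a Zipfian stream of item indices into the memory and return it."""
    rng = random.Random(seed)
    probs = zipf_probabilities(num_items)
    memory = RarityPrioritisedMemory(capacity)
    for _ in range(num_steps):
        item = rng.choices(range(num_items), weights=probs, k=1)[0]
        memory.write(item, payload=f"trajectory-{item}")
    return memory


memory = run_stream()
print(sorted(memory.slots))  # indices retained at the end of the stream
```

With a uniform first-in-first-out buffer, the frequent head of the distribution would dominate what is stored; evicting by familiarity is one simple way to bias retention toward the tail.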
