Episodic Memory in Lifelong Language Learning
Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama
TL;DR
The paper tackles lifelong language learning without dataset identifiers. It proposes an encoder–decoder model augmented with an episodic memory, which uses sparse experience replay and local adaptation to mitigate catastrophic forgetting. A frozen key network builds a key-value memory of past <x_t, y_t> pairs. The memory supports both replay of sampled examples and retrieval-based adaptation at inference time, and a simple random write strategy limits its size. Experiments on text classification and question answering show that combining sparse replay with memory-guided local adaptation (MbPA++) approaches a multitask upper bound and exhibits positive transfer across datasets. Randomly choosing which examples to store reduces the memory's space requirements by roughly 50-90% with a minimal decrease in performance. The authors view episodic memory as a crucial building block of a general linguistic intelligence that learns across diverse data distributions in a single pass.
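To make the memory mechanics in the summary above concrete, here is a minimal sketch of a key-value episodic memory with a random write strategy, uniform sampling for replay, and nearest-neighbour retrieval for local adaptation. It is an illustration under stated assumptions rather than the authors' implementation: the class name `EpisodicMemory`, its method names, the `write_prob` parameter, and the use of plain Python lists for keys are all hypothetical, and keys are assumed to be fixed-length vectors produced by some frozen key network that is not shown.

```python
import random


class EpisodicMemory:
    """Toy key-value store of past (x, y) examples (hypothetical sketch)."""

    def __init__(self, write_prob=1.0, seed=0):
        # write_prob is an assumed knob: the probability of storing an example.
        self.write_prob = write_prob
        self.rng = random.Random(seed)
        self.keys = []    # key vectors, assumed to come from a frozen key network
        self.values = []  # the stored (x, y) pairs

    def write(self, key, x, y):
        """Store the example with probability write_prob; return True if stored."""
        if self.rng.random() < self.write_prob:
            self.keys.append(list(key))
            self.values.append((x, y))
            return True
        return False

    def sample(self, n):
        """Uniformly sample up to n stored examples, e.g. for experience replay."""
        n = min(n, len(self.values))
        return self.rng.sample(self.values, n)

    def nearest(self, query, k):
        """Return the k stored examples whose keys are closest to the query
        (squared Euclidean distance), e.g. for local adaptation."""
        def sq_dist(key):
            return sum((a - b) ** 2 for a, b in zip(key, query))

        order = sorted(range(len(self.keys)), key=lambda i: sq_dist(self.keys[i]))
        return [self.values[i] for i in order[:k]]


if __name__ == "__main__":
    memory = EpisodicMemory(write_prob=0.5, seed=0)
    # Hand-made 2-d keys stand in for key-network outputs.
    for i in range(20):
        memory.write([float(i), float(i % 3)], f"example {i}", i % 2)
    print(len(memory.values), "of 20 examples stored")
    print(memory.nearest([4.0, 1.0], k=2))
```

Lowering `write_prob` trades memory size for coverage of past examples, which is the trade-off behind the space reduction reported in the abstract. A realistic system would replace the linear scan in `nearest` with an approximate nearest-neighbour index.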
Abstract
We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay and local adaptation to allow the model to continuously learn from new datasets. We also show that the space complexity of the episodic memory module can be reduced significantly (~50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. We consider an episodic memory component as a crucial building block of general linguistic intelligence and see our model as a first step in that direction.
