Drift to Remember
Jin Du, Xinhe Zhang, Hao Shen, Xun Xian, Ganghua Wang, Jiawei Zhang, Yuhong Yang, Na Li, Jia Liu, Jie Ding
TL;DR
The paper tackles catastrophic forgetting in lifelong learning by exploiting representational drift, a phenomenon observed in biological neural dynamics. It introduces DriftNet, a drift-driven framework that continuously explores local minima by injecting external noise, encodes them into task-specific groups in a knowledge base, and retrieves relevant knowledge using uncertainty-based selection. On simulated tasks, CIFAR-10/100, and NLP tasks with GPT-2, DriftNet outperforms the Stable baseline and approaches joint-training and theoretical-limit performance, while scaling to LLMs with billions of parameters on a single Nvidia A100 GPU and using only new data. The approach offers a general, scalable mechanism for continual learning, with potential insights into biological learning and applicability to multi-domain, real-time AI systems.
Abstract
Lifelong learning in artificial intelligence (AI) aims to mimic the biological brain's ability to continuously learn and retain knowledge, yet it faces challenges such as catastrophic forgetting. Recent neuroscience research suggests that neural activity in biological systems undergoes representational drift, where neural responses evolve over time, even with consistent inputs and tasks. We hypothesize that representational drift can alleviate catastrophic forgetting in AI during new task acquisition. To test this, we introduce DriftNet, a network designed to constantly explore various local minima in the loss landscape while dynamically retrieving relevant tasks. This approach ensures efficient integration of new information and preserves existing knowledge. Experimental studies in image classification and natural language processing demonstrate that DriftNet outperforms existing models in lifelong learning. Importantly, DriftNet is scalable in handling a sequence of tasks such as sentiment analysis and question answering using large language models (LLMs) with billions of parameters on a single Nvidia A100 GPU. DriftNet efficiently updates LLMs using only new data, avoiding the need for full dataset retraining. Tested on GPT-2 and RoBERTa, DriftNet is a robust, cost-effective solution for lifelong learning in LLMs. This study not only advances AI systems to emulate biological learning, but also provides insights into the adaptive mechanisms of biological neural systems, deepening our understanding of lifelong learning in nature.
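The drift-and-retrieve idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes linear softmax classifiers as stand-ins for networks, Gaussian noise as the external noise source, and the entropy of a group's averaged prediction as the uncertainty measure. All function names (`noisy_sgd_step`, `group_prediction`, `retrieve_and_predict`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def noisy_sgd_step(w, grad, lr=0.1, noise_std=0.01, rng=None):
    # Assumption: drift is modeled as a gradient step plus Gaussian noise,
    # so the weights keep moving even when the gradient is zero.
    rng = np.random.default_rng() if rng is None else rng
    return w - lr * grad + noise_std * rng.standard_normal(w.shape)

def group_prediction(x, group):
    # Average class probabilities over the weight snapshots of one task group.
    probs = np.stack([softmax(x @ w) for w in group])
    return probs.mean(axis=0)

def entropy(p):
    # Shannon entropy of a probability vector (assumed uncertainty measure).
    return float(-(p * np.log(p + 1e-12)).sum())

def retrieve_and_predict(x, knowledge_base):
    # Select the task group with the least uncertain averaged prediction,
    # then return that group's key and its predicted class index.
    scored = {task: group_prediction(x, g) for task, g in knowledge_base.items()}
    best = min(scored, key=lambda t: entropy(scored[t]))
    return best, int(np.argmax(scored[best]))
```

In this sketch the knowledge base is a plain dictionary mapping a task key to a list of drifted weight snapshots. A group that has never seen inputs like `x` tends to produce a near-uniform averaged prediction, so its entropy is high and it is not selected.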
