Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping
Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille
TL;DR
The paper addresses catastrophic forgetting and catastrophic remembering in strict continual learning by introducing Relevance Mapping Networks (RMNs), which learn per-task binary relevance masks that produce an optimal overlap of network parameters without data replay. Grounded in the Optimal Overlap Hypothesis, RMNs gate weights to preserve prior knowledge while adapting to new tasks, achieving state-of-the-art results on standard benchmarks. They can also detect new tasks and infer task identity without task labels, which demonstrates resilience against catastrophic remembering. The result is a practical, replay-free approach to continual learning that maintains discriminability across tasks and enables task discovery in dynamic environments.
Abstract
Catastrophic forgetting in neural networks is a significant problem for continual learning. A majority of the current methods replay previous data during training, which violates the constraints of an ideal continual learning system. Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e., the worsening ability to discriminate between data from different tasks. In our work, we introduce Relevance Mapping Networks (RMNs), which are inspired by the Optimal Overlap Hypothesis. The mappings reflect the relevance of the weights for the task at hand by assigning large weights to essential parameters. We show that RMNs learn an optimized representational overlap that overcomes the twin problem of catastrophic forgetting and remembering. Our approach achieves state-of-the-art performance across all common continual learning datasets, even significantly outperforming data replay methods while not violating the constraints for an ideal continual learning system. Moreover, RMNs retain the ability to detect data from new tasks in an unsupervised manner, thus proving their resilience against catastrophic remembering.
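To make the idea of a per-task relevance mapping concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single shared linear layer, real-valued relevance scores per task, and a simple threshold to binarize them. The names `relevance`, `binary_mask`, `forward`, the threshold of 0, and the layer sizes are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights of one linear layer (4 inputs -> 3 outputs); sizes are illustrative.
W = rng.normal(size=(3, 4))

# Hypothetical per-task relevance scores, one score per shared weight.
# In a real system these would be learned; here they are random placeholders.
relevance = {
    "task_a": rng.normal(size=W.shape),
    "task_b": rng.normal(size=W.shape),
}

def binary_mask(scores, threshold=0.0):
    """Binarize relevance scores: 1 keeps a weight, 0 gates it off (assumed rule)."""
    return (scores > threshold).astype(W.dtype)

def forward(x, task):
    """Apply the shared layer using only the weights marked relevant for `task`."""
    mask = binary_mask(relevance[task])
    return (W * mask) @ x

x = rng.normal(size=4)
y_a = forward(x, "task_a")  # output under task A's mask
y_b = forward(x, "task_b")  # output under task B's mask

# Fraction of weights that both tasks mark as relevant (the representational overlap).
mask_a = binary_mask(relevance["task_a"])
mask_b = binary_mask(relevance["task_b"])
overlap = float((mask_a * mask_b).sum() / W.size)
```

In this sketch the same weight matrix serves both tasks, and only the masks differ. The `overlap` value measures how many weights the two tasks share, which is the quantity the Optimal Overlap Hypothesis is concerned with.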
