Achieving Upper Bound Accuracy of Joint Training in Continual Learning
Saleh Momeni, Bing Liu
TL;DR
The paper addresses the gap between the accuracy of continual learning, specifically class-incremental learning (CIL), and the upper-bound accuracy of joint training, focusing on two challenges: catastrophic forgetting (CF) and inter-task class separation (ICS). It highlights prototype-based approaches built on frozen foundation-model features, notably kernelized LDA (KLDA), which uses Random Fourier Features and can match or exceed joint-training accuracy without updating the representation. Empirical results on text and image classification show that KLDA variants reach or approach the joint-training upper bound, suggesting that high-quality foundation-model representations make CIL practical. The work argues that learning new representations may be unnecessary in many domains, discusses neuroscientific perspectives on stable representations, and underscores the practical value of deploying continual learning with frozen features.
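To make the idea of a classifier on frozen features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the inputs are embeddings already extracted by a frozen foundation model, that each class appears in exactly one task, and that class priors are equal. The class name `RFFLDA` and the parameters `rff_dim`, `gamma`, and `ridge` are hypothetical choices made for illustration. Embeddings are mapped through Random Fourier Features that approximate an RBF kernel, and only per-class means and one shared scatter matrix are accumulated, so learning a new task never changes what was stored for earlier classes.

```python
import numpy as np


class RFFLDA:
    """Illustrative incremental LDA on Random Fourier Features (hypothetical sketch)."""

    def __init__(self, in_dim, rff_dim=512, gamma=1.0, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # For the RBF kernel exp(-gamma * ||x - y||^2), frequencies are drawn
        # from N(0, 2 * gamma * I) and phases from U[0, 2 * pi).
        self.W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(in_dim, rff_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=rff_dim)
        self.rff_dim = rff_dim
        self.ridge = ridge
        self.means = {}                                # class label -> mean RFF vector
        self.scatter = np.zeros((rff_dim, rff_dim))    # shared within-class scatter
        self.n_seen = 0

    def _phi(self, X):
        # Fixed random feature map; it is never trained or updated.
        return np.sqrt(2.0 / self.rff_dim) * np.cos(X @ self.W + self.b)

    def learn_task(self, X, y):
        """Accumulate statistics for the classes of one task (assumed new classes)."""
        Z = self._phi(X)
        for c in np.unique(y):
            Zc = Z[y == c]
            mu = Zc.mean(axis=0)
            self.means[c] = mu
            centered = Zc - mu
            self.scatter += centered.T @ centered
            self.n_seen += len(Zc)

    def predict(self, X):
        """Score with the linear discriminant z^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c."""
        Z = self._phi(X)
        labels = list(self.means)
        M = np.stack([self.means[c] for c in labels])            # (C, D)
        cov = self.scatter / self.n_seen + self.ridge * np.eye(self.rff_dim)
        A = np.linalg.solve(cov, M.T)                            # (D, C)
        scores = Z @ A - 0.5 * np.sum(M * A.T, axis=1)
        return np.array(labels)[scores.argmax(axis=1)]
```

Because the stored statistics are sums over samples, calling `learn_task` once per task yields, up to floating-point rounding, the same means and scatter matrix as a single call on all the data together. This is one way to see why a classifier of this kind on frozen features can coincide with its jointly trained counterpart.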
Abstract
Continual learning has been an active research area in machine learning, focusing on incrementally learning a sequence of tasks. A key challenge is catastrophic forgetting (CF), and most research efforts have been directed toward mitigating this issue. However, a significant gap remains between the accuracy achieved by state-of-the-art continual learning algorithms and the ideal or upper-bound accuracy achieved by training on all tasks jointly. This gap has hindered or even prevented the adoption of continual learning in applications, as accuracy is often of paramount importance. Recently, another challenge, termed inter-task class separation (ICS), was also identified, which spurred a theoretical study into principled approaches for solving continual learning. Further research has shown that by leveraging this theory and the power of large foundation models, it is now possible to achieve upper-bound accuracy, which has been empirically validated on both text and image classification datasets. Continual learning is now ready for real-life applications. This paper surveys the main research leading to this achievement, justifies the approach both intuitively and with evidence from neuroscience research, and discusses the insights gained.
