Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning
Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz
TL;DR
This work tackles catastrophic forgetting in continual learning by introducing AGILE, a rehearsal-based approach that uses a shared task-attention module augmented with lightweight per-task projection vectors to reduce inter-task interference. By adding a task projection vector for each new task and combining a pairwise discrepancy loss with EMA-based consistency regularization, AGILE achieves strong within-task prediction (WP) and task-ID prediction (TP), scales to many tasks with minimal overhead, and exhibits improved calibration and reduced recency bias. Experiments on Seq-CIFAR10, Seq-CIFAR100, and Seq-TinyImageNet show AGILE outperforming rehearsal-based baselines in both Class-IL and Task-IL settings, including low-buffer regimes. The findings demonstrate the effectiveness of task-attention mechanisms in continual learning and point to future work on extending AGILE to transformer architectures and further reducing forgetting in shared components.
Abstract
Continual learning (CL) remains a significant challenge for deep neural networks, as they are prone to forgetting previously acquired knowledge. Several approaches, such as experience rehearsal, regularization, and parameter isolation, have been proposed in the literature to address this problem. Although almost zero forgetting can be achieved in task-incremental learning, class-incremental learning remains highly challenging due to the problem of inter-task class separation: limited access to previous task data makes it difficult to discriminate between classes of current and previous tasks. To address this issue, we propose 'Attention-Guided Incremental Learning' (AGILE), a novel rehearsal-based CL approach that incorporates compact task attention to effectively reduce interference between tasks. AGILE utilizes lightweight, learnable task projection vectors to transform the latent representations of a shared task attention module toward the task distribution. Through extensive empirical evaluation, we show that AGILE significantly improves generalization performance by mitigating task interference and outperforms rehearsal-based approaches in several CL scenarios. Furthermore, AGILE scales well to a large number of tasks with minimal overhead while remaining well-calibrated with reduced task-recency bias.
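
To make the role of the task projection vectors concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the shared task attention module is a small bottleneck (encoder, latent, decoder) and that each task owns a single vector that rescales the shared latent before a sigmoid gate is applied to the backbone features. The class name `SharedTaskAttention`, the layer sizes, the initialization, and the elementwise modulation are all hypothetical choices made for illustration; training, the rehearsal buffer, and the loss terms are omitted.

```python
import numpy as np


class SharedTaskAttention:
    """Illustrative shared attention bottleneck with per-task projection vectors.

    All names and design details here are assumptions for illustration only.
    """

    def __init__(self, feat_dim, latent_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim
        # Shared parameters, reused by every task.
        self.w_enc = self.rng.normal(0.0, 0.1, (feat_dim, latent_dim))
        self.w_dec = self.rng.normal(0.0, 0.1, (latent_dim, feat_dim))
        # One lightweight vector per task; the list grows as tasks arrive.
        self.task_vectors = []

    def add_task(self):
        """Register a new task by adding one projection vector; return its id."""
        self.task_vectors.append(self.rng.normal(1.0, 0.1, self.latent_dim))
        return len(self.task_vectors) - 1

    def forward(self, features, task_id):
        """Gate backbone features with a task-specific attention mask."""
        latent = np.maximum(features @ self.w_enc, 0.0)   # shared latent (ReLU)
        latent = latent * self.task_vectors[task_id]      # task-specific modulation
        attention = 1.0 / (1.0 + np.exp(-(latent @ self.w_dec)))  # sigmoid gate
        return features * attention

    def extra_params_per_task(self):
        """Number of parameters added for each new task."""
        return self.latent_dim


# Usage: two tasks share one attention module but use different vectors.
module = SharedTaskAttention(feat_dim=8, latent_dim=4)
first_task = module.add_task()
second_task = module.add_task()
batch = np.ones((2, 8))
gated_first = module.forward(batch, first_task)
gated_second = module.forward(batch, second_task)
```

In this sketch each new task adds only `latent_dim` parameters while the encoder and decoder weights stay shared, which is one way to read the "minimal overhead" claim in the abstract.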
