Incrementally Learning Multiple Diverse Data Domains via Multi-Source Dynamic Expansion Model
Runqing Wu, Fei Ye, Qihe Liu, Guoxi Huang, Jinyu Guo, Rongyao Hu
TL;DR
This work tackles multi-domain continual learning by introducing MSDEM, a framework that uses multiple pre-trained Vision Transformer backbones to progressively build new experts for incoming tasks. It has two key components: a Dynamic Expandable Attention Mechanism (DEAM), which selectively gates knowledge from the backbones for each task, and a Dynamic Graph Weight Router (DGWR), which reuses prior experts through a learnable graph router to maximize knowledge transfer while mitigating forgetting. In experiments on cross-domain datasets, MSDEM achieves state-of-the-art average performance with fewer parameters than competitive baselines, demonstrating strong generalization across both domain shifts and class increments. The approach offers a practical path toward scalable, efficient continual learning in heterogeneous data environments: it reuses diverse pre-trained knowledge sources and adapts only task-specific modules.
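To make the gating idea behind DEAM concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each frozen pre-trained backbone yields one feature vector of a shared dimension d per input, and that each task owns a single learnable query vector that scores the backbones by scaled dot product. The names `softmax` and `fuse_backbone_features` are hypothetical. The softmax weights decide how much each backbone contributes to the fused representation that would be passed to the task's expert.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_backbone_features(
    task_query: Sequence[float],
    backbone_feats: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical DEAM-style fusion over K frozen backbones.

    task_query: assumed learnable per-task query vector of length d.
    backbone_feats: K feature vectors (one per backbone), each of length d.
    Returns the fused feature and the attention weights over backbones.
    """
    d = len(task_query)
    # Scaled dot-product score between the task query and each backbone feature.
    scores = [
        sum(q * f for q, f in zip(task_query, feat)) / math.sqrt(d)
        for feat in backbone_feats
    ]
    weights = softmax(scores)
    # Weighted sum: backbones with higher scores contribute more to the fused feature.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, backbone_feats))
        for i in range(d)
    ]
    return fused, weights


# Usage: the query aligns with the first backbone, so it receives most of the weight.
fused, weights = fuse_backbone_features([4.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, fused)
```

With one-hot backbone features, as in the usage line, the fused vector equals the attention weights, which makes the gating easy to inspect. A real model would instead learn the query per task by gradient descent and apply it to token-level ViT features.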
Abstract
Continual learning seeks to develop a model capable of incrementally assimilating new information while retaining prior knowledge. However, current research predominantly addresses a simple learning setting in which all data samples originate from a single domain. This paper shifts focus to a more complex and realistic setting in which data samples are drawn from multiple distinct domains. We tackle this challenge by introducing a novel method, the Multi-Source Dynamic Expansion Model (MSDEM), which uses several pre-trained models as backbones and progressively builds new experts on top of them to adapt to emerging tasks. Additionally, we propose a dynamic expandable attention mechanism that selectively harnesses knowledge from multiple backbones, thereby accelerating the learning of new tasks. Moreover, we introduce a dynamic graph weight router that strategically reuses all previously acquired parameters and representations for new task learning, maximizing positive knowledge transfer and further improving generalization. A comprehensive series of experiments shows that the proposed approach achieves state-of-the-art performance.
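The reuse of earlier experts by the dynamic graph weight router can likewise be illustrated with a small Python sketch. This is an illustration under stated assumptions rather than the published method. We assume that each expert is a graph node, that adding an expert creates one learnable edge logit from every previously added expert, and that the routed representation of expert t is its own feature plus a softmax-weighted sum of the earlier experts' features. `GraphWeightRouter`, `add_expert`, `edge_weights`, and `route` are hypothetical names.

```python
import math
from typing import List, Sequence


class GraphWeightRouter:
    """Hypothetical DGWR-style router: experts are graph nodes.

    When expert t is added it receives one learnable edge logit from each of
    the t experts added before it. Its routed feature is its own feature plus
    a softmax-weighted sum of those earlier experts' features.
    """

    def __init__(self) -> None:
        # edge_logits[t][j] is the (assumed learnable) logit on edge j -> t, j < t.
        self.edge_logits: List[List[float]] = []

    def add_expert(self, init_logit: float = 0.0) -> int:
        t = len(self.edge_logits)
        # The new node gets an incoming edge from every existing expert.
        self.edge_logits.append([init_logit] * t)
        return t

    def edge_weights(self, t: int) -> List[float]:
        logits = self.edge_logits[t]
        if not logits:  # The first expert has no predecessors to reuse.
            return []
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        return [e / z for e in exps]

    def route(self, t: int, feats: Sequence[Sequence[float]]) -> List[float]:
        """feats[j] is expert j's feature for the current input, for j <= t."""
        out = list(feats[t])
        for w, prev in zip(self.edge_weights(t), feats[:t]):
            # Residual-style reuse: add each earlier expert's feature, scaled by its edge weight.
            out = [o + w * p for o, p in zip(out, prev)]
        return out


# Usage: three experts; the third reuses the first two with equal initial weights.
router = GraphWeightRouter()
for _ in range(3):
    router.add_expert()
print(router.route(2, [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]))  # prints [0.5, 0.5]
```

If earlier experts stay frozen, which is also an assumption here, only the new expert and its incoming edge logits would be trained. In this sketch the logits are plain floats that can be set by hand to inspect how the routing shifts toward particular predecessors.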
