Harmony: A Unified Framework for Modality Incremental Learning
Yaguang Song, Xiaoshan Yang, Dongmei Jiang, Yaowei Wang, Changsheng Xu
TL;DR
The paper addresses Modality Incremental Learning (MIL), in which a single model must continually learn across a sequence of distinct modalities using only unimodal data at each stage. It proposes Harmony, a transformer-based framework with two components: Adaptive Compatible Feature Modulation, which generates compatible historical features, and Cumulative Modal Bridging, which fuses historical knowledge with current learning through modality knowledge aggregation and a three-part Hybrid Alignment. Harmony outperforms a range of incremental-learning baselines on two MIL benchmarks, EPIC-MIL and Drive&Act-MIL, showing that it can bridge modality gaps while preserving prior knowledge. By establishing effective modal connections and accumulating knowledge under data-restricted conditions, Harmony advances practical MIL and opens avenues for adding further modalities in open-world settings.
Abstract
Incremental learning aims to enable models to continuously acquire knowledge from evolving data streams while preserving previously learned capabilities. Current research predominantly focuses on unimodal incremental learning and on multimodal incremental learning in which the set of modalities stays the same across stages; real-world scenarios, however, often introduce data from entirely new modalities, which poses additional challenges. This paper investigates whether a single unified model can learn incrementally across a continuously evolving sequence of modalities. To this end, we introduce a novel paradigm called Modality Incremental Learning (MIL), in which each learning stage involves data from a distinct modality. To address this task, we propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention, so that the model reduces the discrepancy between modalities, learns from a sequence of distinct modalities, and ultimately completes tasks across multiple modalities within a unified framework. Our approach introduces two components: adaptive compatible feature modulation and cumulative modal bridging. By constructing historical modal features and performing modal knowledge accumulation and alignment, these components work in concert to bridge modal differences, establish effective modality connections, and retain prior knowledge, even when only unimodal data is available at each learning stage. Extensive experiments on the MIL task demonstrate that our proposed method significantly outperforms existing incremental learning methods, validating its effectiveness in MIL scenarios.
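To make the MIL setting concrete, the following is a minimal sketch of the stage-by-stage protocol described above, under stated assumptions: one modality arrives per stage, the single model trains only on that modality's data, and after each stage it is evaluated on every modality seen so far. The function names (`run_mil_protocol`, `average_accuracy`), the toy `ForgetfulModel`, and the modality names `"rgb"`, `"flow"`, `"audio"` are hypothetical illustrations, not the paper's code or benchmark definitions, and averaging accuracy over seen modalities is an assumed metric that may differ from the one the paper reports.

```python
from typing import Callable, Dict, List, Optional


def run_mil_protocol(
    modalities: List[str],
    train_stage: Callable[[str], None],
    evaluate: Callable[[str], float],
) -> List[Dict[str, float]]:
    """Hypothetical MIL loop: one modality per stage, evaluate on all seen modalities."""
    history: List[Dict[str, float]] = []
    for t, modality in enumerate(modalities):
        # Only unimodal data for the current modality is available at stage t.
        train_stage(modality)
        # Retention check: the same model must still handle every earlier modality.
        history.append({m: evaluate(m) for m in modalities[: t + 1]})
    return history


def average_accuracy(stage_scores: Dict[str, float]) -> float:
    """Assumed metric: mean accuracy over the modalities seen up to a stage."""
    return sum(stage_scores.values()) / len(stage_scores)


class ForgetfulModel:
    """Toy stand-in for naive fine-tuning: it only remembers the last modality seen."""

    def __init__(self) -> None:
        self.current: Optional[str] = None

    def train_stage(self, modality: str) -> None:
        self.current = modality

    def evaluate(self, modality: str) -> float:
        return 1.0 if modality == self.current else 0.0


if __name__ == "__main__":
    model = ForgetfulModel()
    results = run_mil_protocol(["rgb", "flow", "audio"], model.train_stage, model.evaluate)
    for t, scores in enumerate(results):
        print(t, scores, round(average_accuracy(scores), 3))
```

With the forgetful toy model, average accuracy drops as stages accumulate because earlier modalities are lost; a method aimed at MIL, such as Harmony, targets keeping those earlier entries high while still learning each new modality.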
