
MLLM-CL: Continual Learning for Multimodal Large Language Models

Hongbo Zhao, Fei Zhu, Haiyang Guo, Meng Wang, Rundong Wang, Gaofeng Meng, Zhaoxiang Zhang

TL;DR

This work tackles the problem of adapting multimodal large language models (MLLMs) to dynamic real-world environments in which domain knowledge and core abilities must be integrated continually. It introduces the MLLM-CL benchmark, which comprises Domain Continual Learning (DCL), with IID train/test splits across domains, and Ability Continual Learning (ACL), with non-IID tests across abilities, formalized over task distributions $\ell_t$ and $\wp_t$ with datasets $\{D_t\}$. To address continual learning in MLLMs, the paper proposes MR-LoRA, a method that partitions knowledge into domain- and ability-specific Low-Rank Adaptation (LoRA) adapters and uses an MLLM-based router to select the appropriate expert. Inference therefore runs in two stages (routing, then prediction), and parameter isolation minimizes catastrophic forgetting. Empirical results on LLaVA-v1.5-7b and InternVL show that MR-LoRA achieves state-of-the-art performance in both DCL and ACL, approaching the Oracle upper bounds, and that routing stays robust even with limited task-specific data. The work also analyzes router accuracy, the effect of LoRA rank, and knowledge transfer. The authors release code, datasets, and benchmarks to support reproducibility and further work on lifelong learning for multimodal systems.
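To make the two-stage inference and parameter isolation concrete, here is a minimal sketch assuming a single frozen linear layer and one LoRA adapter per task. Each adapter applies the standard LoRA update $W_0 x + \frac{\alpha}{r} B A x$. The names `LoRAAdapter`, `MRLoRASketch`, and `add_task` are hypothetical, and the paper's router (itself an MLLM) is replaced here by a plain callable that returns a task name. This is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical per-task expert: a low-rank update (alpha / r) * B @ A."""
    A: Matrix  # shape (r, d_in)
    B: Matrix  # shape (d_out, r)
    alpha: float = 1.0

    def delta(self, x: Vector) -> Vector:
        scale = self.alpha / len(self.A)  # len(A) is the rank r
        return [scale * v for v in matvec(self.B, matvec(self.A, x))]


class MRLoRASketch:
    """Frozen base weight, isolated per-task adapters, and a router (hypothetical)."""

    def __init__(self, W0: Matrix, router: Callable[[Vector], str]):
        self.W0 = W0  # frozen base weight shared by all tasks
        self.adapters: Dict[str, LoRAAdapter] = {}
        self.router = router  # stand-in for the MLLM-based router

    def add_task(self, name: str, adapter: LoRAAdapter) -> None:
        # Parameter isolation: each new task gets its own adapter,
        # and previously learned adapters are never modified.
        if name in self.adapters:
            raise ValueError(f"adapter for {name!r} already exists")
        self.adapters[name] = adapter

    def forward(self, x: Vector) -> Vector:
        # Stage 1: routing selects one expert.
        task = self.router(x)
        # Stage 2: prediction with the base weight plus the selected adapter.
        base = matvec(self.W0, x)
        if task not in self.adapters:
            return base  # unknown route: fall back to the frozen base model
        return [b + d for b, d in zip(base, self.adapters[task].delta(x))]


if __name__ == "__main__":
    def route(x: Vector) -> str:
        return "medical" if x[0] > x[1] else "ocr"

    model = MRLoRASketch([[1.0, 0.0], [0.0, 1.0]], route)
    model.add_task("medical", LoRAAdapter(A=[[1.0, 0.0]], B=[[0.0], [2.0]]))
    print(model.forward([3.0, 1.0]))  # [3.0, 7.0]
```

Adding a second task (for example `"ocr"`) leaves the output for `[3.0, 1.0]` unchanged, which is the property parameter isolation is meant to guarantee; forgetting can then only come from routing errors, which is why the paper analyzes router accuracy.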

Abstract

Recent Multimodal Large Language Models (MLLMs) excel in vision-language understanding but face challenges in adapting to dynamic real-world scenarios that require continuous integration of new knowledge and skills. While continual learning (CL) offers a potential solution, existing benchmarks and methods suffer from critical limitations. In this paper, we introduce MLLM-CL, a novel benchmark encompassing domain and ability continual learning, where the former focuses on independently and identically distributed (IID) evaluation across evolving mainstream domains, whereas the latter evaluates on non-IID scenarios with new model abilities. Methodologically, we propose preventing catastrophic interference through parameter isolation and an MLLM-based routing mechanism. Extensive experiments demonstrate that our approach can integrate domain-specific knowledge and functional abilities with minimal forgetting, significantly outperforming existing methods. Our benchmark and code are available at https://github.com/bjzhb666/MLLM-CL.
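Claims of "minimal forgetting" are usually quantified from an accuracy matrix in which `acc[i][j]` is the accuracy on task j after training on task i. The sketch below computes two common continual-learning summaries, final average accuracy and average forgetting. These are widely used definitions assumed here for illustration; the exact metrics reported in the paper may differ, and the function names are hypothetical.

```python
from typing import List


def final_average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks after training on the last task."""
    last = acc[-1]
    return sum(last) / len(last)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    acc[i][j] is the accuracy on task j after training on task i (j <= i),
    so a lower-triangular list of rows is enough. The last task is excluded
    because it has not yet been followed by any further training.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    drops = []
    for j in range(T - 1):
        best_before = max(acc[i][j] for i in range(j, T - 1))
        drops.append(best_before - acc[T - 1][j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    acc = [[80.0], [70.0, 90.0], [60.0, 85.0, 75.0]]
    print(average_forgetting(acc))  # 12.5
```

Under a method with full parameter isolation and a perfect router, each row would simply copy the earlier diagonal entries, so average forgetting would be zero.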

Paper Structure

This paper contains 30 sections, 1 equation, 23 figures, 18 tables.

Figures (23)

  • Figure 1: Demonstrations of MLLM-CL benchmark. It incorporates Domain Continual Learning (DCL), which adds domain-specific knowledge, and Ability Continual Learning (ACL), which improves fundamental abilities for multimodal large language models.
  • Figure 2: The questioner-inspector data pipeline for generating StockQA instruction tuning dataset.
  • Figure 3: Prompt used by the MLLM-based router selector.
  • Figure 4: Comparison of new-task performance (LLaVA-based) on both domain and ability CL.
  • Figure 5: Overall framework of our MR-LoRA.
  • ...and 18 more figures
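Figure 3 shows the prompt given to the MLLM-based router. As a rough illustration of how such a selector can be framed, the sketch below builds a multiple-choice prompt that lists the learned experts and maps the router's letter answer back to an adapter name. The prompt wording, the expert names in the example, and the helpers `build_router_prompt` and `parse_router_answer` are hypothetical and do not reproduce the paper's actual prompt; the MLLM call itself is omitted.

```python
import re
import string
from typing import List, Optional


def build_router_prompt(question: str, experts: List[str]) -> str:
    """Format a multiple-choice routing question (hypothetical wording)."""
    if len(experts) > 26:
        raise ValueError("at most 26 experts can be labelled A-Z")
    lines = [
        f"Question: {question}",
        "Which expert is best suited to answer this question?",
    ]
    for letter, name in zip(string.ascii_uppercase, experts):
        lines.append(f"{letter}. {name}")
    lines.append("Answer with the option's letter from the given choices directly.")
    return "\n".join(lines)


def parse_router_answer(answer: str, experts: List[str]) -> Optional[str]:
    """Map a reply such as 'B', 'B.' or '(B) ...' to an expert name.

    Returns None when the reply does not start with a valid option letter,
    so the caller can choose a fallback (e.g. the base model).
    """
    match = re.match(r"\s*\(?([A-Z])\b", answer)
    if match is None:
        return None
    idx = string.ascii_uppercase.index(match.group(1))
    return experts[idx] if idx < len(experts) else None


if __name__ == "__main__":
    experts = ["remote sensing", "medical", "autonomous driving"]  # illustrative
    print(build_router_prompt("What type of lesion is shown?", experts))
    print(parse_router_answer("B.", experts))  # medical
```

Framing routing as multiple choice keeps the router's output space closed, so a new expert only requires adding one option line rather than changing the output format.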