When Continue Learning Meets Multimodal Large Language Model: A Survey
Yukang Huo, Hao Tang
TL;DR
This survey addresses continual learning in multimodal large language models (MLLMs) by synthesizing insights from 440 papers. It organizes the literature into foundational MLLM concepts and continual learning in three settings (non-LLM unimodal, non-LLM multimodal, and LLM), and then analyzes state-of-the-art model innovations, learning methods, and benchmarks. Key contributions include a taxonomy of methods along three dimensions (frameworks, objectives, and modules), a review of benchmark suites (e.g., ROPE, CVQA, II-Bench, ConBench, COMPBENCH, Hallu-PI, ReForm-Eval, VisionGraph), and the identification of gaps in evaluation standardization and interpretability. The paper also highlights practical applications and offers forward-looking guidance on forgetting mitigation, benchmark standardization, and transparency, with the aim of accelerating robust, scalable deployment of continual learning in multimodal intelligent systems.
Abstract
Recent advancements in Artificial Intelligence have led to the development of Multimodal Large Language Models (MLLMs). However, efficiently adapting these pre-trained models to dynamic data distributions and varied tasks remains a challenge. Fine-tuning an MLLM for a specific task often degrades its performance on previously learned knowledge, a problem known as 'Catastrophic Forgetting'. While this issue has been well studied in the Continual Learning (CL) community, it presents new challenges for MLLMs. This review paper, the first of its kind on MLLM continual learning, presents an overview and analysis of 440 research papers in this area. The review is structured into four sections. First, it discusses the latest research on MLLMs, covering model innovations, benchmarks, and applications in various fields. Second, it categorizes and overviews the latest studies on continual learning, divided into three parts: unimodal continual learning in non-large language models (Non-LLM Unimodal CL), multimodal continual learning in non-large language models (Non-LLM Multimodal CL), and continual learning in large language models (CL in LLM). Third, it provides a detailed analysis of the current state of MLLM continual learning research, including benchmark evaluations, architectural innovations, and a summary of theoretical and empirical studies. Finally, it discusses the challenges and future directions of continual learning in MLLMs. By connecting the foundational concepts, theoretical insights, method innovations, and practical applications of continual learning for multimodal large models, this review provides a comprehensive picture of the research progress and open challenges in the field, aiming to inspire researchers and promote the advancement of related technologies.
