
MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark

Haiyang Guo, Fei Zhu, Hongbo Zhao, Fanhu Zeng, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang

TL;DR

MCITlib tackles multimodal continual learning (MCL) by providing a standardized, extensible library for continual instruction tuning of multimodal LLMs, with 8 representative multimodal continual instruction tuning (MCIT) algorithms evaluated on 3 benchmarks using 2 backbones. It combines parameter-efficient fine-tuning (PEFT) in a rehearsal-free setting with rigorous evaluation, both on continual-learning metrics ($\mathrm{MFT}$, $\mathrm{MFN}$, $\mathrm{MAA}$, $\mathrm{BWT}$) and on general multimodal benchmarks. The results highlight a persistent trade-off between mitigating forgetting and preserving broad capabilities. The work delivers unified implementations, reproducible protocols, and comprehensive results to accelerate MCL research, with an emphasis on avoiding information leakage and on one-click experiment workflows. Overall, MCITlib serves as a practical platform for benchmarking, reproducing, and extending multimodal continual instruction-tuning methods.
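The four continual-learning metrics can all be read off a single accuracy matrix. The sketch below assumes the definitions commonly used in the MCIT literature, which this summary does not spell out: MFT is the mean accuracy on each task right after it is learned, MFN is the mean accuracy on all tasks after the final task, MAA averages the running mean accuracy over training stages, and BWT is the mean change on earlier tasks between "just learned" and "after the final task". The function name `cl_metrics` and the matrix layout are illustrative assumptions, not MCITlib's API; check the paper's Figure 3 for the exact formulas.

```python
from typing import Dict, List


def cl_metrics(acc: List[List[float]]) -> Dict[str, float]:
    """Hypothetical helper: acc[t][i] is the accuracy on task i after training
    through task t (0-indexed, lower-triangular, so i <= t).
    Metric definitions are the commonly used ones, assumed here."""
    T = len(acc)
    # MFT: accuracy on each task immediately after learning it (diagonal).
    mft = sum(acc[i][i] for i in range(T)) / T
    # MFN: accuracy on every task after training on the last one (last row).
    mfn = sum(acc[T - 1][i] for i in range(T)) / T
    # MAA: mean over stages t of the average accuracy on tasks seen so far.
    maa = sum(sum(acc[t][: t + 1]) / (t + 1) for t in range(T)) / T
    # BWT: average drop (negative) or gain (positive) on earlier tasks;
    # defined as 0.0 when there is only one task (nothing to forget).
    bwt = (
        sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)
        if T > 1
        else 0.0
    )
    return {"MFT": mft, "MFN": mfn, "MAA": maa, "BWT": bwt}


# Toy 3-task matrix (made-up numbers, not results from the paper).
acc = [
    [80.0],
    [70.0, 85.0],
    [60.0, 75.0, 90.0],
]
print(cl_metrics(acc))
# → {'MFT': 85.0, 'MFN': 75.0, 'MAA': 77.5, 'BWT': -15.0}
```

In this toy case the gap between MFT (85.0) and MFN (75.0), and the negative BWT, both reflect forgetting of the first two tasks.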

Abstract

Continual learning enables AI systems to acquire new knowledge while retaining previously learned information. While traditional unimodal methods have made progress, the rise of Multimodal Large Language Models (MLLMs) brings new challenges in Multimodal Continual Learning (MCL), where models must address both catastrophic forgetting and cross-modal coordination. To advance research in this area, we present MCITlib, a comprehensive library for Multimodal Continual Instruction Tuning. MCITlib currently implements 8 representative algorithms and evaluates them on 3 benchmarks with 2 backbone models. The library will be continuously updated to support future developments in MCL. The codebase is released at https://github.com/Ghy0501/MCITlib.
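Evaluating continual instruction tuning follows a sequential protocol: the model is trained on one task at a time and, after each stage, evaluated on every task seen so far. The minimal sketch below shows this loop in the rehearsal-free setting (no data from earlier tasks is replayed). The function `run_sequential_protocol` and the callables `train_on` and `evaluate_on` are hypothetical stand-ins for real PEFT training and benchmark scoring, not MCITlib's interface; the toy model simply loses 10 points on every earlier task each time it learns a new one.

```python
from typing import Callable, Dict, List, Sequence


def run_sequential_protocol(
    tasks: Sequence[str],
    train_on: Callable[[str], None],
    evaluate_on: Callable[[str], float],
) -> List[List[float]]:
    """Hypothetical loop: returns acc[t][i], the score on task i after stage t."""
    acc: List[List[float]] = []
    for t, task in enumerate(tasks):
        # Rehearsal-free: only the current task's data is used for training.
        train_on(task)
        # Evaluate on all tasks learned so far (tasks 0..t).
        acc.append([evaluate_on(tasks[i]) for i in range(t + 1)])
    return acc


# Toy "model": a dict of per-task scores (made-up numbers for illustration).
scores: Dict[str, float] = {}


def toy_train(task: str) -> None:
    for k in scores:
        scores[k] -= 10.0  # simulated forgetting of earlier tasks
    scores[task] = 90.0  # the newly learned task starts at 90


def toy_eval(task: str) -> float:
    return scores.get(task, 0.0)


acc = run_sequential_protocol(["A", "B", "C"], toy_train, toy_eval)
print(acc)
# → [[90.0], [80.0, 90.0], [70.0, 80.0, 90.0]]
```

The resulting lower-triangular matrix is exactly the input that metrics such as MFT, MFN, MAA, and BWT are computed from; keeping training and evaluation behind two callables makes it easy to swap in different methods or backbones.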


Paper Structure

This paper contains 15 sections, 6 figures, 11 tables.

Figures (6)

  • Figure 1: MCITlib main functionalities and modules.
  • Figure 2: Performance curve of different methods under different settings.
  • Figure 3: A conceptual illustration of the continual learning evaluation metrics.
  • Figure 4: UCIT Benchmark Sample Visualization.
  • Figure 5: MLLM-DCL Benchmark Sample Visualization.
  • ...and 1 more figure