MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark
Haiyang Guo, Fei Zhu, Hongbo Zhao, Fanhu Zeng, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang
TL;DR
MCITlib is a standardized, extensible library for continual instruction tuning of multimodal LLMs. It implements 8 representative MCIT algorithms and evaluates them on 3 benchmarks with 2 backbones. Training is PEFT-based and rehearsal-free; evaluation covers continual-learning metrics ($MFT$, $MFN$, $MAA$, $BWT$) as well as general multimodal benchmarks, and the results point to a persistent trade-off between mitigating forgetting and preserving broad capabilities. The library provides unified implementations, reproducible protocols, and one-click experiment workflows, with attention to avoiding information leakage, so that multimodal continual instruction-tuning methods can be benchmarked, reproduced, and extended on a common platform.
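The four continual-learning metrics named above are not defined in this summary. As a rough illustration only, the sketch below computes them from a task-accuracy matrix under commonly used definitions, which are assumptions here: $MFT$ as the mean accuracy on each task right after it is learned, $MFN$ as the mean accuracy over all tasks after the final task, $MAA$ as the mean over training steps of the average accuracy on tasks seen so far, and $BWT$ as the mean change in accuracy on earlier tasks between when they were learned and the end of training. The function name `continual_metrics` and the matrix layout are hypothetical and not part of MCITlib.

```python
from typing import Dict, List


def continual_metrics(acc: List[List[float]]) -> Dict[str, float]:
    """Compute MFT, MFN, MAA, and BWT from an accuracy matrix.

    acc[i][j] is the accuracy on task j measured after training on task i.
    Only entries with j <= i are read. The definitions used here are
    assumptions based on common continual-learning usage.
    """
    num_tasks = len(acc)
    if num_tasks == 0:
        raise ValueError("accuracy matrix must contain at least one task")

    # MFT: accuracy on each task immediately after training on it (diagonal).
    mft = sum(acc[i][i] for i in range(num_tasks)) / num_tasks

    # MFN: accuracy on every task after the final training step (last row).
    mfn = sum(acc[num_tasks - 1][j] for j in range(num_tasks)) / num_tasks

    # MAA: after each step i, average over tasks 0..i, then average over steps.
    step_averages = [
        sum(acc[i][j] for j in range(i + 1)) / (i + 1) for i in range(num_tasks)
    ]
    maa = sum(step_averages) / num_tasks

    # BWT: final accuracy minus just-learned accuracy, over all but the last task.
    if num_tasks > 1:
        bwt = sum(
            acc[num_tasks - 1][j] - acc[j][j] for j in range(num_tasks - 1)
        ) / (num_tasks - 1)
    else:
        bwt = 0.0

    return {"MFT": mft, "MFN": mfn, "MAA": maa, "BWT": bwt}


# Illustrative numbers only (not results from the paper).
example = [
    [80.0, 0.0, 0.0],
    [70.0, 90.0, 0.0],
    [60.0, 80.0, 100.0],
]
metrics = continual_metrics(example)
```

Under these assumed definitions, a negative $BWT$ indicates forgetting, and the gap between $MFT$ and $MFN$ gives a quick view of how much accuracy was lost by the end of the task sequence.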
Abstract
Continual learning enables AI systems to acquire new knowledge while retaining previously learned information. While traditional unimodal methods have made progress, the rise of Multimodal Large Language Models (MLLMs) brings new challenges in Multimodal Continual Learning (MCL), where models are expected to address both catastrophic forgetting and cross-modal coordination. To advance research in this area, we present MCITlib, a comprehensive library for Multimodal Continual Instruction Tuning. MCITlib currently implements 8 representative algorithms and conducts evaluations on 3 benchmarks under 2 backbone models. The library will be continuously updated to support future developments in MCL. The codebase is released at https://github.com/Ghy0501/MCITlib.
