AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs
Xuyang Wei, Chunlin Tian, Li Li
TL;DR
AsymLoRA tackles instruction fine-tuning for Multimodal Large Language Models (MLLMs) by addressing both modality-specific conflicts and cross-task commonalities. It introduces an asymmetric LoRA design in which a shared low-rank matrix $A$ captures universal knowledge and task-specific matrices $B_i$ handle domain-specific adaptation. An optional Mixture-of-AsymLoRA-Experts extension uses gating to select adaptation paths dynamically. The approach yields consistent improvements over vanilla LoRA and LoRA-MoE across diverse single-domain and multi-domain benchmarks, while reducing trainable parameters and improving efficiency. The framework offers a scalable solution for robust multimodal instruction tuning and cross-domain transfer, with practical value for building versatile MLLMs. It does, however, add some inference overhead and leaves open questions about the theoretical grounding of shared knowledge consolidation.
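To make the shared-$A$ / multiple-$B_i$ structure concrete, here is a minimal NumPy sketch of a single adapted linear layer. It is not taken from the released code, and it rests on several assumptions. The frozen weight `W` is a random stand-in. The router (`W_gate`) is assumed to be a softmax over a linear function of the input. Each $B_i$ is zero-initialized, and the update is scaled by `alpha / rank`. These last two choices follow common LoRA practice. The class name `MixtureOfAsymLoRA` and all parameter names are hypothetical.

```python
import numpy as np


class MixtureOfAsymLoRA:
    """Hypothetical sketch: frozen W + shared down-projection A + gated task-specific heads B_i."""

    def __init__(self, d_in, d_out, rank, num_experts, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a frozen pretrained weight (not trained).
        self.W = rng.standard_normal((d_out, d_in)) * 0.02
        # Shared A (rank x d_in): assumed to consolidate cross-task commonalities.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        # Task-specific heads B_i (d_out x rank), zero-initialized as in standard LoRA.
        self.B = np.zeros((num_experts, d_out, rank))
        # Hypothetical router weights: one logit per B_i head.
        self.W_gate = rng.standard_normal((num_experts, d_in)) * 0.01
        self.scale = alpha / rank

    def gate(self, x):
        """Softmax routing weights over the B_i heads, shape (batch, num_experts)."""
        logits = x @ self.W_gate.T
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=-1, keepdims=True)

    def __call__(self, x):
        h = x @ self.A.T                                   # (batch, rank): shared projection A x
        per_head = np.einsum("bk,nok->bno", h, self.B)     # (batch, num_experts, d_out): B_i A x
        g = self.gate(x)                                   # (batch, num_experts)
        delta = np.einsum("bn,bno->bo", g, per_head)       # gate-weighted sum over heads
        return x @ self.W.T + self.scale * delta


layer = MixtureOfAsymLoRA(d_in=16, d_out=8, rank=4, num_experts=3)
x = np.random.default_rng(1).standard_normal((5, 16))
print(layer(x).shape)  # (5, 8)
```

Because every $B_i$ starts at zero, the layer initially reproduces the frozen projection exactly. Only $A$, the $B_i$, and the router would be trained. If the router output is replaced by a one-hot vector for task $i$, the layer reduces to a plain asymmetric LoRA update $W x + \frac{\alpha}{r} B_i A x$.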
Abstract
Effective instruction fine-tuning on diverse image-text datasets is crucial for developing a versatile Multimodal Large Language Model (MLLM), because dataset composition dictates the model's adaptability across multimodal tasks. However, complex datasets often contain both inherent conflicts, which stem from modality-specific optimization objectives, and latent commonalities that enable cross-task transfer. Most existing approaches handle these two aspects separately. To bridge this gap, we introduce AsymLoRA, a parameter-efficient tuning framework that unifies knowledge modularization and cross-modal coordination via asymmetric LoRA. Task-specific low-rank projections (matrix $B$) preserve distinct adaptation pathways for conflicting objectives, while a shared projection (matrix $A$) consolidates cross-modal commonalities. Extensive evaluations demonstrate that AsymLoRA consistently surpasses both vanilla LoRA, which captures only commonalities, and LoRA-MoE, which focuses solely on conflicts, achieving superior model performance and system efficiency across diverse benchmarks. \href{https://github.com/Clin0212/HydraLoRA/blob/main/MLLM-HydraLoRA/README.md}{Code}.
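Simple arithmetic shows where parameter savings from sharing $A$ can come from. Consider a layer with input width $d_{in}$, output width $d_{out}$, rank $r$, and $n$ adaptation heads. A LoRA-MoE whose experts each own an independent $(A, B)$ pair trains $n(r d_{in} + r d_{out})$ adapter parameters. Sharing a single $A$ reduces this to $r d_{in} + n r d_{out}$, which saves $(n-1) r d_{in}$. The sketch below assumes equal rank for every expert and counts adapter weights only, so router parameters are excluded. The function names and layer sizes are hypothetical illustrations, not figures from the paper.

```python
def lora_moe_params(d_in: int, d_out: int, rank: int, n: int) -> int:
    """Adapter parameters when each of n experts owns a full (A, B) pair."""
    return n * (rank * d_in + d_out * rank)


def asym_lora_params(d_in: int, d_out: int, rank: int, n: int) -> int:
    """Adapter parameters with one shared A (rank x d_in) and n heads B_i (d_out x rank)."""
    return rank * d_in + n * d_out * rank


if __name__ == "__main__":
    # Hypothetical sizes: a 4096 x 4096 projection, rank 8, three task heads.
    d, r, n = 4096, 8, 3
    moe = lora_moe_params(d, d, r, n)
    asym = asym_lora_params(d, d, r, n)
    print(moe, asym, moe - asym)  # 196608 131072 65536
```

With a single head ($n = 1$), the two counts coincide. The saving grows linearly with the number of heads that share $A$.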
