LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang; Wenke Huang; Guancheng Wan; Qu Yang; Mang Ye

TL;DR

This work addresses catastrophic forgetting and knowledge misalignment that arise when multimodal LLMs (MLLMs) are fine-tuned with LoRA. It introduces LoRASculpt, a two-part framework that combines sparsity-driven redundancy reduction with knowledge-guided regularization. The paper gives theoretical guarantees on the sparsity of the LoRA product $BA$: an expectation $\mathbb{E}[s_{BA}] = 1 - (1 - s_A s_B)^r$ and, writing $C = BA$, a concentration bound $\mathbb{P}(|s_C - \mathbb{E}[s_C]| \ge \delta) \le 2 \exp(-2 \delta^2 pq / (r(p+q)))$. These guarantees are coupled with a Conflict Mitigation Regularizer (CMR), guided by the pretrained weights, that steers updates away from regions critical to general knowledge through the loss $\mathcal{L} = \mathcal{L}_{Task} + \alpha \mathcal{L}_{CMR}$. The framework also extends to the MLLM connector, where it uses soft sparsity and the joint loss $\mathcal{L} = \mathcal{L}_{Task} + \alpha \mathcal{L}_{CMR}^{LLM} + \beta \mathcal{L}_{CMR}^{Con}$ to harmonize knowledge across modules. Experiments on VQA and image captioning show reduced forgetting and improved downstream performance across LoRA ranks; ablations validate SRR and RKH, and empirical results corroborate the proposed theorems. LoRASculpt thus offers a scalable, principled way to harmonize general and specialized knowledge in MLLMs, including their connectors.
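The expectation formula quoted above can be checked numerically. The sketch below is not the authors' code. It assumes that $s_A$ and $s_B$ denote the fraction of non-zero entries in $A$ and $B$, that non-zero positions are drawn independently, and that sums of non-zero terms never cancel exactly. Under those assumptions an entry of $BA$ is non-zero whenever some inner index $k$ is active in both the corresponding row of $B$ and column of $A$, which gives the closed form directly. The function names and matrix sizes are illustrative.

```python
import math
import random


def expected_density(s_a: float, s_b: float, r: int) -> float:
    """Closed form quoted in the summary: E[s_BA] = 1 - (1 - s_A * s_B)^r."""
    return 1.0 - (1.0 - s_a * s_b) ** r


def concentration_bound(delta: float, p: int, q: int, r: int) -> float:
    """Right-hand side of the quoted bound: 2 * exp(-2 * delta^2 * p*q / (r * (p + q)))."""
    return 2.0 * math.exp(-2.0 * delta**2 * p * q / (r * (p + q)))


def random_mask(r: int, density: float, rng: random.Random) -> int:
    """Bitmask of length r; bit k is set (entry non-zero) with probability `density`."""
    mask = 0
    for k in range(r):
        if rng.random() < density:
            mask |= 1 << k
    return mask


def simulated_density(p: int, q: int, r: int, s_a: float, s_b: float, seed: int = 0) -> float:
    """Fraction of non-zero entries in BA for random masks (assumption: no exact cancellation)."""
    rng = random.Random(seed)
    b_rows = [random_mask(r, s_b, rng) for _ in range(p)]  # one mask per row of B (p x r)
    a_cols = [random_mask(r, s_a, rng) for _ in range(q)]  # one mask per column of A (r x q)
    # Entry (i, j) is non-zero when row i of B and column j of A share an active index k.
    nonzero = sum(1 for b in b_rows for a in a_cols if b & a)
    return nonzero / (p * q)


if __name__ == "__main__":
    p, q, r, s_a, s_b = 300, 300, 16, 0.1, 0.1
    print("closed form:", expected_density(s_a, s_b, r))
    print("simulated:  ", simulated_density(p, q, r, s_a, s_b))
    print("bound at delta = 0.1:", concentration_bound(0.1, p, q, r))
```

At these small illustrative sizes the bound exceeds 1 and is therefore uninformative. Its exponent scales with $pq / (r(p+q))$, so it tightens as the matrix dimensions grow relative to the rank.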

Abstract

While Multimodal Large Language Models (MLLMs) excel at generalizing across modalities and tasks, effectively adapting them to specific downstream tasks while simultaneously retaining both general and specialized knowledge remains challenging. Although Low-Rank Adaptation (LoRA) is widely used to efficiently acquire specialized knowledge in MLLMs, it introduces substantial harmful redundancy during visual instruction tuning, which exacerbates the forgetting of general knowledge and degrades downstream task performance. To address this issue, we propose LoRASculpt to eliminate harmful redundant parameters, thereby harmonizing general and specialized knowledge. Specifically, under theoretical guarantees, we introduce sparse updates into LoRA to discard redundant parameters effectively. Furthermore, we propose a Conflict Mitigation Regularizer to refine the update trajectory of LoRA, mitigating knowledge conflicts with the pretrained weights. Extensive experimental results demonstrate that even at a very high degree of sparsity ($\le$ 5%), our method simultaneously enhances generalization and downstream task performance. This confirms that our approach effectively mitigates catastrophic forgetting and further promotes knowledge harmonization in MLLMs.
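To make the idea of a sparse LoRA update concrete, the sketch below zeroes all but a small fraction of the entries of a matrix. The abstract does not say how redundant parameters are identified, so the magnitude-based selection used here is an assumption chosen only for illustration, and `keep_top_fraction` is a hypothetical helper rather than part of LoRASculpt.

```python
from typing import List

Matrix = List[List[float]]


def keep_top_fraction(matrix: Matrix, keep_ratio: float) -> Matrix:
    """Zero every entry except the largest-magnitude `keep_ratio` fraction.

    Assumption: magnitude is used as the importance score. Ties at the
    threshold are all kept, so slightly more than the requested fraction
    may survive.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    magnitudes = sorted((abs(v) for row in matrix for v in row), reverse=True)
    k = max(1, int(len(magnitudes) * keep_ratio))  # always keep at least one entry
    threshold = magnitudes[k - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]


if __name__ == "__main__":
    lora_a = [[0.5, -0.1], [0.05, -2.0]]  # toy stand-in for a LoRA matrix
    print(keep_top_fraction(lora_a, keep_ratio=0.5))
```

With a `keep_ratio` of 0.05, this would correspond to the 5% regime mentioned in the abstract, assuming that figure refers to the fraction of retained entries.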
