LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang, Wenke Huang, Guancheng Wan, Qu Yang, Mang Ye

TL;DR

This work addresses catastrophic forgetting and knowledge misalignment in multimodal LLMs fine-tuned with LoRA by introducing LoRASculpt, a two-pronged framework that combines sparsity-driven redundancy reduction with knowledge-guided regularization. It provides theoretical guarantees on the sparsity of the LoRA product $BA$ (with $B \in \mathbb{R}^{p \times r}$ and $A \in \mathbb{R}^{r \times q}$): the expected sparsity is $\mathbb{E}[s_{BA}] = 1 - (1 - s_A s_B)^r$, and it concentrates around this value according to $\mathbb{P}(|s_{BA} - \mathbb{E}[s_{BA}]| \ge \delta) \le 2 \exp(-2 \delta^2 pq /( r(p+q)))$. These guarantees are coupled with a Conflict Mitigation Regularizer, guided by pretrained knowledge, that steers updates away from regions critical to general knowledge via the loss $\mathcal{L} = \mathcal{L}_{Task} + \alpha \mathcal{L}_{CMR}$. The framework also extends to the MLLM connector by employing soft sparsity and a joint loss $\mathcal{L} = \mathcal{L}_{Task} + \alpha \mathcal{L}_{CMR}^{LLM} + \beta \mathcal{L}_{CMR}^{Con}$ to harmonize knowledge across modules. Extensive experiments on VQA and image captioning demonstrate reduced forgetting and improved downstream performance across LoRA ranks, with ablations validating SRR and RKH and empirical results corroborating the proposed theorems. LoRASculpt thus offers a scalable, principled path to harmonizing general and specialized knowledge in MLLMs, including their connectors.
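To get a feel for the concentration bound quoted above, the following minimal sketch evaluates its right-hand side and inverts it to find the deviation $\delta$ at which the bound equals a chosen failure probability. The function names, the failure probability `eta`, and the dimensions $p = q = 4096$ are illustrative assumptions, not values taken from the paper.

```python
import math


def concentration_bound(delta: float, p: int, q: int, r: int) -> float:
    """Right-hand side of the bound: 2 * exp(-2 * delta^2 * p*q / (r * (p + q)))."""
    if delta < 0 or min(p, q, r) <= 0:
        raise ValueError("delta must be >= 0 and p, q, r must be positive")
    return 2.0 * math.exp(-2.0 * delta ** 2 * p * q / (r * (p + q)))


def deviation_for_failure_prob(eta: float, p: int, q: int, r: int) -> float:
    """Deviation delta at which the bound equals eta (requires 0 < eta < 2).

    Obtained by solving 2 * exp(-2 * delta^2 * p*q / (r * (p + q))) = eta for delta.
    """
    if not 0.0 < eta < 2.0:
        raise ValueError("eta must lie strictly between 0 and 2")
    return math.sqrt(r * (p + q) * math.log(2.0 / eta) / (2.0 * p * q))


if __name__ == "__main__":
    p = q = 4096  # hypothetical layer dimensions, chosen only for illustration
    for r in (8, 32, 128):  # hypothetical LoRA ranks
        delta = deviation_for_failure_prob(0.05, p, q, r)
        print(f"r={r}: deviation exceeds {delta:.4f} with probability at most 0.05")
```

Because the exponent is divided by $r$, the guaranteed deviation grows with the LoRA rank for fixed $p$ and $q$, and shrinks as the weight matrix gets larger.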

Abstract

While Multimodal Large Language Models (MLLMs) excel at generalizing across modalities and tasks, effectively adapting them to specific downstream tasks while simultaneously retaining both general and specialized knowledge remains challenging. Although Low-Rank Adaptation (LoRA) is widely used to efficiently acquire specialized knowledge in MLLMs, it introduces substantial harmful redundancy during visual instruction tuning, which exacerbates the forgetting of general knowledge and degrades downstream task performance. To address this issue, we propose LoRASculpt to eliminate harmful redundant parameters, thereby harmonizing general and specialized knowledge. Specifically, under theoretical guarantees, we introduce sparse updates into LoRA to discard redundant parameters effectively. Furthermore, we propose a Conflict Mitigation Regularizer to refine the update trajectory of LoRA, mitigating knowledge conflicts with the pretrained weights. Extensive experimental results demonstrate that even at a very high degree of sparsity ($\le$ 5%), our method simultaneously enhances generalization and downstream task performance. This confirms that our approach effectively mitigates the catastrophic forgetting issue and further promotes knowledge harmonization in MLLMs.

Paper Structure

This paper contains 23 sections, 3 theorems, 43 equations, 5 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $B \in \mathbb{R}^{p \times r}$ and $A \in \mathbb{R}^{r \times q}$ be the two low-rank matrices in LoRA, with sparsity $s_B$ and $s_A$ respectively. Then the expected sparsity of the product matrix $BA \in \mathbb{R}^{p \times q}$ is given by:

$$\mathbb{E}[s_{BA}] = 1 - (1 - s_A s_B)^r$$

Proof. See Appendix A. $\Box$
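The closed form $1 - (1 - s_A s_B)^r$ can be checked numerically with a small Monte Carlo sketch. It rests on assumptions that this page does not spell out: "sparsity" is read as the fraction of nonzero entries kept, each entry of $A$ and $B$ is kept independently with probability $s_A$ or $s_B$, and exact cancellation between nonzero terms is ignored. Under that reading, entry $(i, j)$ of $BA$ is nonzero whenever at least one of the $r$ index pairs has both factors kept. All function names are hypothetical.

```python
import random


def expected_density(s_a: float, s_b: float, r: int) -> float:
    """Closed form from Theorem 3.1: 1 - (1 - s_A * s_B)^r."""
    return 1.0 - (1.0 - s_a * s_b) ** r


def simulated_density(p, q, r, s_a, s_b, rng):
    """Fraction of entries of BA that are nonzero for one random pair of masks."""
    support = set()
    for _ in range(r):
        # Kept entries in column k of B and in row k of A (independent Bernoulli masks).
        rows = [i for i in range(p) if rng.random() < s_b]
        cols = [j for j in range(q) if rng.random() < s_a]
        # Rank-one term k contributes nonzeros exactly on rows x cols.
        support.update((i, j) for i in rows for j in cols)
    return len(support) / (p * q)


def average_simulated_density(p, q, r, s_a, s_b, trials=5, seed=0):
    """Average the simulated density over several independent mask draws."""
    rng = random.Random(seed)
    total = sum(simulated_density(p, q, r, s_a, s_b, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    p = q = 200  # hypothetical matrix size, kept small so pure Python stays fast
    for r in (4, 8, 16):
        sim = average_simulated_density(p, q, r, 0.1, 0.1)
        print(f"r={r}: simulated {sim:.4f}, closed form {expected_density(0.1, 0.1, r):.4f}")
```

The simulation tracks only the support of each rank-one term, which is enough to count nonzeros and avoids an explicit matrix product.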

Figures (5)

  • Figure 1: Motivation. Fine-tuning an MLLM with LoRA on downstream tasks generates substantial harmful redundancy. (a) Illustration of post-pruning LoRA to reduce redundancy. (b) Simply pruning harmful redundancy in LoRA based on magnitude reduces forgetting of pretrained knowledge (Source) and even enhances downstream task performance (Target). (c) Current MLLMs suffer from catastrophic forgetting in both the LLM and the connector, which has fewer parameters, as detailed in \ref{sec:cf_connector}.
  • Figure 2: Framework Illustration. LoRASculpt consists of two components: Sparsifying and Regularizing. The Sparsifying process reduces redundancy by retaining only a sparse subset of parameters in the low-rank matrices. The Regularizing process applies a regularizer informed by pretrained knowledge, which adjusts the optimization trajectory to mitigate conflicts between the sparse LoRA subset and the pretrained knowledge, thereby promoting knowledge harmonization. The resulting sparse product matrix $BA$ can be merged with the pretrained weights without severe knowledge conflicts.
  • Figure 3: Results Across Different Epochs, compared with the SOTA method and the baseline. LoRA shows an obvious decline in source performance and Tailor compromises target performance, while our method remains stable on both. See \ref{sec:comparison_to_sota} for details.
  • Figure 4: Hyperparameter Study for function steepness $w$ (\ref{eq:cal_m}), balancing coefficient $\alpha$ (\ref{eq:ours_reg}), and sparsity ratio $s$ (\ref{eq:sparsify}), with the LoRA rank fixed at 32 and fine-tuning on IconQA. See \ref{sec:ablation_study} for a detailed discussion.
  • Figure 5: Actual Sparsity of LoRA in Each Layer with sparsity $s_A = s_B = 0.1$. See \ref{sec:Empirical_Validation} for details.
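The sparsify-then-merge idea in the captions for Figures 1, 2, and 5 can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses global top-k magnitude pruning as a stand-in for the sparsification rule (which this page does not specify), random Gaussian matrices in place of trained LoRA weights, and hypothetical dimensions. It keeps a fraction $s_A = s_B = 0.1$ of the entries of $A$ and $B$ and then measures what fraction of the product $BA$ is nonzero, which is the fraction of pretrained weights a merge would touch.

```python
import random


def keep_top_fraction(matrix, fraction):
    """Zero all but the largest-magnitude `fraction` of entries (global top-k)."""
    magnitudes = sorted((abs(v) for row in matrix for v in row), reverse=True)
    k = max(1, round(fraction * len(magnitudes)))
    threshold = magnitudes[k - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]


def matmul(b, a):
    """Plain matrix product of b (p x r) and a (r x q) as nested lists."""
    cols = list(zip(*a))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in b]


def density(matrix):
    """Fraction of entries that are nonzero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for v in row if v != 0.0)
    return nonzero / total


def random_matrix(rows, cols, rng):
    """Gaussian stand-in for trained LoRA weights (an assumption for the demo)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sparse_lora_update(p, q, r, s_a, s_b, seed=0):
    """Prune B and A by magnitude, then form the sparse update BA."""
    rng = random.Random(seed)
    b = keep_top_fraction(random_matrix(p, r, rng), s_b)
    a = keep_top_fraction(random_matrix(r, q, rng), s_a)
    return b, a, matmul(b, a)


if __name__ == "__main__":
    b, a, ba = sparse_lora_update(p=64, q=64, r=16, s_a=0.1, s_b=0.1)
    print(f"density of B: {density(b):.3f}, of A: {density(a):.3f}, of BA: {density(ba):.3f}")
    print(f"closed form 1 - (1 - s_A * s_B)^r: {1 - (1 - 0.1 * 0.1) ** 16:.3f}")
```

On small matrices the measured density of $BA$ fluctuates around the closed-form value, since the magnitude masks are not exactly independent Bernoulli draws.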

Theorems & Definitions (6)

  • Theorem 3.1
  • Theorem 3.2
  • proof
  • proof
  • proof
  • Theorem D.1