Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models
Hyeontaek Hwang, Nguyen Dinh Son, Daeyoung Kim
TL;DR
This work tackles catastrophic forgetting during task-specific fine-tuning of Multimodal LLMs by introducing Model-Dowser, a data-free sparse fine-tuning framework. It defines a functional importance score $S^{(l)}_{ij} = \|J^{(l)}_i\|_2 \cdot |W^{(l)}_{ij}| \cdot |h^{(l-1)}_j|$ that combines output sensitivity, weight magnitude, and input activation, and estimates it without real data using synthetic probing and the Hutchinson trace estimator. The resulting low-overhead mask freezes high-sensitivity parameters, so only the least important parameters are updated. The method is memory-efficient ($\mathcal{O}(|P|)$) and mitigates forgetting better than strong baselines on LLaVA and NVILA across diverse downstream tasks, even when fine-tuning deep decoder layers. The results show that pretrained generalization is preserved while task-specific adaptation remains possible, and that the approach scales to multi-billion-parameter models.
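The score above can be illustrated on a toy network. The following is a minimal sketch, not the authors' implementation. It assumes that $J^{(l)}_i$ denotes the sensitivity of the downstream output to the $i$-th output unit of layer $l$, and that its norm is estimated with a Hutchinson-style average over random Rademacher probes $v$, using $\mathbb{E}_v[(J^\top v)_i^2] = \|J_i\|_2^2$. The two-layer network (`W`, `A`, `tanh`), the Gaussian synthetic input `h`, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def sensitivity_norms(vjp, out_dim, n_probes, rng):
    """Estimate ||J_i||_2 per unit from random vector-Jacobian products.

    Assumption: vjp(v) returns J^T v, where J is the Jacobian of the
    network output with respect to the layer's output units.
    """
    sq_sum = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=out_dim)  # Rademacher probe
        sq_sum = sq_sum + vjp(v) ** 2              # (J^T v)_i^2 per unit i
    return np.sqrt(sq_sum / n_probes)

def importance_scores(W, h, jac_norms):
    """S_ij = ||J_i||_2 * |W_ij| * |h_j| for one linear layer."""
    return jac_norms[:, None] * np.abs(W) * np.abs(h)[None, :]

# Hypothetical toy model: z = W h, y = A tanh(z).
rng = np.random.default_rng(0)
d_in, d_hid, d_out = 6, 8, 5
W = rng.normal(size=(d_hid, d_in))
A = rng.normal(size=(d_out, d_hid))
h = rng.normal(size=d_in)          # synthetic probe input, no real data

z = W @ h
gate = 1.0 - np.tanh(z) ** 2       # derivative of tanh at z
J = A * gate[None, :]              # exact dy/dz, shape (d_out, d_hid)

def vjp(v):
    # J^T v computed without materializing J.
    return (A.T @ v) * gate

est = sensitivity_norms(vjp, d_out, n_probes=4000, rng=rng)
exact = np.linalg.norm(J, axis=0)  # column norms, for comparison only
S = importance_scores(W, h, est)   # one score per weight, shape (d_hid, d_in)
```

In this toy setting the exact Jacobian is cheap to form, so `exact` serves only as a sanity check on the estimate. In a large model the Jacobian would not be materialized, and `vjp` would typically come from an automatic differentiation backward pass.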
Abstract
Fine-tuning Multimodal Large Language Models (MLLMs) on task-specific data is an effective way to improve performance on downstream applications. However, such adaptation often degrades generalization on pretrained tasks, a phenomenon known as catastrophic forgetting. Existing methods that aim to mitigate this issue either become ineffective when fine-tuning deeper layers of the language decoder or scale poorly with increasing model size. To address these limitations, we propose Model-Dowser, a novel sparse fine-tuning approach for MLLMs. Model-Dowser computes a principled importance score for each model parameter with respect to pretrained generalization (prior to downstream adaptation) by jointly considering weight magnitudes, input activations, and output sensitivities. During fine-tuning, Model-Dowser freezes the high-importance parameters and updates only the remaining ones. Comprehensive experiments on two representative MLLMs, LLaVA and NVILA, demonstrate that Model-Dowser effectively mitigates catastrophic forgetting and consistently outperforms prior methods, while remaining resource-efficient and scalable to multi-billion-parameter models.
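The selective update described in the abstract can be sketched as a masked gradient step. This is an illustrative sketch under stated assumptions, not the authors' code: the `update_ratio` hyperparameter, the plain SGD rule, and the function names are hypothetical, and the scores are taken as given.

```python
import numpy as np

def update_mask(scores, update_ratio):
    """Boolean mask that is True for the lowest-scoring fraction of weights.

    Assumption: update_ratio is a hypothetical hyperparameter giving the
    fraction of parameters that remain trainable.
    """
    k = int(round(update_ratio * scores.size))
    mask = np.zeros(scores.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest scores (order among them is irrelevant).
        idx = np.argpartition(scores.ravel(), k - 1)[:k]
        mask[idx] = True
    return mask.reshape(scores.shape)

def masked_sgd_step(W, grad, mask, lr):
    """Plain SGD step applied only where mask is True; other weights stay frozen."""
    return W - lr * grad * mask

# Tiny example: freeze the two most important weights, update the other two.
scores = np.array([[0.9, 0.1],
                   [0.5, 0.3]])
W = np.ones((2, 2))
grad = np.ones((2, 2))

mask = update_mask(scores, update_ratio=0.5)
W_new = masked_sgd_step(W, grad, mask, lr=0.1)
```

Because the mask is computed once before fine-tuning and holds one boolean per parameter, the extra storage in this sketch grows linearly with the number of parameters.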
