Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs
Zeping Yu, Sophia Ananiadou
TL;DR
The paper tackles the catastrophic forgetting that arises in multimodal LLMs after vision-language instruction tuning. It introduces Locate-then-Merge, a training-free parameter fusion framework, and a neuron-level instantiation called Neuron-Fusion, which preserves neurons with large parameter changes and suppresses neurons with small changes, retaining language ability without giving up visual adaptation. Across 13 benchmarks and two open-source MLLMs, Neuron-Fusion consistently outperforms existing model merging methods, and generation analysis shows fewer Not-Known and Context-Hallucination cases, improving reliability. The approach offers a practical, data-free way to maintain language proficiency while enabling robust visual capabilities in multimodal models, with potential applicability to broader modalities and architectures in future work.
Abstract
Although multimodal large language models (MLLMs) have achieved impressive performance, the multimodal instruction tuning stage often causes catastrophic forgetting of the base LLM's language ability, even in strong models like Llama3. To address this, we propose Locate-then-Merge, a training-free parameter fusion framework that first locates important parameters and then selectively merges them. We further introduce Neuron-Fusion, a neuron-level strategy that preserves the influence of neurons with large parameter shifts--neurons likely responsible for newly acquired visual capabilities--while attenuating the influence of neurons with smaller changes that likely encode general-purpose language skills. This design enables better retention of visual adaptation while mitigating language degradation. Experiments on 13 benchmarks across both language and visual tasks show that Neuron-Fusion consistently outperforms existing model merging methods. Further analysis reveals that our method effectively reduces context hallucination in generation.
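The locate-then-merge idea described above can be illustrated with a minimal sketch. This is not the paper's exact procedure, which the abstract does not specify. It assumes each row of a weight matrix corresponds to one neuron, measures a neuron's change by the L2 norm of its weight delta, keeps the full delta for the top fraction of neurons, and scales the remaining deltas down. The function name `neuron_fusion` and the parameters `keep_ratio` and `small_scale` are hypothetical, chosen only for illustration.

```python
import math

def neuron_fusion(base, tuned, keep_ratio=0.25, small_scale=0.0):
    """Illustrative neuron-level fusion (assumed rule, not the paper's exact one).

    base, tuned: lists of rows; each row holds one neuron's weights in the
    base LLM and in the instruction-tuned MLLM respectively.
    keep_ratio:  hypothetical fraction of neurons treated as "large change".
    small_scale: hypothetical attenuation factor for all other neurons.
    """
    # Per-neuron weight delta between the tuned and base models.
    deltas = [[t - b for b, t in zip(b_row, t_row)]
              for b_row, t_row in zip(base, tuned)]
    # Locate: rank neurons by the L2 norm of their delta.
    norms = [math.sqrt(sum(d * d for d in row)) for row in deltas]
    k = max(1, round(keep_ratio * len(norms)))
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    located = set(ranked[:k])
    # Merge: keep the full delta for located neurons, attenuate the rest.
    merged = []
    for i, (b_row, d_row) in enumerate(zip(base, deltas)):
        scale = 1.0 if i in located else small_scale
        merged.append([b + scale * d for b, d in zip(b_row, d_row)])
    return merged, sorted(located)

# Toy example: only neuron 2 changes substantially during tuning.
base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
tuned = [[0.1, 0.0], [1.0, 1.05], [4.0, 2.0], [3.0, 3.2]]
merged, located = neuron_fusion(base, tuned, keep_ratio=0.25, small_scale=0.0)
```

With `small_scale=0.0`, neurons outside the located set revert to their base weights. A value between 0 and 1 would attenuate their deltas rather than remove them, which is closer to the "attenuating" wording in the abstract.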
