MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu
TL;DR
The paper tackles interpretability in multimodal LLMs by identifying domain-specific neurons using domain activation probability entropy (DAPE) across five visual domains. It proposes a three-stage mechanism for how the LLM processes post-projection visual features and verifies it with logit-lens analyses. Experiments indicate that current MLLMs may not fully utilize domain-specific information in VQA tasks. Domain-specific neurons are rare (<1%), yet manipulating them changes accuracy by up to 10%, pointing toward cross-domain, all-encompassing multimodal LLMs. These neuron-level findings can inform more explicit domain specialization and more robust cross-domain generalization.
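To make the entropy-based selection concrete, here is a minimal sketch. It is not the authors' implementation: it assumes DAPE is the Shannon entropy of a neuron's activation probabilities normalized across domains, by analogy with entropy-based identification of language-specific neurons in multilingual work. The function names, the input layout, and the 1% selection fraction are hypothetical choices made for illustration.

```python
import math

def dape_scores(activation_counts, token_counts):
    """Return one entropy score per neuron (assumed DAPE definition).

    activation_counts[d][n]: tokens from domain d on which neuron n was active.
    token_counts[d]: total tokens observed for domain d.
    """
    num_domains = len(activation_counts)
    num_neurons = len(activation_counts[0])
    scores = []
    for n in range(num_neurons):
        # Activation probability of neuron n in each domain.
        probs = [activation_counts[d][n] / token_counts[d] for d in range(num_domains)]
        total = sum(probs)
        if total == 0:
            # A neuron that never fires carries no domain signal.
            scores.append(math.inf)
            continue
        # Normalize across domains so the values form a distribution.
        dist = [p / total for p in probs]
        # Low entropy means the activity is concentrated in few domains.
        scores.append(-sum(q * math.log(q) for q in dist if q > 0))
    return scores

def select_domain_specific(scores, fraction=0.01):
    """Indices of the lowest-entropy neurons (the fraction is an assumption)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(range(len(scores)), key=lambda n: scores[n])[:k]

# Toy example: two domains, three neurons.
counts = [[90, 50, 0],   # domain A
          [1, 50, 0]]    # domain B
scores = dape_scores(counts, token_counts=[100, 100])
specific = select_domain_specific(scores, fraction=0.34)
```

In the toy data, the first neuron fires almost exclusively on domain A and therefore gets the lowest entropy, while the second neuron fires equally on both domains and reaches the maximum entropy for two domains.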
Abstract
Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechanism by which MLLMs process features from diverse domains. Furthermore, we propose a three-stage mechanism for language model modules in MLLMs when handling projected image features, and verify this hypothesis using the logit lens. Extensive experiments indicate that while current MLLMs exhibit Visual Question Answering (VQA) capability, they may not fully utilize domain-specific information. Properly manipulating domain-specific neurons changes accuracy by at most 10%, shedding light on the development of cross-domain, all-encompassing MLLMs in the future. The source code is available at https://github.com/Z1zs/MMNeuron.
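The logit lens mentioned above decodes intermediate hidden states with the model's output (unembedding) matrix to see which token each layer currently favors. The following is a minimal sketch on toy data, not the authors' code: the vocabulary, the matrix, and the hidden states are invented for illustration, and the final layer normalization that real models usually apply before unembedding is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembedding, vocab):
    """Decode each layer's hidden state into its most likely token.

    hidden_states[layer]: hidden vector at one token position.
    unembedding[v]: output-embedding row for vocabulary entry v.
    Returns a list of (token, probability) pairs, one per layer.
    """
    decoded = []
    for h in hidden_states:
        # Project the hidden state onto every vocabulary row.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembedding]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=lambda v: probs[v])
        decoded.append((vocab[best], probs[best]))
    return decoded

# Toy setup (hypothetical values): 3-token vocabulary, 2-dim hidden states.
vocab = ["cat", "dog", "car"]
unembedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden_states = [[0.1, 0.0], [0.2, 1.5], [0.0, 3.0]]  # one vector per layer

per_layer = logit_lens(hidden_states, unembedding, vocab)
```

Reading `per_layer` from the first layer to the last shows how the decoded token and its confidence evolve with depth, which is the kind of layer-wise evidence a staged processing hypothesis can be checked against.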
