MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

TL;DR

The paper tackles interpretability in multimodal LLMs by identifying domain-specific neurons using domain activation probability entropy (DAPE) across five visual domains. It introduces a three-stage mechanism for processing post-projection visual features inside the LLM and validates it with logit-lens analyses, showing that domain-specific information is not consistently leveraged in VQA tasks. The study reveals that domain-specific neurons are rare (<1%), yet their targeted perturbation can affect performance in domain-dependent ways, providing a pathway toward cross-domain, all-encompassing multimodal LLMs. These findings offer neuron-level insights that can inform more explicit domain specialization and robust cross-domain generalization.

Abstract

Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechanism of how MLLMs process features from diverse domains. Furthermore, we propose a three-stage mechanism for language model modules in MLLMs when handling projected image features, and verify this hypothesis using logit lens. Extensive experiments indicate that while current MLLMs exhibit Visual Question Answering (VQA) capability, they may not fully utilize domain-specific information. Manipulating domain-specific neurons properly will result in a 10% change of accuracy at most, shedding light on the development of cross-domain, all-encompassing MLLMs in the future. The source code is available at https://github.com/Z1zs/MMNeuron.
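The TL;DR names domain activation probability entropy (DAPE) as the score used to pick out domain-specific neurons. The exact definition is not given in this summary, so the following is only a minimal sketch of what such a score could look like. It assumes a neuron counts as "active" when its activation is positive, and that per-domain activation probabilities are normalized into a distribution before taking the entropy. The function names and toy data are hypothetical.

```python
import math

def activation_probabilities(activations_by_domain):
    """Map each domain to the fraction of tokens on which each neuron fires.

    activations_by_domain: dict of domain -> list of rows, one row per token,
    each row holding one activation value per neuron.
    Assumption: a neuron "fires" when its activation is > 0.
    """
    probs = {}
    for domain, rows in activations_by_domain.items():
        counts = [0] * len(rows[0])
        for row in rows:
            for j, value in enumerate(row):
                if value > 0:
                    counts[j] += 1
        probs[domain] = [c / len(rows) for c in counts]
    return probs

def dape(probs):
    """Entropy of each neuron's activation probabilities across domains.

    Low entropy means the neuron fires mostly in one domain (a candidate
    domain-specific neuron); high entropy means it fires evenly everywhere.
    """
    domains = list(probs)
    scores = []
    for j in range(len(probs[domains[0]])):
        p = [probs[d][j] for d in domains]
        total = sum(p)
        if total == 0:
            # The neuron never fires, so it carries no domain signal.
            scores.append(float("inf"))
            continue
        q = [x / total for x in p]
        scores.append(-sum(x * math.log(x) for x in q if x > 0))
    return scores

# Toy data: two hypothetical domains, two tokens each, two neurons.
toy = {
    "domain_a": [[1.0, 0.5], [2.0, -1.0]],
    "domain_b": [[-1.0, 0.3], [-0.5, -0.2]],
}
toy_scores = dape(activation_probabilities(toy))
```

In this toy input, neuron 0 fires only in `domain_a` and gets an entropy of zero, while neuron 1 fires equally often in both domains and gets the maximum entropy for two domains, log 2. Selecting the lowest-scoring neurons would then give the rare domain-specific set; the cutoff used for that selection is not stated here.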

Paper Structure

This paper contains 32 sections, 10 equations, 17 figures, and 9 tables.

Figures (17)

  • Figure 1: Neuron analysis in the previous language-specific setting of large language models (a) and in our domain-specific setting of multimodal large language models (b).
  • Figure 2: PCA visualization of image embeddings extracted through CLIP's image encoder.
  • Figure 3: The overall framework of our proposed MMNeuron method (taking the LLaVA architecture as an example), which can be applied to any MLP layer with an activation layer in multimodal large language models.
  • Figure 4: General framework of logit lens analysis, which takes the hidden state at an intermediate layer (e.g., $h_1$ above) and converts it into logits with the unembedding layer. Note that Emb, Pos Emb, Res, and Unemb stand for Embedding, Position Embedding, Residual Layer, and Unembedding, respectively.
  • Figure 5: Distribution of domain-specific neurons in InstructBLIP.
  • ...and 12 more figures
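
The logit lens step described in the Figure 4 caption can be sketched in a few lines: take an intermediate hidden state, multiply it by the unembedding matrix, and read off the most likely tokens. This is a toy illustration in pure Python, not the paper's implementation. The vocabulary, matrix, and hidden state are made up, and the final layer normalization that real models usually apply before unembedding is omitted for brevity.

```python
import math

def logit_lens(hidden_state, unembedding, vocab, top_k=1):
    """Decode an intermediate hidden state as if it were the final one.

    hidden_state: list of floats (one intermediate-layer vector).
    unembedding: list of rows, one weight vector per vocabulary token.
    Returns the top_k (token, probability) pairs.
    """
    # Project the hidden state onto every token's unembedding vector.
    logits = [sum(h * w for h, w in zip(hidden_state, row)) for row in unembedding]
    # Numerically stable softmax over the vocabulary.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hypothetical three-token vocabulary with two-dimensional hidden states.
toy_vocab = ["cat", "dog", "car"]
toy_unembedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_hidden = [2.0, 0.5]

top = logit_lens(toy_hidden, toy_unembedding, toy_vocab, top_k=3)
```

Applying the same function to the hidden state at each layer in turn shows how the decoded token distribution changes with depth, which is the kind of evidence the paper uses to support its three-stage mechanism.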