
MI-Pruner: Crossmodal Mutual Information-guided Token Pruner for Efficient MLLMs

Jiameng Li, Aleksei Tiulpin, Matthew B. Blaschko

Abstract

For multimodal large language models (MLLMs), visual information is relatively sparse compared with text. As a result, research on visual token pruning has emerged to enable efficient inference. Current approaches typically measure token importance using attention scores from the visual encoder or the LLM decoder, then keep visual tokens with high attention scores while pruning the rest. In this paper, we pursue a different and more surgical approach. Instead of relying on mechanism-specific signals, we directly compute Mutual Information (MI) between the visual and textual features themselves, prior to their interaction. This allows us to explicitly measure crossmodal dependency at the feature level. Our MI-Pruner is simple, efficient, and non-intrusive, requiring no access to internal attention maps and no architectural modifications. Experimental results demonstrate that our approach outperforms previous attention-based pruning methods with minimal latency.

Paper Structure

This paper contains 53 sections, 1 theorem, 39 equations, 7 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1.5

Let $A$ be an event and let $\{B_i\}_{i=1}^N$ be a set of mutually exclusive and exhaustive events. The Law of Total Probability states:
$$P(A) = \sum_{i=1}^{N} P(A \mid B_i)\, P(B_i).$$
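As a quick numeric illustration of the proposition above, the following sketch checks the identity on a toy three-event partition (the numbers are illustrative, not taken from the paper):

```python
# Numeric sanity check of the Law of Total Probability with a toy
# partition {B_1, B_2, B_3} (illustrative values, not from the paper).
p_B = [0.2, 0.5, 0.3]            # P(B_i): mutually exclusive, sums to 1
p_A_given_B = [0.1, 0.6, 0.4]    # P(A | B_i)

# P(A) = sum_i P(A | B_i) * P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.2*0.1 + 0.5*0.6 + 0.3*0.4 = 0.44
```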

Figures (7)

  • Figure 1: Pruning visualization on LLaVA1.5-7B with different budgets. Our MI-Pruner consistently identifies and preserves the queried regions, whereas other methods partially miss relevant information. Tokens from top to bottom: 64, 128, 256.
  • Figure 2: Overview. Previous methods prune tokens by attention scores from vision encoder or LLM decoder. Our MI-Pruner calculates Mutual Information between visual and textual embeddings in the projection space, achieving optimal performance with minimal latency.
  • Figure 3: A toy model of MI-based pruning. We construct similarity matrices to get conditional probability and marginal probability, then calculate crossmodal PMI (top) and internal PMI (bottom). All [vis] tokens are flattened for illustration.
  • Figure 4: Performance on Qwen2VL series (GQA).
  • Figure 5: Pruning visualization on Qwen3VL series. Our method adaptively retains the patches semantically relevant to the prompt.
  • ...and 2 more figures
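The construction sketched in Figure 3 (similarity matrices turned into conditional and marginal probabilities, then PMI) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the function names (`pmi_scores`, `prune`), the cosine-similarity-with-temperature step, the uniform prior over visual tokens, and the max-over-text aggregation are all illustrative choices:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pmi_scores(vis, txt, temperature=0.07):
    """Score each visual token by its pointwise mutual information with
    the text tokens, in the spirit of the toy model in Figure 3.
    vis: (n_vis, d), txt: (n_txt, d) embeddings in a shared projection space.
    The temperature and uniform visual prior are illustrative assumptions."""
    vis_n = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    txt_n = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = vis_n @ txt_n.T / temperature           # (n_vis, n_txt) similarities
    p_t_given_v = softmax(sim, axis=1)            # conditional p(t | v)
    p_t = p_t_given_v.mean(axis=0)                # marginal p(t), uniform prior on v
    # PMI(v, t) = log p(t | v) - log p(t); small epsilon avoids log(0).
    pmi = np.log(p_t_given_v + 1e-12) - np.log(p_t + 1e-12)
    return pmi.max(axis=1)                        # one relevance score per visual token

def prune(vis, txt, keep):
    """Keep the `keep` visual tokens with the highest PMI scores."""
    idx = np.argsort(-pmi_scores(vis, txt))[:keep]
    return np.sort(idx)
```

Note that the scores depend only on the two embedding matrices, which matches the paper's claim of being non-intrusive: no attention maps or model internals are required.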

Theorems & Definitions (8)

  • Definition 2.1: Mutual Information [kraskov2004estimating, gray2011entropy]
  • Definition 2.2: Marginal Gain [fujishige2005submodular]
  • Definition 2.3: Submodularity and Diminishing Returns [fujishige2005submodular]
  • Definition 1.1: Shannon Entropy [kraskov2004estimating, gray2011entropy, gallager1968information]
  • Definition 1.2: Conditional Entropy [kraskov2004estimating, gray2011entropy, gallager1968information]
  • Definition 1.3: Conditional Mutual Information [gallager1968information]
  • Definition 1.4: Submodular Function [fujishige2005submodular]
  • Proposition 1.5: Law of Total Probability [jaynes1957information]