Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation

Ruoyu Chen, Xiaoqing Guo, Kangwei Liu, Siyuan Liang, Shiming Liu, Qunli Zhang, Laiyuan Wang, Hua Zhang, Xiaochun Cao

TL;DR

This work introduces EAGLE, a black-box attribution framework for autoregressive token generation in multimodal large language models (MLLMs). EAGLE localizes generated tokens to compact perceptual regions and quantifies the influence of language priors versus perceptual evidence using a unified objective that combines an insight score with a necessity score, optimized via greedy search over a sparsified image partition, yielding low memory usage. The approach offers a token-agnostic attribution mechanism, formalizes a weak submodularity-based guarantee for its greedy optimization, and provides a modality-aware analysis that yields fine-grained interpretability of decisions and hallucination causes. Extensive experiments across MS COCO, MMVP, and RePOPE on LLaVA-1.5, Qwen2.5-VL, and InternVL3.5 demonstrate state-of-the-art faithfulness, localization, and hallucination diagnosis, with substantial efficiency gains and practical utility for debugging and safety in MLLMs.

Abstract

Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs. Yet, the extent to which generated tokens depend on visual modalities remains poorly understood, limiting interpretability and reliability. In this work, we present EAGLE, a lightweight black-box framework for explaining autoregressive token generation in MLLMs. EAGLE attributes any selected tokens to compact perceptual regions while quantifying the relative influence of language priors and perceptual evidence. The framework introduces an objective function that unifies sufficiency (insight score) and indispensability (necessity score), optimized via greedy search over sparsified image regions for faithful and efficient attribution. Beyond spatial attribution, EAGLE performs modality-aware analysis that disentangles what tokens rely on, providing fine-grained interpretability of model decisions. Extensive experiments across open-source MLLMs show that EAGLE consistently outperforms existing methods in faithfulness, localization, and hallucination diagnosis, while requiring substantially less GPU memory. These results highlight its effectiveness and practicality for advancing the interpretability of MLLMs. The code will be released at https://ruoyuchen10.github.io/EAGLE/.
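The greedy search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_attribution`, the black-box `score_fn`, the additive toy model, and the exact forms of the insight and necessity terms are all assumptions made for illustration. Here the insight score is taken to be the target-token probability when only the selected regions are visible, and the necessity score is one minus that probability when the selected regions are removed.

```python
import numpy as np

def greedy_attribution(score_fn, num_regions, budget=None):
    """Greedily order regions by an assumed insight + necessity objective.

    score_fn(keep_mask) -> float is a hypothetical black-box that returns the
    target-token probability when only regions with keep_mask[i] == True
    remain visible. Only model outputs are queried, never gradients.
    """
    budget = num_regions if budget is None else budget
    selected = []
    remaining = list(range(num_regions))

    def objective(subset):
        keep = np.zeros(num_regions, dtype=bool)
        keep[list(subset)] = True
        insight = score_fn(keep)             # sufficiency: show only the subset
        necessity = 1.0 - score_fn(~keep)    # indispensability: hide the subset
        return insight + necessity

    trace = []
    for _ in range(budget):
        # Evaluate the objective for every candidate extension of the current set.
        gains = [objective(selected + [r]) for r in remaining]
        best = int(np.argmax(gains))
        selected.append(remaining.pop(best))
        trace.append(gains[best])
    return selected, trace

# Toy stand-in for an MLLM: each region contributes a fixed share of the
# target-token probability (an assumption; real models are not additive).
weights = np.array([0.1, 0.6, 0.05, 0.25])
toy_score = lambda keep: float(weights[keep].sum())

order, trace = greedy_attribution(toy_score, num_regions=4)
print(order)  # [1, 3, 0, 2]
```

In this toy setting the regions are ranked by their contribution to the token probability, so the first few entries of `order` form the compact explanation. Each greedy step needs only forward queries to the model, which is consistent with the black-box, low-memory framing of the abstract.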

Paper Structure

This paper contains 21 sections, 7 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: Eagle identifies which perceptual regions drive the generation (Where MLLMs Attend) and quantifies modality reliance (What They Rely On).
  • Figure 2: Overview of the proposed Eagle framework. The input image is first sparsified into sub-regions, then attributed via greedy search with the designed objective, and finally analyzed for modality relevance between language priors and perceptual evidence.
  • Figure 3: Visualization of explanation results for LLaVA-1.5, Qwen2.5-VL, and InternVL3.5 on the MS COCO and MMVP datasets.
  • Figure 4: Visualization of word-level explanation results for LLaVA-1.5, Qwen2.5-VL, and InternVL3.5 on the MS COCO dataset.
  • Figure 5: Hallucination attribution on RePOPE. Our method produces sparse, focused maps that more accurately reveal regions responsible for hallucinated outputs, compared with IGOS++ and TAM.
  • ...and 12 more figures

Theorems & Definitions (4)

  • Remark 1: Weak Submodularity
  • Remark 2: Token-Agnostic Attribution
  • Remark 3: Interactive Token-Level Explanation
  • Remark 4: Computational Complexity