Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
Ruoyu Chen, Xiaoqing Guo, Kangwei Liu, Siyuan Liang, Shiming Liu, Qunli Zhang, Laiyuan Wang, Hua Zhang, Xiaochun Cao
TL;DR
This work introduces EAGLE, a black-box attribution framework for autoregressive token generation in multimodal large language models (MLLMs). EAGLE localizes generated tokens to compact perceptual regions and quantifies the influence of language priors versus perceptual evidence. Its unified objective combines an insight score with a necessity score and is optimized by greedy search over a sparsified image partition, which keeps memory usage low. The approach is token-agnostic, comes with a weak-submodularity-based guarantee for the greedy optimization, and includes a modality-aware analysis that gives fine-grained interpretability of model decisions and of the causes of hallucination. Experiments on MS COCO, MMVP, and RePOPE with LLaVA-1.5, Qwen2.5-VL, and InternVL3.5 show state-of-the-art faithfulness, localization, and hallucination diagnosis, along with substantial efficiency gains and practical utility for debugging and safety in MLLMs.
Abstract
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs. Yet the extent to which generated tokens depend on visual inputs remains poorly understood, limiting interpretability and reliability. In this work, we present EAGLE, a lightweight black-box framework for explaining autoregressive token generation in MLLMs. EAGLE attributes any selected token to compact perceptual regions while quantifying the relative influence of language priors and perceptual evidence. The framework introduces an objective function that unifies sufficiency (insight score) and indispensability (necessity score), optimized via greedy search over sparsified image regions for faithful and efficient attribution. Beyond spatial attribution, EAGLE performs modality-aware analysis that disentangles what each token relies on, providing fine-grained interpretability of model decisions. Extensive experiments across open-source MLLMs show that EAGLE consistently outperforms existing methods in faithfulness, localization, and hallucination diagnosis, while requiring substantially less GPU memory. These results highlight its effectiveness and practicality for advancing the interpretability of MLLMs. The code will be released at https://ruoyuchen10.github.io/EAGLE/.
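The greedy search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact definitions of the insight and necessity scores, so the sketch assumes that insight is the target-token probability when only the selected regions are visible, and that necessity is one minus that probability when the selected regions are removed. The names `greedy_attribution`, `token_prob`, and `toy_token_prob` are hypothetical, and the toy scorer stands in for a black-box query to an MLLM.

```python
from typing import Callable, FrozenSet, List, Optional


def greedy_attribution(
    num_regions: int,
    token_prob: Callable[[FrozenSet[int]], float],
    budget: Optional[int] = None,
) -> List[int]:
    """Order region indices by greedily maximizing insight + necessity.

    token_prob(visible) is an assumed black-box scorer: the probability of
    the target token when only the regions in `visible` are shown.
    """
    all_regions = frozenset(range(num_regions))
    selected: List[int] = []
    remaining = set(all_regions)
    steps = num_regions if budget is None else min(budget, num_regions)

    for _ in range(steps):
        def objective(region: int) -> float:
            candidate = frozenset(selected + [region])
            # Assumed insight score: selected regions alone support the token.
            insight = token_prob(candidate)
            # Assumed necessity score: removing them suppresses the token.
            necessity = 1.0 - token_prob(all_regions - candidate)
            return insight + necessity

        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=objective)
        selected.append(best)
        remaining.remove(best)

    return selected


def toy_token_prob(visible: FrozenSet[int]) -> float:
    """Toy stand-in for an MLLM query: regions 2 and 0 carry the evidence."""
    prob = 0.1  # residual probability attributable to the language prior
    if 2 in visible:
        prob += 0.6
    if 0 in visible:
        prob += 0.3
    return prob


ranking = greedy_attribution(4, toy_token_prob)
print(ranking)
```

In this toy setting the constant 0.1 plays the role of a language prior: it is the probability the token keeps even when no region is visible, which is the kind of quantity a modality-aware analysis would separate from perceptual evidence. Each greedy step queries the scorer twice per remaining candidate, so a coarser (sparser) partition directly reduces the number of model calls.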
