ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation
Hesam Hosseini, Ghazal Hosseini Mighan, Amirabbas Afzali, Sajjad Amini, Amir Houmansadr
TL;DR
ULTra addresses the difficulty of interpreting latent tokens in Transformers. The framework backpropagates a scalar function of a target latent token through attention to produce a token-specific explanation map, defined as $\overline{S}_i^{(l)} = \mathbf{C}_{i}^{(1,l)} \cdot \cdots \cdot \mathbf{C}_{i}^{(l-1,l)}$ with $S_i^{(l)} = \overline{S}_i^{(l)}[i, 1:]$. These maps enable unsupervised semantic segmentation with pre-trained ViTs and no fine-tuning; a lightweight learnable self-consistency transformation $\mathbf{W}$, trained with a dedicated loss $\mathcal{L}_{\text{sc}}$, further improves performance. The approach is validated on vision and language tasks: it achieves state-of-the-art results on multiple segmentation benchmarks and demonstrates interpretability in LLM text summarization through token-contribution analysis and a Comprehensiveness metric. The method offers broad, zero-shot applicability and architectural faithfulness, but the authors note the computational cost of gradient-based explanations and point to efficiency and scalability as directions for future work.
Abstract
Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing methods. Furthermore, we validate ULTra for model interpretation in both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using LLMs, demonstrating its broad applicability in explaining the semantic structure of latent token representations.
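The explanation-map definition quoted in the TL;DR, a chained product of per-layer matrices followed by selecting row $i$ and dropping the first column, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the matrices $\mathbf{C}_{i}^{(k,l)}$ are treated as given square lists of numbers (how they are obtained from gradients and attention is not described in this excerpt), column 0 is assumed to correspond to a special token such as [CLS], and the names `matmul` and `explanation_map` are hypothetical.

```python
from functools import reduce


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def explanation_map(contribution_matrices, i):
    """Chain the given matrices in order, then return row i without column 0.

    Assumption: contribution_matrices holds C^(1,l), ..., C^(l-1,l) as
    square matrices of equal size; their construction is outside this sketch.
    """
    if not contribution_matrices:
        raise ValueError("at least one contribution matrix is required")
    s_bar = reduce(matmul, contribution_matrices)  # S-bar = C^(1,l) ... C^(l-1,l)
    return s_bar[i][1:]  # S = S-bar[i, 1:], dropping the assumed special token


# Toy example with 3 tokens (index 0 plays the role of the special token).
c1 = [[1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0],
      [0.0, 0.5, 0.5]]
c2 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.25, 0.25, 0.5]]

scores = explanation_map([c1, c2], i=2)
print(scores)  # [0.625, 0.25]
```

In this toy case, row 2 of the product is `[0.125, 0.625, 0.25]`, so the map for token 2 over the remaining tokens is `[0.625, 0.25]`. In a ViT setting the resulting vector would typically be reshaped to the patch grid to view it as an image-shaped map, though that step is likewise an assumption here.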
