LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models
Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita
TL;DR
The paper addresses the challenge of understanding transformer language models in high-stakes settings by proposing the LM Transparency Tool (LM-TT), an open-source interactive framework that makes the entire prediction process transparent. It builds on the information flow subgraph approach of Ferrando and Voita (2024) to identify the components that matter for a given prediction, and provides fine-grained attributions to individual attention heads and FFN neurons, complemented by logit-lens visualizations and vocabulary projections. Key contributions include end-to-end transparency of computations, efficient extraction of task-relevant components, and an interactive UI that supports programmatic use and model extensibility via TransformerLens. The tool is intended to lower the barrier to hypothesis generation and verification for large models, helping practitioners inspect biases, safety-related reasoning routes, and factuality concerns more quickly and precisely.
Abstract
We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Unlike existing tools that focus on isolated parts of the decision-making process, our framework is designed to make the entire prediction process transparent, and allows tracing model behavior back from the top-layer representation to very fine-grained parts of the model. Specifically, it (1) shows the important part of the whole input-to-output information flow, (2) allows attributing any changes made by a model block to individual attention heads and feed-forward neurons, and (3) allows interpreting the functions of those heads or neurons. A crucial part of this pipeline is showing the importance of specific model components at each step. As a result, we are able to look at the roles of model components only in cases where they are important for a prediction. Since knowing which components to inspect is key when analyzing large models, where the number of such components is extremely high, we believe our tool will greatly support the interpretability community in both research settings and practical applications.
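To make the idea of interpreting intermediate representations concrete, the following is a minimal sketch of a logit-lens style vocabulary projection on toy data. It is not LM-TT code: the function name `logit_lens`, the four-token vocabulary, and the identity unembedding matrix are all hypothetical, chosen only for illustration, and the sketch omits the final layer normalization that a real model would typically apply before unembedding.

```python
import numpy as np

def logit_lens(hidden, unembed, vocab, k=3):
    """Project one residual-stream vector onto the vocabulary; return the top-k tokens."""
    logits = hidden @ unembed            # shape: (vocab_size,)
    top = np.argsort(logits)[::-1][:k]   # indices sorted by descending logit
    return [(vocab[i], float(logits[i])) for i in top]

# Hypothetical toy setup: d_model = 4, vocabulary of 4 tokens.
vocab = ["cat", "dog", "car", "tree"]
unembed = np.eye(4)  # assumption: each hidden dimension maps to exactly one token

# Hypothetical residual-stream states at the last position after two layers.
layer_states = [
    np.array([0.1, 0.2, 0.9, 0.0]),
    np.array([0.3, 1.5, 0.4, 0.1]),
]

for layer, h in enumerate(layer_states):
    print(layer, logit_lens(h, unembed, vocab, k=2))
```

Reading the projections layer by layer shows how the preferred token changes as blocks update the representation. A tool like the one described here would pair such a view with component importances, so that only the blocks that matter for the prediction need to be inspected.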
