MechIR: A Mechanistic Interpretability Framework for Information Retrieval
Andrew Parry, Catherine Chen, Carsten Eickhoff, Sean MacAvaney
TL;DR
MechIR is a mechanistic interpretability framework tailored to neural information retrieval, enabling causal analysis and interventions in IR architectures such as bi-encoders and cross-encoders. Building on activation patching and TransformerLens, it provides an end-to-end, open-source Python package for accessing activations, patching components, and creating paired perturbed datasets for IR tasks. Accompanying demonstrations and tutorials illustrate how perturbations reveal component-specific influences on relevance and guide researchers through end-to-end experiments. By providing this diagnostic tooling, MechIR aims to improve the transparency, robustness, and controllability of IR systems, with potential applications in bias mitigation, adversarial defense, and personalized retrieval.
Abstract
Mechanistic interpretability is an emerging diagnostic approach for neural models that has gained traction in the broader natural language processing community. This paradigm aims to attribute model behavior to individual components of neural systems, where causal relationships between hidden layers and outputs were previously uninterpretable. As neural models become ubiquitous in IR for both retrieval and evaluation, we need to be able to interpret why a model produces a given output, both for transparency and to improve these systems. This work presents a flexible framework for diagnostic analysis and intervention within these highly parametric neural systems, specifically tailored to IR tasks and architectures. In providing such a framework, we aim to facilitate further research in interpretable IR, with a broader scope for practical interventions derived from mechanistic interpretability. We provide a preliminary analysis and demonstrate our framework through an axiomatic lens to show its applications and ease of use for IR practitioners new to this emerging paradigm.
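For readers new to the paradigm, the core idea of activation patching on a paired (baseline, perturbed) input can be illustrated with a toy example. The sketch below is not the MechIR or TransformerLens API: the two-component scoring model, the function names (`run`, `patching_effects`), and the feature values are all hypothetical and chosen only to show the mechanics. It caches each component's activation on the perturbed input, substitutes one activation at a time into a run on the baseline input, and reports the fraction of the score change that each substitution restores.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

# Hypothetical toy "relevance model": two hidden components whose
# activations are summed into a score. Not a real IR architecture.
def component_a(x: Vector) -> float:
    # Responds only to feature 0 (think: a query-term-match signal).
    return 2.0 * x[0]

def component_b(x: Vector) -> float:
    # Responds only to feature 1 (think: a document-length signal).
    return 0.5 * x[1]

COMPONENTS: Dict[str, Callable[[Vector], float]] = {
    "a": component_a,
    "b": component_b,
}

def run(
    x: Vector,
    patch: Optional[Dict[str, float]] = None,
    cache: Optional[Dict[str, float]] = None,
) -> float:
    """Score x, optionally overwriting activations and recording them."""
    score = 0.0
    for name, fn in COMPONENTS.items():
        act = fn(x)
        if patch is not None and name in patch:
            act = patch[name]  # intervention: substitute a cached activation
        if cache is not None:
            cache[name] = act
        score += act
    return score

def patching_effects(baseline: Vector, perturbed: Vector) -> Dict[str, float]:
    """Fraction of the score change restored by patching each component."""
    cache: Dict[str, float] = {}
    perturbed_score = run(perturbed, cache=cache)
    baseline_score = run(baseline)
    total = perturbed_score - baseline_score
    effects: Dict[str, float] = {}
    for name in COMPONENTS:
        patched_score = run(baseline, patch={name: cache[name]})
        effects[name] = (patched_score - baseline_score) / total if total else 0.0
    return effects

# Paired inputs that differ only in feature 0 (the perturbation).
baseline = [1.0, 4.0]
perturbed = [3.0, 4.0]
effects = patching_effects(baseline, perturbed)
print(effects)
```

In this toy setup the whole score change is attributed to component `a`, because the perturbation touches only the feature that component reads. In a real transformer the components would be attention heads or MLP layers, and the effects are rarely this cleanly separated.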
