Which Neurons Matter in IR? Applying Integrated Gradients-based Methods to Understand Cross-Encoders

Mathias Vast; Basile Van Cooten; Laure Soulier; Benjamin Piwowarski

TL;DR

This work tackles interpretability in neural information retrieval (IR) by applying Neuron Integrated Gradients ($NIG$) to a cross-encoder IR model (MonoBERT) to identify neuron-level signals of relevance and handling of out-of-domain data. By aggregating neuron conductances across tokens and datasets, the authors reveal core sets of relevance-specific neurons that transfer across domains and distinct OOD neurons for out-of-domain cases, validating findings with targeted ablations. The study introduces fusion-based attribution schemes and demonstrates that pruning neurons identified by $NIG$ degrades IR performance, supporting the method's usefulness for understanding and potentially guiding IR model design. The results offer a path toward interpretable, more robust IR systems across diverse domains and encourage exploring broader architectures and mechanistic analyses.

Abstract

With the recent addition of Retrieval-Augmented Generation (RAG), the scope and importance of Information Retrieval (IR) have expanded. As a result, a deeper understanding of IR models has become more important as well. However, interpretability in IR remains under-explored, especially when it comes to the models' inner mechanisms. In this paper, we explore the possibility of adapting Integrated Gradients-based methods to an IR context to identify the role of individual neurons within the model. In particular, we provide new insights into the role of what we call "relevance" neurons, as well as how they deal with unseen data. Finally, we carry out an in-depth pruning study to validate our findings.

Paper Structure (23 sections, 1 equation, 4 figures, 4 tables)

Introduction
Related Works
Neuron Integrated Gradients for IR
Background
Adapting NIG to identify "task-related" neurons
Comparisons across datasets.
Dependency to the input.
Baseline.
Experiments
Experimental setup
Analysis methodology
Answering RQ1.
Answering RQ2.
Answering RQ3.
Result analysis
...and 8 more sections

Figures (4)

Figure 1: Illustration of the Neuron Integrated Gradients method
Figure 2: Percentage of intersection between pairs of attribution schemes for the label "relevant" (top) and "non-relevant" (bottom) at different percentages of pruning
Figure 3: Summary of the intersections amongst all the attribution schemes of every dataset (or of all the datasets except Robust and NFCorpus) for a given label (either "relevant" or "non-relevant"), and of the intersections between two attribution schemes (both labels) for a single dataset
Figure 4: Distribution in the model of the most important neurons for the attribution scheme $N_{ms}$ (1% pruning).
