
Which Neurons Matter in IR? Applying Integrated Gradients-based Methods to Understand Cross-Encoders

Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski

TL;DR

This work tackles interpretability in neural information retrieval (IR) by applying Neuron Integrated Gradients ($NIG$) to a cross-encoder IR model (MonoBERT) to identify which neurons signal relevance and which handle out-of-domain (OOD) data. By aggregating neuron conductances across tokens and datasets, the authors reveal core sets of relevance-specific neurons that transfer across domains, as well as distinct OOD neurons specific to out-of-domain cases, and they validate these findings with targeted ablations. The study introduces fusion-based attribution schemes and shows that pruning the neurons identified by $NIG$ degrades IR performance. This supports the method's usefulness for understanding, and potentially guiding, IR model design. The results point toward interpretable and more robust IR systems across diverse domains, and they motivate extending the analysis to other architectures and to mechanistic studies.
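The neuron-level quantity behind $NIG$ can be illustrated with neuron conductance. Conductance integrates the gradient that flows through one hidden neuron along a straight-line path from a baseline input to the actual input. The sketch below is not the authors' implementation. It replaces MonoBERT with a hypothetical two-layer toy network (`W1`, `B1`, `W2`, `B2`, tanh hidden units, sigmoid "relevance" output) and uses hand-written gradients instead of automatic differentiation. It also assumes an all-zero baseline and approximates the path integral with a midpoint Riemann sum.

```python
import math

# Hypothetical toy model standing in for a cross-encoder: 3 inputs, 4 tanh
# hidden neurons, one sigmoid "relevance" output. Weights are arbitrary.
W1 = [[0.5, -0.3, 0.8],
      [-0.6, 0.9, 0.1],
      [0.2, 0.4, -0.7],
      [0.7, -0.2, 0.3]]
B1 = [0.1, -0.1, 0.0, 0.05]
W2 = [1.2, -0.8, 0.5, 0.9]
B2 = -0.2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x):
    """Return the hidden activations h and the model score F(x)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, B1)]
    score = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + B2)
    return h, score


def neuron_conductance(x, baseline, steps=200):
    """Approximate the conductance of every hidden neuron for input x.

    cond_j = sum_i (x_i - x'_i) * integral_0^1 dF/dh_j * dh_j/dx_i dalpha,
    evaluated with a midpoint Riemann sum over `steps` points on the path.
    """
    n_hidden, n_in = len(W1), len(x)
    cond = [0.0] * n_hidden
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        h, score = forward(point)
        dscore_dz = score * (1.0 - score)      # derivative of the sigmoid
        for j in range(n_hidden):
            df_dh = dscore_dz * W2[j]          # dF/dh_j
            dh_dpre = 1.0 - h[j] ** 2          # derivative of tanh
            for i in range(n_in):
                dh_dx = dh_dpre * W1[j][i]     # dh_j/dx_i
                cond[j] += (x[i] - baseline[i]) * df_dh * dh_dx / steps
    return cond


x = [1.0, 0.5, -0.5]
baseline = [0.0, 0.0, 0.0]  # assumed baseline
cond = neuron_conductance(x, baseline)
print([round(c, 4) for c in cond])
# Completeness check: conductances over a full layer sum to F(x) - F(baseline)
print(round(sum(cond), 4), round(forward(x)[1] - forward(baseline)[1], 4))
```

The last line checks completeness. Every path from input to output crosses the hidden layer, so the conductances of its neurons sum to the change in score between the baseline and the input, up to discretization error. In a cross-encoder, each neuron yields one such value per token. According to the TL;DR, these values are then aggregated across tokens and datasets.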

Abstract

With the recent emergence of Retrieval-Augmented Generation (RAG), the scope and importance of Information Retrieval (IR) have expanded. As a result, a deeper understanding of IR models has become increasingly important. However, interpretability in IR remains under-explored, especially regarding the models' inner mechanisms. In this paper, we explore adapting Integrated Gradients-based methods to an IR context to identify the role of individual neurons within the model. In particular, we provide new insights into the role of what we call "relevance" neurons and into how they deal with unseen data. Finally, we carry out an in-depth pruning study to validate our findings.
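A pruning study of this kind can be pictured in three steps. First, per-token attributions are collapsed into one score per neuron. Next, the top p% of neurons are selected. Finally, those neurons are masked to zero before retrieval quality is measured again. The sketch below is illustrative only. The aggregation rule (sum over tokens, then the mean of absolute values over query-document pairs), the helper names `aggregate_scores`, `top_percent` and `prune_mask`, and the example numbers are all assumptions. The paper's exact aggregation and fusion schemes may differ.

```python
import math


def aggregate_scores(conductances):
    """Collapse conductances[pair][token][neuron] into one score per neuron.

    Assumed rule: sum over token positions, then average the absolute
    value over query-document pairs.
    """
    n_pairs = len(conductances)
    n_neurons = len(conductances[0][0])
    scores = [0.0] * n_neurons
    for pair in conductances:
        for j in range(n_neurons):
            token_sum = sum(token[j] for token in pair)
            scores[j] += abs(token_sum) / n_pairs
    return scores


def top_percent(scores, percent):
    """Indices of the top `percent` % neurons by score (at least one)."""
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def prune_mask(n_neurons, pruned):
    """Multiplicative mask: 0.0 silences a neuron, 1.0 keeps it."""
    return [0.0 if j in pruned else 1.0 for j in range(n_neurons)]


# Two hypothetical query-document pairs, 3 tokens each, 5 neurons.
conductances = [
    [[0.1, 0.0, -0.4, 0.2, 0.0],
     [0.2, 0.1, -0.3, 0.0, 0.0],
     [0.0, 0.0, -0.1, 0.1, 0.05]],
    [[0.3, -0.1, -0.2, 0.0, 0.0],
     [0.1, 0.0, -0.4, 0.2, 0.0],
     [0.0, 0.1, 0.0, 0.1, 0.0]],
]
scores = aggregate_scores(conductances)
pruned = top_percent(scores, 40)
print(pruned, prune_mask(len(scores), pruned))
```

In a real model, the mask would be multiplied into the activations of the corresponding layer during the forward pass. Retrieval metrics would then be compared against those of the unpruned model.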

Paper Structure

This paper contains 23 sections, 1 equation, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Illustration of the Neuron Integrated Gradient method
  • Figure 2: Percentage of intersection between pairs of attribution schemes for the label "relevant" (top) and "non-relevant" (bottom) at different pruning percentages
  • Figure 3: Summary of the intersections among all the attribution schemes of every dataset (or of all datasets except Robust and NFCorpus) for a given label ("relevant" or "non-relevant"), and of the intersections between two attribution schemes (both labels) for a single dataset
  • Figure 4: Distribution in the model of the most important neurons for the attribution scheme $N_{ms}$ (1% pruning).
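Figures 2 and 3 compare attribution schemes by how much their selected neuron sets overlap. The sketch below shows one way to compute such an overlap. It assumes the overlap is the share of one scheme's top-p% neurons that also appear in the other's, where both sets have the same size at a fixed pruning percentage. It also assumes that a multi-scheme summary is the plain set intersection. The helper names `top_percent`, `overlap_percent` and `common_neurons` and the example scores are hypothetical.

```python
import math


def top_percent(scores, percent):
    """Indices of the top `percent` % neurons by score (at least one)."""
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def overlap_percent(scores_a, scores_b, percent):
    """Share (in %) of scheme A's top neurons also selected by scheme B."""
    top_a = top_percent(scores_a, percent)
    top_b = top_percent(scores_b, percent)
    return 100.0 * len(top_a & top_b) / len(top_a)


def common_neurons(score_lists, percent):
    """Neurons selected by every scheme (e.g. across all datasets)."""
    return set.intersection(*(top_percent(s, percent) for s in score_lists))


# Hypothetical per-neuron scores from two attribution schemes (8 neurons).
scheme_a = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.0, 0.3]
scheme_b = [0.85, 0.6, 0.1, 0.0, 0.75, 0.2, 0.05, 0.3]
for p in (25, 50):
    print(p, overlap_percent(scheme_a, scheme_b, p))
print(common_neurons([scheme_a, scheme_b], 50))
```

Because both schemes keep the same number of neurons at a given percentage, this overlap is symmetric. It also makes curves drawn at different pruning levels directly comparable.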