Reverse-Engineering the Retrieval Process in GenIR Models

Anja Reusch, Yonatan Belinkov

TL;DR

This work interrogates how Generative Information Retrieval (GenIR) models perform end-to-end retrieval by applying mechanistic interpretability methods. It shows that the decoder, not the encoder, primarily drives retrieval, and identifies a three-stage decoding process (priming, bridging, and interaction) in which cross-attention largely transfers query information and late-stage MLPs write the final document identifiers. Through activation patching, logit-lens analyses, and cross-model encoder swaps, the study demonstrates that substantial retrieval behavior remains even when the encoder is not specialized for the target corpus, and that several mechanisms are largely learned during pretraining but are further adapted in Stage III for corpus-specific retrieval. The findings reveal that a small subset of components, notably Stage II/III cross-attention and Stage III MLPs, is critical for retrieval, suggesting opportunities for faster GenIR fine-tuning and inference by focusing on these components. Overall, the work advances understanding of GenIR internals, provides a roadmap for efficient training and deployment, and releases code and models to support further exploration.
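Activation patching, named in the summary, caches a component's output on a clean input, writes that output into a run on a corrupted input, and measures how much of the clean behavior returns. The sketch below illustrates the idea on a hypothetical two-component residual-stream model; the component names (`comp_a`, `comp_b`), the `logit_diff` metric, and the normalization by the clean-vs-corrupt gap are illustrative assumptions, not the paper's exact setup.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Component = Callable[[Vector], Vector]


def run(x: Vector, components: Dict[str, Component],
        patch: Optional[Dict[str, Vector]] = None) -> Tuple[Vector, Dict[str, Vector]]:
    """Run components in order; each reads the residual stream and adds its output.

    If `patch` holds a cached output for a component, that output is written
    to the stream instead of the component's own output.
    """
    resid = list(x)
    cache: Dict[str, Vector] = {}
    for name, fn in components.items():
        out = patch[name] if patch and name in patch else fn(resid)
        cache[name] = out
        resid = [r + o for r, o in zip(resid, out)]
    return resid, cache


def logit_diff(resid: Vector) -> float:
    # Hypothetical metric: logit of the relevant doc ID (dim 0) minus a distractor (dim 1).
    return resid[0] - resid[1]


def patching_effect(name: str, components: Dict[str, Component],
                    x_clean: Vector, x_corrupt: Vector) -> float:
    """Fraction of the clean-vs-corrupt metric gap restored by patching `name`."""
    clean_resid, clean_cache = run(x_clean, components)
    corrupt_resid, _ = run(x_corrupt, components)
    patched_resid, _ = run(x_corrupt, components, patch={name: clean_cache[name]})
    clean_m, corrupt_m = logit_diff(clean_resid), logit_diff(corrupt_resid)
    return (logit_diff(patched_resid) - corrupt_m) / (clean_m - corrupt_m)


# Hypothetical toy components: comp_a amplifies dimension 0, comp_b adds a constant bias.
components: Dict[str, Component] = {
    "comp_a": lambda r: [2.0 * r[0], 0.0],
    "comp_b": lambda r: [0.0, 0.5],
}

print(patching_effect("comp_a", components, [1.0, 0.0], [0.0, 1.0]))  # 0.5
print(patching_effect("comp_b", components, [1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Patching `comp_a` restores only half of the gap because the corrupted input still reaches the output through the residual connection, while patching the input-independent `comp_b` restores nothing. In a real model the same loop would run over cached activations of actual layers rather than toy lambdas.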

Abstract

Generative Information Retrieval (GenIR) is a novel paradigm in which a transformer encoder-decoder model predicts document rankings based on a query in an end-to-end fashion. These GenIR models have received significant attention because they combine a simple retrieval architecture with high retrieval effectiveness. However, in contrast to established retrieval architectures like cross-encoders or bi-encoders, their internal computations remain largely unknown. Therefore, this work studies the internal retrieval process of GenIR models by applying methods based on mechanistic interpretability, such as patching and vocabulary projections. By replacing the GenIR encoder with one trained on fewer documents, we demonstrate that the decoder is the primary component responsible for successful retrieval. Our patching experiments reveal that not all components in the decoder are crucial for the retrieval process. More specifically, we find that a pass through the decoder can be divided into three stages: (I) the priming stage, which contributes important information for activating subsequent components in later layers; (II) the bridging stage, where cross-attention is primarily active to transfer query information from the encoder to the decoder; and (III) the interaction stage, where predominantly MLPs are active to predict the document identifier. Our findings indicate that interaction between query and document information occurs only in the last stage. We hope our results promote a better understanding of GenIR models and foster future research to overcome the current challenges associated with these models.
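Vocabulary projections of the kind mentioned in the abstract are commonly implemented as a logit lens: an intermediate residual-stream vector is passed through a final normalization and the unembedding matrix, and the rank of the target token is read off. The sketch below assumes each document identifier is a single vocabulary token and uses an RMS norm with unit scale in place of a trained model's learned final norm; `UNEMBED` and the token ids are hypothetical toy values.

```python
import math
from typing import List, Sequence


def rms_norm(h: Sequence[float], eps: float = 1e-6) -> List[float]:
    # RMS normalization with an assumed unit scale (a trained model has learned weights here).
    scale = 1.0 / math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x * scale for x in h]


def logit_lens(h: Sequence[float], unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project an intermediate residual-stream vector onto the vocabulary."""
    z = rms_norm(h)
    return [sum(w * x for w, x in zip(row, z)) for row in unembed]


def rank_of(token_id: int, logits: Sequence[float]) -> int:
    """1-based rank of `token_id` (1 = highest logit); ties share the better rank."""
    target = logits[token_id]
    return 1 + sum(1 for v in logits if v > target)


def mean_rank(token_ids: Sequence[int], logits: Sequence[float]) -> float:
    """Average rank of a set of tokens, e.g. all document-identifier tokens."""
    return sum(rank_of(t, logits) for t in token_ids) / len(token_ids)


# Hypothetical 4-token vocabulary over a 2-dimensional residual stream.
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
logits = logit_lens([0.2, 0.9], UNEMBED)
print(rank_of(2, logits), rank_of(1, logits))  # 1 2
print(mean_rank([0, 3], logits))               # 3.5
```

Because ranks are invariant to any positive rescaling of the logits, the exact normalization constant matters less for rank-based analyses than the direction of the projected vector; a learned per-dimension norm scale, however, can change the ranking.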

Paper Structure

This paper contains 20 sections, 7 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A simplified view of the retrieval process in the GenIR models studied in this work. After the encoder processes the query, the decoder operates in three stages: (I) the Priming Stage, where MLPs prepare the residual stream to trigger the cross-attention components in (II) the Bridging Stage, which transfer query information from the encoder to the decoder's residual stream; and (III) the Interaction Stage, in which MLPs process the query information from the cross-attention to adjust the logits, promoting relevant documents.
  • Figure 2: Overview of a transformer encoder-decoder for GenIR, depicted as the residual stream to which components read and write.
  • Figure 3: Proportion that each component's output contributes to the change in the residual stream (top) and cosine similarity of each component's output with the layer output (bottom), displayed per layer. The models follow a similar trend: high MLP contribution in Stages I and III, a cross-attention peak in Stage II, and negative cosine similarity for all components in Stage III.
  • Figure 4: Rank of the correct document after each layer, and average rank of all document-identifier tokens and non-document-identifier tokens, after applying the logit lens to the output of the layer (top); rank of the correct document after applying the logit lens to the indicated model component (bottom), displayed per layer. All models follow a similar trend (including NQ100k, omitted for space): the models separate document-identifier tokens from non-document-identifier tokens early and gradually improve the rank of the relevant document. The cross-attention output does not seem to follow this gradual progression.
  • Figure 5: Components in each stage that trigger cross-attention in Stages II and III (left) and activate MLPs in Stage III (right) for NQ10k. Stage III MLPs are mostly activated by cross-attention in Stages II and III, while Stage II cross-attention is mostly activated by Stage I MLPs.
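Figure 3 reports, per layer, how much each component contributes to the change in the residual stream and how well each component's output aligns with the layer output. The paper's exact formulas are not reproduced here; the sketch below assumes that a component's contribution is its output norm divided by the summed norms of all component outputs in that layer, and that the layer output is the residual stream after the layer. Component names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def layer_decomposition(resid_in: Vector, outputs: Dict[str, Vector]
                        ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-component contribution share and cosine similarity with the layer output.

    `outputs` maps each component (e.g. self-attention, cross-attention, MLP)
    to the vector it adds to the residual stream within one decoder layer.
    """
    resid_out = list(resid_in)
    for out in outputs.values():
        resid_out = [r + o for r, o in zip(resid_out, out)]
    total = sum(norm(o) for o in outputs.values())
    share = {k: (norm(o) / total if total > 0 else 0.0) for k, o in outputs.items()}
    cos = {k: cosine(o, resid_out) for k, o in outputs.items()}
    return share, cos


# Hypothetical layer: cross-attention dominates, the MLP points against the stream.
share, cos = layer_decomposition(
    [1.0, 0.0],
    {"self_attn": [0.0, 0.0], "cross_attn": [3.0, 4.0], "mlp": [-1.0, 0.0]},
)
print(share)  # self_attn 0.0, cross_attn ~0.833, mlp ~0.167
print(cos)    # self_attn 0.0, cross_attn 1.0, mlp -0.6
```

A negative cosine, as for the toy `mlp` output here, means the component writes in a direction opposing the resulting residual stream, which is one way to read the negative Stage III values in Figure 3.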