Document Ranking with a Pretrained Sequence-to-Sequence Model

Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

TL;DR

This paper reframes document ranking as a generation task. A pretrained sequence-to-sequence model (T5) is fine-tuned to generate relevance labels as target words ("true"/"false"), and the logits of these target words are interpreted as relevance probabilities for ranking. On MS MARCO, and in zero-shot evaluation on Robust04, the approach matches or exceeds classification-based rankers built on encoder-only models such as BERT. Larger T5 variants yield further gains, and the method is notably data-efficient: it outperforms an encoder-only model when only a few training examples are available. Probing experiments that vary the target words suggest that the model draws on latent knowledge acquired during pretraining. This may help explain why the generation-based formulation beats classification-based reranking in low-data regimes. The work highlights the potential of seq2seq reranking for data-efficient, transferable document ranking and motivates further study of how pretraining objectives influence downstream ranking tasks.
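The scoring rule can be written compactly. This is a sketch under stated assumptions: each target word is taken to be a single token, and z denotes the decoder logit for that token at the first output position. The notation is ours and not necessarily the paper's. Under these assumptions, the relevance score is a softmax restricted to the two target words:

```latex
s(q, d) = P(\text{true} \mid q, d)
        = \frac{\exp(z_{\text{true}})}{\exp(z_{\text{true}}) + \exp(z_{\text{false}})}
        = \sigma\!\left(z_{\text{true}} - z_{\text{false}}\right)
```

Candidate documents for a query are then sorted by s(q, d). The logits of all other vocabulary entries play no role in the score.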

Abstract

This work proposes a novel adaptation of a pretrained sequence-to-sequence model to the task of document ranking. Our approach is fundamentally different from the commonly adopted classification-based formulation of ranking, which relies on encoder-only pretrained transformer architectures such as BERT. We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words", and how the underlying logits of these target words can be interpreted as relevance probabilities for ranking. On the popular MS MARCO passage ranking task, experimental results show that our approach is at least on par with previous classification-based models and can surpass them with larger, more recent models. On the test collection from the TREC 2004 Robust Track, we demonstrate a zero-shot transfer-based approach that outperforms previous state-of-the-art models requiring in-dataset cross-validation. Furthermore, we find that our approach significantly outperforms an encoder-only model in a data-poor regime (i.e., with few training examples). We investigate this observation further by varying target words to probe the model's use of latent knowledge.
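The reranking step described above can be illustrated with a minimal Python sketch. No model is loaded here. The sketch assumes that the decoder logits for the positive and negative target words have already been computed for each query-document pair. The prompt template in `build_input`, the token ids, and all helper names are hypothetical illustrations, not the authors' code. Swapping the target words, as in the probing experiments, would only change which two vocabulary entries are read.

```python
import math
from typing import List, Sequence, Tuple


def build_input(query: str, document: str) -> str:
    # Hypothetical prompt template; the real input format is an assumption here.
    return f"Query: {query} Document: {document} Relevant:"


def relevance_probability(logit_pos: float, logit_neg: float) -> float:
    """Softmax over only the two target-word logits; returns P(positive word)."""
    # Subtract the larger logit before exponentiating to avoid overflow.
    m = max(logit_pos, logit_neg)
    e_pos = math.exp(logit_pos - m)
    e_neg = math.exp(logit_neg - m)
    return e_pos / (e_pos + e_neg)


def score_from_vocab_logits(
    vocab_logits: Sequence[float], pos_id: int, neg_id: int
) -> float:
    # Read the two target-word entries from a full vocabulary logit vector
    # (e.g., from the first decoding step) and ignore every other token.
    # pos_id / neg_id are hypothetical token ids for words such as "true"/"false".
    return relevance_probability(vocab_logits[pos_id], vocab_logits[neg_id])


def rerank(
    candidates: Sequence[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """candidates: (doc_id, positive-word logit, negative-word logit) triples."""
    scored = [(doc_id, relevance_probability(lp, ln)) for doc_id, lp, ln in candidates]
    # Highest relevance probability first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Consider a call with toy logits (made up for illustration, not results from the paper): `rerank([("d1", 1.0, 2.0), ("d2", 3.0, 0.0), ("d3", 0.5, 0.5)])` orders the documents d2, d3, d1. The score depends only on the difference between the two logits, so it is equivalent to a sigmoid of that difference.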

Paper Structure

This paper contains 12 sections, 1 equation, 4 tables.