PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking

Yixuan Qiao, Shanshan Zhao, Jun Wang, Hao Chen, Tuozhen Liu, Xianbin Ye, Xin Tang, Rui Fang, Peng Gao, Wenfeng Xie, Guotong Xie

TL;DR

The paper aims to improve passage and document ranking in the TREC 2021 Deep Learning Track by combining sparse and dense retrieval within a multi-stage ranking and ensembling framework. It extends the 2020 approach by adding a generative model (T5) and large-scale re-rankers pre-trained with Megatron-LM, targeting both top-ranked and overall ranking quality. Key contributions include docT5query for sparse retrieval, ColBERT for dense retrieval, a pairwise loss in second-stage ranking, and the integration of T5-11B with substantial training resources, which yields notable gains at the top rank positions. The findings indicate that a multi-stage, generative-augmented, ensemble-driven pipeline can deliver competitive results on large-scale IR tasks, and they point to future work on combining diverse methods to improve retrieval performance.

Abstract

This paper describes the PASH participation in the TREC 2021 Deep Learning Track. In the recall stage, we adopt a scheme that combines sparse and dense retrieval methods. In the multi-stage ranking phase, point-wise and pair-wise ranking strategies are applied one after another, based on a model continually pre-trained on general knowledge and document-level data. Compared to the TREC 2020 Deep Learning Track, we additionally introduce the generative model T5 to further enhance performance.
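
To make the recall stage concrete, the following is a minimal sketch of one common way to merge a sparse and a dense candidate list before re-ranking. The abstract does not say how the two lists are combined, so the use of reciprocal rank fusion here is an assumption, and the function name `fuse_candidates`, the constant `k=60`, the `depth` cutoff, and the toy document ids are all hypothetical.

```python
from collections import defaultdict


def fuse_candidates(sparse_ranked, dense_ranked, k=60, depth=1000):
    """Merge two ranked lists of document ids with reciprocal rank fusion.

    Assumption: the fusion rule is illustrative only; the paper's actual
    combination scheme is not stated in the abstract.
    """
    scores = defaultdict(float)
    for ranked in (sparse_ranked, dense_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:depth]


# Toy ids standing in for sparse (e.g. docT5query-expanded) and dense
# (e.g. ColBERT) retrieval output.
sparse = ["d3", "d1", "d7"]
dense = ["d1", "d9", "d3"]
candidates = fuse_candidates(sparse, dense)
```

Documents returned by both retrievers accumulate score from each list, so they rise above documents found by only one. The fused list would then be passed to the point-wise and pair-wise re-ranking stages.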

Paper Structure (9 sections, 4 tables)