
Distillation Enhanced Generative Retrieval

Yongqi Li, Zhen Zhang, Wenjie Wang, Liqiang Nie, Wenjie Li, Tat-Seng Chua

TL;DR

Distillation Enhanced Generative Retrieval (DGR) introduces a teacher–student framework that improves generative retrieval by distilling graded passage rankings from a powerful teacher model into a generative retriever. Its key component, a distilled RankNet loss, uses the teacher's ranking order as supervision. Across NQ, TriviaQA, MSMARCO, and TREC DL it yields strong, robust gains while leaving inference unchanged. The approach shows that knowledge distillation can close the gap to dense retrieval within the generative paradigm, and it remains effective across different teacher architectures and distillation losses. The authors point to future directions such as longer teacher rankings and more nuanced sampling strategies to further boost performance.
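The teacher's role can be pictured with a minimal Python sketch. It assumes the teacher is any function that scores a (query, passage) pair: candidate passages are reranked by that score, and the top M are kept as an ordered list. That order carries graded relevance instead of a binary positive/negative label. The names `build_teacher_ranklist`, `teacher_score`, and `overlap_teacher` are hypothetical, and the word-overlap scorer is only a toy stand-in for a real teacher such as a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple


def build_teacher_ranklist(
    query: str,
    candidates: Sequence[str],
    teacher_score: Callable[[str, str], float],
    m: int,
) -> List[str]:
    """Rerank candidates with a teacher scorer and keep the top-m passages.

    `teacher_score` is a hypothetical interface standing in for a reranker
    (e.g. a cross-encoder). The result is ordered from most to least relevant.
    """
    scored: List[Tuple[float, int, str]] = [
        (teacher_score(query, p), idx, p) for idx, p in enumerate(candidates)
    ]
    # Sort by descending teacher score; the original index breaks ties stably.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for _, _, p in scored[:m]]


def overlap_teacher(query: str, passage: str) -> float:
    """Toy teacher (illustration only): counts shared lowercase word types."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))
```

For example, with the query "who wrote hamlet" and the candidates "hamlet is a play", "shakespeare wrote hamlet", and "paris is in france", keeping m = 2 returns the second passage first (two shared words) and the first passage second (one shared word). The third passage is dropped.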

Abstract

Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generative language models, distinct from traditional sparse or dense retrieval methods. In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR. DGR utilizes sophisticated ranking models, such as the cross-encoder, in a teacher role to supply a passage rank list, which captures the varying relevance degrees of passages instead of binary hard labels; subsequently, DGR employs a specially designed distilled RankNet loss to optimize the generative retrieval model, considering the passage rank order provided by the teacher model as labels. This framework only requires an additional distillation step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods. Additionally, DGR demonstrates exceptional robustness and generalizability with various teacher models and distillation losses.
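The idea of training on the teacher's rank order can be illustrated with a pairwise loss. The original RankNet uses a logistic pairwise loss. The sketch below instead assumes a hinge form in which every higher-ranked passage must outscore every lower-ranked one by a margin that grows with rank distance, scaled by an incremental margin `m_gap` (the symbol used in Figure 2). This is an assumption about how such a loss could look, not the paper's exact equation. `distilled_ranknet_loss` is a hypothetical name, and the student scores are taken as given rather than computed from identifier generation.

```python
from typing import Sequence


def distilled_ranknet_loss(student_scores: Sequence[float], m_gap: float) -> float:
    """Pairwise hinge loss over a teacher-ordered list (illustrative sketch).

    student_scores[i] is the student's score for the passage the teacher
    ranked at position i (0 = most relevant). For each pair i < j, the
    student should score passage i above passage j by an assumed margin of
    m_gap * (j - i), so pairs far apart in the teacher list need a wider gap.
    """
    n = len(student_scores)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            margin = m_gap * (j - i)
            # Penalize only when the lower-ranked passage is not far enough below.
            total += max(0.0, student_scores[j] - student_scores[i] + margin)
    return total
```

A student that already agrees with the teacher by a wide margin, such as scores `[3.0, 2.0, 1.0]` with `m_gap = 0.5`, gets zero loss. The fully reversed order `[1.0, 2.0, 3.0]` is penalized on all three pairs. The loss is summed rather than averaged over the M(M-1)/2 pairs, so its scale grows with the list length M.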

Paper Structure (18 sections, 3 equations, 2 figures, 6 tables)

Figures (2)

  • Figure 1: Illustration of the distillation enhanced generative retrieval (DGR) framework. Sophisticated ranking models serve as teacher models to rerank the passages, and the custom-designed distilled RankNet loss is used to optimize the generative retrieval model.
  • Figure 2: Retrieval performance of DGR on the NQ test set with respect to (a) the incremental margin value $m_{gap}$ and (b) the number of passages $M$ in $\mathcal{R}_{tea}$.