TWOLAR: a TWO-step LLM-Augmented distillation method for passage Reranking

Davide Baldelli, Junfeng Jiang, Akiko Aizawa, Paolo Torroni

TL;DR

TWOLAR narrows the efficiency gap in passage reranking by distilling the ranking ability of a large language model into a compact student model in two steps. It first builds a diverse training set of 20K queries, generated through query augmentation (sentence cropping and docT5query) and paired with candidate passages from multiple retrieval methods. It then has an LLM rerank those candidates, and the resulting orderings supervise the distillation. In zero-shot evaluation on TREC-DL 2019/2020 and BEIR, the student matches and in some cases outperforms state-of-the-art models with up to three orders of magnitude more parameters. The data, finetuned models, and code are publicly released. The results indicate that LLM-derived supervision is a practical route to scalable, domain-robust passage reranking, and they suggest that stronger teacher LLMs or additional supervision sources could improve the student further.

Abstract

In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process built on a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, finetuned models, and code.
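
To make the distillation step concrete, the sketch below shows one common way a student reranker can learn from a teacher's ordering: a RankNet-style pairwise logistic loss. This is an illustrative assumption, not a statement of the loss TWOLAR uses, which this section does not specify. The function name `ranknet_loss` and its arguments are hypothetical.

```python
import math
from typing import Sequence


def ranknet_loss(student_scores: Sequence[float],
                 teacher_ranking: Sequence[int]) -> float:
    """Mean pairwise logistic loss against a teacher ordering.

    student_scores[i] is the student's score for document i.
    teacher_ranking lists document indices from most to least relevant,
    as an LLM teacher might return them (assumed input format).
    """
    total = 0.0
    pairs = 0
    for pos, better in enumerate(teacher_ranking):
        for worse in teacher_ranking[pos + 1:]:
            margin = student_scores[better] - student_scores[worse]
            # log(1 + exp(-margin)), written in a numerically stable form
            total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
            pairs += 1
    return total / pairs if pairs else 0.0


# Student scores that agree with the teacher give a lower loss
# than scores that reverse the teacher's order.
agree = ranknet_loss([3.0, 2.0, 1.0], [0, 1, 2])
reverse = ranknet_loss([1.0, 2.0, 3.0], [0, 1, 2])
print(agree < reverse)
```

The loss only needs the teacher's ordering, not calibrated relevance scores, which is why it pairs naturally with an LLM that outputs a permutation of the candidates.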

Paper Structure

This paper contains 23 sections, 2 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Illustration of the score strategies from monoT5, RankT5 and our proposed approach.
  • Figure 2: Illustration of the method used to build the dataset.
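
The score strategies named in the Figure 1 caption can be sketched as small functions over the reranker's output logits. The monoT5 and RankT5 functions follow how those models are usually described: monoT5 takes a softmax restricted to the "true" and "false" tokens, and RankT5 reads the raw logit of one dedicated token. The third function is an assumption about the proposed approach (the difference of the two raw logits), since this section does not define it. All function and parameter names are hypothetical.

```python
import math


def monot5_score(logit_true: float, logit_false: float) -> float:
    """Probability of "true" under a softmax over the two target tokens."""
    m = max(logit_true, logit_false)  # subtract the max for stability
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def rankt5_score(logit_special: float) -> float:
    """Raw logit of a single dedicated token, used directly as the score."""
    return logit_special


def logit_difference_score(logit_true: float, logit_false: float) -> float:
    """Assumed form of the proposed score: difference of the raw logits."""
    return logit_true - logit_false


# The two-token softmax equals a sigmoid of the logit difference, so the
# first and third scores order documents identically at inference time.
p = monot5_score(2.0, 0.0)
d = logit_difference_score(2.0, 0.0)
print(abs(p - 1.0 / (1.0 + math.exp(-d))) < 1e-12)
```

Because the softmax score is a monotone function of the logit difference, any practical difference between the two would show up during training, where an unnormalized score behaves differently under a ranking loss, rather than in the final ordering.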