Efficient Title Reranker for Fast and Improved Knowledge-Intense NLP

Ziyi Chen, Jize Jiang, Daqian Zuo, Heyi Tao, Jun Yang, Yuxiang Wei

TL;DR

This paper introduces the Efficient Title Reranker via Broadcasting Query Encoder, a novel title-reranking technique that achieves a 20x-40x speedup over the vanilla passage reranker, and the Sigmoid Trick, a novel loss function customized for title reranking.

Abstract

In recent RAG approaches, rerankers play a pivotal role in refining retrieval accuracy because they can reveal the logical relations between each query and text pair. However, existing rerankers must repeatedly encode the query together with a large number of long retrieved texts. This results in high computational costs and limits the number of texts that can be retrieved, hindering accuracy. To remedy this problem, we introduce the Efficient Title Reranker via Broadcasting Query Encoder, a novel technique for title reranking that achieves a 20x-40x speedup over the vanilla passage reranker. Furthermore, we introduce the Sigmoid Trick, a novel loss function customized for title reranking. Combining both techniques, we empirically validate their effectiveness, achieving state-of-the-art results on all four datasets we experimented with from the KILT knowledge benchmark.
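The idea of encoding the query once and scoring many titles in a single pass can be illustrated with a block-structured attention mask. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `broadcast_attention_mask`, the packed layout (query tokens followed by each title), and the choice that query tokens attend only to the query are all hypothetical choices made for illustration.

```python
def broadcast_attention_mask(query_len, title_lens):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    Assumed (hypothetical) layout: [query tokens | title 1 | title 2 | ...].
    """
    total = query_len + sum(title_lens)
    mask = [[False] * total for _ in range(total)]

    # Assumption: query tokens attend only to the query, so the query
    # encoding does not depend on which titles are packed next to it.
    for i in range(query_len):
        for j in range(query_len):
            mask[i][j] = True

    start = query_len
    for n in title_lens:
        end = start + n
        for i in range(start, end):
            # Each title token sees the shared query ...
            for j in range(query_len):
                mask[i][j] = True
            # ... and its own title, but no other title.
            for j in range(start, end):
                mask[i][j] = True
        start = end
    return mask


if __name__ == "__main__":
    # One 3-token query shared by two 2-token titles.
    for row in broadcast_attention_mask(3, [2, 2]):
        print("".join("1" if allowed else "0" for allowed in row))
```

Under this layout the query is processed once regardless of how many titles are packed, while the zero blocks between titles keep each title's score independent of the others.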

Paper Structure

This paper contains 15 sections, 4 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: A comparison of the throughput of different methods on a single Nvidia A40. Our method (ETR) via Broadcasting Query Encoder (BQE) has a throughput improvement of 20x to 40x over vanilla passage rerankers, and at least a 3x throughput improvement over a normal title reranker. Best viewed in color.
  • Figure 2: Illustration of the two undesirable training scenarios. In the first case, 'Earl Derr Biggers' is correctly predicted, but the example is trivial. Overtraining on it will make the model rely only on semantic similarity and fail to handle more difficult queries. In the second case, the prediction fails to include 'Benny Hill', the ground-truth label. However, the query contains no information that lets the model judge whether the ground truth is correct, so examples like this act as "noise" during training. Best viewed in color.
  • Figure 3: A comparison between monoReranker and ETR. ETR encodes the query only once to score multiple texts, whereas monoReranker must encode the query once per text. ETR reduces the number of query encodings via attention manipulation that allows each title to be scored individually while all titles are encoded in a single model run.
  • Figure 4: A plot of $y = S(5 (x - 0.5))$ and its derivative, where $S$ is the sigmoid function. The derivative of the sigmoid function decreases with distance from the center. The red and yellow regions show the areas that receive at least 0.8x and at most 0.8x of the gradient update, respectively.
  • Figure 5: The recall of ETR on the Wizard of Wikipedia test set by the number of retrieved documents.
  • ...and 1 more figure
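The Figure 4 caption can be made concrete with a small calculation. The sketch below evaluates $y = S(5 (x - 0.5))$ and its derivative, then finds the range of $x$ where the derivative stays at or above 0.8x of its peak. It assumes that "0.8x of gradient update" is measured relative to the peak derivative at $x = 0.5$; that reading, along with the function names and the parameters `k` and `c`, is an assumption made for illustration and is not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scaled_sigmoid(x, k=5.0, c=0.5):
    """y = S(k * (x - c)); k=5 and c=0.5 match the Figure 4 caption."""
    return sigmoid(k * (x - c))


def scaled_sigmoid_grad(x, k=5.0, c=0.5):
    """dy/dx = k * y * (1 - y), which peaks at k / 4 when x = c."""
    y = scaled_sigmoid(x, k, c)
    return k * y * (1.0 - y)


def gradient_band(ratio, k=5.0, c=0.5):
    """Interval of x where the derivative is at least `ratio` times its peak.

    Assumption: the threshold is relative to the peak derivative k / 4.
    Solving y * (1 - y) = ratio / 4 gives y = (1 + sqrt(1 - ratio)) / 2
    on the upper side; inverting the sigmoid yields the half-width in x.
    """
    y_hi = (1.0 + math.sqrt(1.0 - ratio)) / 2.0
    half_width = math.log(y_hi / (1.0 - y_hi)) / k
    return c - half_width, c + half_width


if __name__ == "__main__":
    lo, hi = gradient_band(0.8)
    print(f"peak derivative at x = 0.5: {scaled_sigmoid_grad(0.5):.3f}")
    print(f"derivative >= 0.8x of peak for x in [{lo:.3f}, {hi:.3f}]")
```

Under this assumption the peak derivative is 1.25, and the band receiving at least 0.8x of it runs from roughly x = 0.31 to x = 0.69. Inputs outside that band receive progressively smaller updates.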