Improve Dense Passage Retrieval with Entailment Tuning

Lu Dai, Hao Liu, Hui Xiong

TL;DR

Entailment tuning improves the embeddings of dense retrievers by unifying the form of retrieval data and NLI data, using existence claims as a bridge.

Abstract

A retrieval module can be plugged into many downstream NLP tasks, such as open-domain question answering and retrieval-augmented generation, to improve their performance. The key to a retrieval system is calculating relevance scores for query-passage pairs. However, the definition of relevance is often ambiguous. We observe that a major class of relevance aligns with the concept of entailment in natural language inference (NLI) tasks. Based on this observation, we design a method called entailment tuning to improve the embeddings of dense retrievers. Specifically, we unify the form of retrieval data and NLI data using existence claims as a bridge. Then, we train retrievers to predict the claims entailed by a passage using a variant of the masked prediction task. Our method can be efficiently plugged into current dense retrieval methods, and experiments show its effectiveness.
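
The masked-prediction step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `build_masked_claim_example`, the whitespace tokenization, the `[CLS]`/`[SEP]`/`[MASK]` markers, the 0.8 mask ratio, and the example claim string are all assumptions made for illustration. The abstract does not specify how questions are converted into existence claims or how many claim tokens are masked. The sketch only shows the general idea of appending a claim to a passage and hiding claim tokens so that a model would have to recover them from the passage.

```python
import random

MASK = "[MASK]"  # assumed mask marker, in the style of BERT-like models


def build_masked_claim_example(passage, claim, mask_ratio=0.8, seed=0):
    """Join a passage and a claim, masking only tokens of the claim.

    Returns (inputs, labels). labels[i] holds the original token where
    inputs[i] is masked and None everywhere else.
    """
    rng = random.Random(seed)
    passage_tokens = passage.split()  # assumption: whitespace tokenization
    claim_tokens = claim.split()

    # Hide a fraction of the claim tokens (at least one when the claim is non-empty).
    n_mask = min(len(claim_tokens), max(1, round(mask_ratio * len(claim_tokens))))
    masked_positions = set(rng.sample(range(len(claim_tokens)), n_mask))

    masked_claim = [
        MASK if i in masked_positions else tok
        for i, tok in enumerate(claim_tokens)
    ]
    claim_labels = [
        tok if i in masked_positions else None
        for i, tok in enumerate(claim_tokens)
    ]

    # Passage tokens stay visible; only claim positions carry labels.
    inputs = ["[CLS]"] + passage_tokens + ["[SEP]"] + masked_claim + ["[SEP]"]
    labels = [None] * (len(passage_tokens) + 2) + claim_labels + [None]
    return inputs, labels


# Hypothetical passage and existence-style claim, for illustration only.
example_inputs, example_labels = build_masked_claim_example(
    "Hamlet is a tragedy written by William Shakespeare",
    "someone wrote the play Hamlet",
)
print(example_inputs)
```

In an actual training setup, the `inputs` and `labels` pair would be fed to a masked language modeling loss on the retriever's encoder. That part is omitted here because the abstract gives no details about it.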

Paper Structure

This paper contains 20 sections, 8 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: Both passages contain the answer and receive high relevance scores, but only the second is truly helpful for deducing the answer. A necessary condition for a helpful passage is that it entails the claim underlying the question.
  • Figure 2: The NLI model shows a clear tendency to classify the relationship between a positive passage and its query as entailment, compared with negative passages and the query.
  • Figure 3: A dense retriever can discern sentence pairs with different semantic relationships, as shown by their separate relevance score ranges, especially entailment versus irrelevant, but it still has some difficulty distinguishing entailment from neutral.
  • Figure 4: Pairwise comparison by GPT-4. Our method wins over or ties with the baselines in general quality.