Enhanced Retrieval of Long Documents: Leveraging Fine-Grained Block Representations with Large Language Models

Minghan Li, Eric Gaussier, Guodong Zhou

TL;DR

This work tackles long-document retrieval by replacing a single coarse embedding with fine-grained block representations (BReps) derived from decoder-only LLMs. It segments each document into blocks, embeds each block, and scores the query against every block via cosine similarity, aggregating the top-$k$ block scores with a descending-weight scheme; training uses a pairwise hinge loss with margin $m$, with LoRA for efficiency. Empirically, BReps outperforms RepLLaMA and strong baselines on TREC DL, Robust04, and MLDR (English and Chinese) while reducing embedding latency, with further gains from more blocks and larger LLMs. The results indicate that finer-grained query-block interactions capture relevance signals in long texts that a single document embedding misses, which translates into practical improvements for LLM-based retrieval systems. The findings invite future work on alternative segmentations, loss functions, and architectural variants to further improve effectiveness and efficiency in long-document IR.
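As a rough illustration of the scoring step summarized above, here is a minimal Python sketch. It assumes the query and block embeddings are already available as plain lists of floats. The function names, the default `k`, and the 1/(i+1) weighting are hypothetical choices made for this example; the summary only states that the weights are descending, so this should not be read as the authors' exact scheme.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_score(query_vec, block_vecs, k=3, weights=None):
    """Aggregate the top-k query-block similarities with descending weights."""
    # Score the query against every block, best match first.
    sims = sorted((cosine(query_vec, b) for b in block_vecs), reverse=True)
    top = sims[:k]
    if weights is None:
        # Assumed weighting 1, 1/2, 1/3, ...; any descending sequence would fit the description.
        weights = [1.0 / (i + 1) for i in range(len(top))]
    return sum(w * s for w, s in zip(weights, top))
```

With this kind of aggregation, a document whose best block matches the query strongly is rewarded most, while further matching blocks contribute progressively less.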

Abstract

In recent years, large language models (LLMs) have demonstrated exceptional power in various domains, including information retrieval. Most previous approaches use these models to create a single embedding for each query, each passage, or each document individually, a strategy exemplified and used by the Retrieval-Augmented Generation (RAG) framework. While this method has proven effective, we argue that it falls short of fully capturing the nuances of document-level texts because it relies on a relatively coarse-grained representation. To address this limitation, we introduce a novel, fine-grained approach aimed at improving the accuracy of relevance scoring for long documents. Our method first segments a long document into blocks, each of which is embedded using an LLM and matched against the query representation. To compute the relevance score, we aggregate the query-block relevance scores through a weighted sum, yielding a single score for the query against the entire document. Despite its apparent simplicity, our experiments show that this approach outperforms standard representation methods and significantly reduces embedding generation latency. Moreover, carefully optimizing the pairwise loss function yields further performance gains.
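To make the pairwise training objective concrete, the following is a minimal sketch assuming the standard pairwise hinge form, max(0, m - s(q, d+) + s(q, d-)), where s is the aggregated document score. The function name and the default margin are hypothetical; this page does not give the paper's exact formulation or the margin values used.

```python
def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss for one (relevant, non-relevant) document pair.

    The loss is zero once the relevant document outscores the
    non-relevant one by at least `margin` (an assumed default here).
    """
    return max(0.0, margin - pos_score + neg_score)
```

Under this form, pairs that are already separated by the margin contribute nothing to the loss, so training effort concentrates on pairs that are still ranked too close together or in the wrong order.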

Paper Structure

This paper contains 27 sections, 7 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: BReps relevance scoring process for long document retrieval.
  • Figure 2: Train BReps with pairwise hinge loss.
  • Figure 3: Time taken by different models to generate representations for TREC 2019 DL on one V100-32G GPU.
  • Figure 4: Performance comparison using different hinge loss margins $m$ and RankNet loss with BReps.
  • Figure 5: Performance comparison using the first $n$ blocks during inference with BReps, for different values of $n$.
  • ...and 3 more figures