Towards Reliable Retrieval in RAG Systems for Large Legal Datasets

Markus Reuter, Tobias Lingenberg, Rūta Liepiņa, Francesca Lagioia, Marco Lippi, Giovanni Sartor, Andrea Passerini, Burcu Sayin

TL;DR

The paper addresses the critical reliability issue in retrieval-augmented generation (RAG) for large legal datasets by defining Document-Level Retrieval Mismatch (DRM), where retrieved chunks originate from the wrong document. It proposes Summary-Augmented Chunking (SAC), which prepends a document-level summary to each chunk to preserve global context and guide retrieval. Experiments on LegalBench-RAG show that SAC substantially reduces DRM and improves text-level precision and recall, with generic summaries outperforming expert-guided ones for retrieval. The work offers a practical, scalable enhancement to legal RAG systems and a step toward more reliable AI-assisted legal analysis.
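
To make the DRM idea concrete, the following minimal sketch computes it for a single query as the fraction of retrieved chunks that come from a document other than the one holding the ground-truth answer. The exact formula, the function name, and the file names are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def document_level_retrieval_mismatch(
    retrieved_doc_ids: Sequence[str], gold_doc_id: str
) -> float:
    """Fraction of retrieved chunks whose source document is not the gold one.

    Assumption: DRM is measured per query over the top-k retrieved chunks.
    """
    if not retrieved_doc_ids:
        return 0.0
    wrong = sum(1 for doc_id in retrieved_doc_ids if doc_id != gold_doc_id)
    return wrong / len(retrieved_doc_ids)


# Hypothetical top-4 retrieval result for a query whose answer is in nda_17.txt.
retrieved = ["nda_17.txt", "nda_17.txt", "nda_03.txt", "nda_22.txt"]
drm = document_level_retrieval_mismatch(retrieved, gold_doc_id="nda_17.txt")
print(drm)  # 2 of the 4 chunks come from the wrong document
```

Averaging this value over all queries gives one number per dataset and per top-k setting, which is how a comparison such as the one in Figure 2 could be tabulated.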

Abstract

Retrieval-Augmented Generation (RAG) is a promising approach to mitigate hallucinations in Large Language Models (LLMs) for legal applications, but its reliability is critically dependent on the accuracy of the retrieval step. This is particularly challenging in the legal domain, where large databases of structurally similar documents often cause retrieval systems to fail. In this paper, we address this challenge by first identifying and quantifying a critical failure mode we term Document-Level Retrieval Mismatch (DRM), where the retriever selects information from entirely incorrect source documents. To mitigate DRM, we investigate a simple and computationally efficient technique which we refer to as Summary-Augmented Chunking (SAC). This method enhances each text chunk with a document-level synthetic summary, thereby injecting crucial global context that would otherwise be lost during a standard chunking process. Our experiments on a diverse set of legal information retrieval tasks show that SAC greatly reduces DRM and, consequently, also improves text-level retrieval precision and recall. Interestingly, we find that a generic summarization strategy outperforms an approach that incorporates legal expert domain knowledge to target specific legal elements. Our work provides evidence that this practical, scalable, and easily integrable technique enhances the reliability of RAG systems when applied to large-scale legal document datasets.
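
The core of SAC can be sketched in a few lines: summarize each document once, then prepend that summary to every chunk before it is embedded. In the sketch below, the fixed-size character splitter, the `toy_summarize` stand-in (which just reuses the first sentence), and all names and parameters are assumptions for illustration; in practice the summary would come from an LLM.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Naive fixed-size character chunking (an assumption, not the paper's splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def summary_augmented_chunks(
    document: str, summarize: Callable[[str], str], chunk_size: int = 500
) -> List[str]:
    """Prepend one document-level summary to every chunk of the document."""
    summary = summarize(document)  # computed once per document, reused for all chunks
    return [f"{summary}\n\n{chunk}" for chunk in split_into_chunks(document, chunk_size)]


def toy_summarize(document: str) -> str:
    """Hypothetical stand-in for an LLM summarizer: reuse the first sentence."""
    return "Summary: " + document.split(".")[0].strip() + "."


doc = (
    "Non-disclosure agreement between Acme and Beta. "
    "The receiving party shall keep all information confidential for five years."
)
chunks = summary_augmented_chunks(doc, toy_summarize, chunk_size=60)
for chunk in chunks:
    print(repr(chunk))
```

Because the summary is generated once per document rather than once per chunk, the added cost grows with the number of documents, which fits the abstract's description of the technique as computationally efficient.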

Paper Structure

This paper contains 21 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Part a) illustrates how our retrieval quality metrics, Document-Level Retrieval Mismatch (DRM) and text-level precision/recall, are computed in the LegalBench-RAG (Pipitone and Houir Alami, 2024) information retrieval task. Part b) shows the process of setting up the knowledge base using Summary-Augmented Chunks (SAC).
  • Figure 2: Document-Level Retrieval Mismatch (DRM) of a standard RAG approach (left) and of our Summary-Augmented Chunking (right), applied to the four datasets in the LegalBench-RAG benchmark. Retrieval using SAC selects fewer wrong documents across all top-k retrieved snippets and seeds.
  • Figure 3: Text-level precision (left) and recall (right) of the standard RAG approach and of SAC with a generic or an expert-guided summarization strategy. The metrics are averaged over all datasets and seeds.
  • Figure 4: Relative performance comparison of four embedding models in the baseline case on the LegalBench-RAG dataset (Pipitone and Houir Alami, 2024).
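
The text-level precision and recall mentioned in Figures 1 and 3 can be illustrated with a character-overlap sketch. This page does not define the metrics, so the sketch assumes that retrieved and ground-truth passages are given as (document ID, start, end) character spans with an exclusive end offset; the names and example values are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (doc_id, start, end), end exclusive (assumed format)


def covered_chars(spans: List[Span]) -> Set[Tuple[str, int]]:
    """Set of (doc_id, character offset) pairs covered by the spans."""
    chars: Set[Tuple[str, int]] = set()
    for doc_id, start, end in spans:
        chars.update((doc_id, i) for i in range(start, end))
    return chars


def text_level_precision_recall(
    retrieved: List[Span], gold: List[Span]
) -> Tuple[float, float]:
    """Character-level precision and recall of retrieved spans against gold spans."""
    r, g = covered_chars(retrieved), covered_chars(gold)
    overlap = len(r & g)
    precision = overlap / len(r) if r else 0.0
    recall = overlap / len(g) if g else 0.0
    return precision, recall


# One chunk from the right document (partial overlap), one from a wrong document.
retrieved_spans = [("contract_a", 0, 100), ("contract_b", 0, 100)]
gold_spans = [("contract_a", 50, 150)]
precision, recall = text_level_precision_recall(retrieved_spans, gold_spans)
print(precision, recall)
```

Under this definition a chunk from the wrong document can never overlap the gold span, so it lowers precision and contributes nothing to recall, which is consistent with the abstract's observation that reducing DRM also improves text-level precision and recall.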