Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents

Yaocong Li; Qiang Lan; Leihan Zhang; Le Zhang

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a promising technology for legal document consultation, yet its application in Chinese legal scenarios faces two key limitations: existing benchmarks lack specialized support for joint retriever-generator evaluation, and mainstream RAG systems often fail to accommodate the structured nature of legal provisions. To address these gaps, this study makes three core contributions. First, we construct the Legal-DC benchmark dataset, comprising 480 legal documents (covering areas such as market regulation and contract management) and 2,475 refined question-answer pairs, each annotated with clause-level references, filling the gap in specialized evaluation resources for Chinese legal RAG. Second, we propose the LegRAG framework, which integrates legal adaptive indexing (clause-boundary segmentation) with a dual-path self-reflection mechanism to preserve clause integrity while improving answer accuracy. Third, we introduce automated evaluation methods for large language models to meet the high-reliability demands of legal retrieval scenarios. LegRAG outperforms existing state-of-the-art methods by 1.3% to 5.6% across key evaluation metrics. This research provides a specialized benchmark, a practical framework, and empirical insights to advance the development of Chinese legal RAG systems. Our code and data are available at https://github.com/legal-dc/Legal-DC.
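To make the idea of clause-boundary segmentation concrete, the following is a minimal sketch and not the LegRAG implementation. It assumes that each provision begins on its own line with an article heading of the form 第…条 ("Article N"); the pattern `ARTICLE_HEADING` and the function `split_by_clause` are hypothetical names introduced only for illustration.

```python
import re
from typing import List

# Assumption: a provision starts at the beginning of a line with a heading
# such as "第十二条" ("Article 12"). Real statutes may need a richer pattern.
ARTICLE_HEADING = re.compile(r"^第[零〇一二三四五六七八九十百千0-9]+条", re.MULTILINE)


def split_by_clause(text: str) -> List[str]:
    """Split a legal text into chunks that each hold one whole article."""
    starts = [m.start() for m in ARTICLE_HEADING.finditer(text)]
    if not starts:
        # No article headings found: return the text as a single chunk.
        stripped = text.strip()
        return [stripped] if stripped else []

    chunks: List[str] = []
    # Text before the first article (title, preamble) becomes its own chunk.
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)

    # Each article runs from its heading up to the next heading or the end.
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunks.append(text[begin:end].strip())
    return chunks
```

Because the pattern is anchored to line starts, an inline cross-reference such as "依照第一条" ("pursuant to Article 1") does not open a new chunk, so each indexed chunk keeps one provision intact rather than cutting it at a fixed token length.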

Paper Structure (34 sections, 3 equations, 4 figures, 14 tables)

Introduction
Related Work
Retrieval-Augmented Generation
LLMs on Legal Domains
The Benchmark Dataset
Dataset Construction
Evaluation metrics
Metrics for the answer generation methods
Automatic Evaluation Protocol
Evaluation Pipeline
Prompt Template for Legal RAG Evaluation
Experimental Setup and Credibility Validation
Experimental Design
Formal Definition of the RAG Framework
Modular Implementation of LegRAG
...and 19 more sections

Figures (4)

Figure 1: Dataset construction process.
Figure 2: Core prompt snippet for automatic legal RAG evaluation.
Figure 3: The prompt for answer generation.
Figure 4: Architecture of the LegRAG framework.
