Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

Shengjie Ma; Qi Chu; Jiaxin Mao; Xuhui Jiang; Haozhe Duan; Chong Chen

TL;DR

This work tackles the challenge of producing reliable, interpretable relevance judgments for legal case retrieval by using a general-purpose LLM in a few-shot workflow that mirrors human expert reasoning. It decomposes relevance judgments into Material Facts and Legal Facts, and uses Adaptive Demo-Matching, Fact Extraction, and Fact Annotation to produce interpretable, expert-aligned labels. Empirical results on the Chinese LeCaRD dataset show high agreement with human judgments, especially for Legal Facts, and demonstrate that label-only synthetic data can meaningfully boost downstream retrieval models and enable knowledge distillation to smaller LLMs. The approach offers a scalable, data-efficient path to improving legal case retrieval while preserving interpretability, with promising generalization to other legal domains and languages through targeted demonstrations and prompts.
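The three stages named above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the `Demo` class, the word-overlap (Jaccard) matching rule, the prompt wording, and the `llm` callable are all hypothetical stand-ins, and whitespace tokenization would have to be replaced by a proper segmenter for Chinese text such as LeCaRD.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Demo:
    """A hypothetical expert demonstration: a case plus a worked example."""
    case_text: str
    worked_example: str


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity (an assumed stand-in for the matching rule)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_demos(text: str, pool: List[Demo], k: int = 2) -> List[Demo]:
    """Adaptive Demo-Matching: pick the k demonstrations closest to the input."""
    return sorted(pool, key=lambda d: jaccard(text, d.case_text), reverse=True)[:k]


def build_prompt(instruction: str, demos: List[Demo], payload: str) -> str:
    """Assemble a few-shot prompt: instruction, demonstrations, then the input."""
    shots = "\n\n".join(d.worked_example for d in demos)
    return f"{instruction}\n\n{shots}\n\n{payload}"


def judge(query: str, candidate: str, fe_pool: List[Demo], fa_pool: List[Demo],
          llm: Callable[[str], str]) -> Dict[str, object]:
    """Run demo matching + Fact Extraction, then demo matching + Fact Annotation."""
    facts = {}
    for name, text in (("query", query), ("candidate", candidate)):
        demos = match_demos(text, fe_pool)  # first use of demo matching
        facts[name] = llm(build_prompt(
            "Extract the Material Facts and Legal Facts of this case.", demos, text))
    payload = f"Query facts:\n{facts['query']}\n\nCandidate facts:\n{facts['candidate']}"
    demos = match_demos(payload, fa_pool)  # second use of demo matching
    label = llm(build_prompt(
        "Judge how relevant the candidate is to the query.", demos, payload))
    # Returning the extracted facts alongside the label keeps the judgment inspectable.
    return {"facts": facts, "label": label}
```

Keeping the extracted facts in the returned record is what makes each label traceable to the text it was derived from, which is the interpretability property the summary emphasizes.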

Abstract

Determining which legal cases are relevant to a given query involves navigating lengthy texts and applying nuanced legal reasoning. Traditionally, this task has demanded significant time and domain expertise to identify key Legal Facts and reach sound juridical conclusions. In addition, existing data with legal case similarities often lack interpretability, making it difficult to understand the rationale behind relevance judgments. With the growing capabilities of large language models (LLMs), researchers have begun investigating their potential in this domain. Nonetheless, how to employ a general large language model for reliable relevance judgments in legal case retrieval remains largely unexplored. To address this gap, we propose a novel few-shot approach where LLMs assist in generating expert-aligned interpretable relevance judgments. The proposed approach decomposes the judgment process into several stages, mimicking the workflow of human annotators and allowing for the flexible incorporation of expert reasoning to improve the accuracy of relevance judgments. Importantly, it also ensures interpretable data labeling, providing transparency and clarity in the relevance assessment process. Through a comparison of relevance judgments made by LLMs and human experts, we empirically demonstrate that the proposed approach can yield reliable and valid relevance assessments. Furthermore, we demonstrate that with minimal expert supervision, our approach enables a large language model to acquire case analysis expertise and subsequently transfers this ability to a smaller model via annotation-based knowledge distillation.
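One common way to quantify a comparison between LLM and expert labels is a chance-corrected agreement statistic. The sketch below computes Cohen's kappa from scratch; the choice of kappa is an assumption made for illustration, since this summary does not state which agreement measure the paper reports, and the function name `cohen_kappa` is likewise hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each annotator labeled independently at their own rates.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa(llm_labels, expert_labels)` over graded relevance labels gives 1.0 for perfect agreement and a value near 0 when agreement is no better than chance.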

Paper Structure (20 sections, 2 equations, 12 figures, 4 tables)

Introduction
Related works
Pre-trained Language Models (PLMs) in Legal case retrieval
LLM Agents for Data Augmentation
Methodology
Preliminary
Automated Relevance Judgments for Legal Cases
Other Statements
Utility of the Relevance Judgments
Experiment
Datasets and Evaluation Metrics
Implementation Details
Evaluation of the Relevance Judgments (RQ1 & 2)
Data Augmentation Experiments (RQ3)
Baselines
...and 5 more sections

Figures (12)

Figure 1: An example of a challenge in legal relevance judgments.
Figure 2: The proposed workflow of relevance judgments for the legal case retrieval task. Note that Adaptive Demo-Matching (ADM) is applied twice, before Fact Extraction (FE) and before Fact Annotation (FA).
Figure 3: An example of data format in LeCaRD. Note that the original language is Chinese and the English texts are translations.
Figure 4: An example of demonstration used in Fact Extraction (FE) for extracting Material Facts (MF). (Translation of Chinese texts.)
Figure 5: An example of demonstration used in Fact Extraction (FE) for extracting Legal Facts (LF). (Translation of Chinese texts.)
...and 7 more figures
