CitaLaw: Enhancing LLM with Citations in Legal Domain
Kepu Zhang, Weijie Yu, Sunhao Dai, Jun Xu
TL;DR
CitaLaw introduces a legal-citation benchmark that evaluates how well LLMs generate legally grounded responses with context-aware citations, using two audience-specific subsets (laypersons and practitioners). It couples a retrieval-augmented generation framework with a syllogism-based evaluation that links major premises (law articles or precedents), minor premises (case circumstances), and conclusions (legal decisions). The reference corpus combines law articles and precedents (about 500k documents) and supports two generation strategies, CGG and ARG. Experiments show that explicit references improve response quality and that the syllogism-based metrics align with human judgments. The work offers practical guidance for deploying trustworthy legal LLMs and reports how open-domain and legal-domain models differ across retrieval and NLI configurations.
Abstract
In this paper, we propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations. CitaLaw features a diverse set of legal questions for both laypersons and practitioners, paired with a comprehensive corpus of law articles and precedent cases as a reference pool. This framework enables LLM-based systems to retrieve supporting citations from the reference corpus and align these citations with the corresponding sentences in their responses. Moreover, we introduce syllogism-inspired evaluation methods to assess the legal alignment between retrieved references and LLM-generated responses, as well as their consistency with user questions. Extensive experiments on 2 open-domain and 7 legal-specific LLMs demonstrate that integrating legal references substantially enhances response quality. Furthermore, our proposed syllogism-based evaluation method exhibits strong agreement with human judgments.
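To make the syllogism-based evaluation more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each response sentence has already been labeled with a syllogism role and paired with its cited reference. The names `CitedSentence`, `token_overlap`, and `syllogism_scores` are hypothetical, the role-to-target mapping is an illustrative assumption, and the word-overlap score is only a crude stand-in for whatever entailment (NLI) model a real evaluation would use.

```python
from dataclasses import dataclass


@dataclass
class CitedSentence:
    """One response sentence with its (assumed) syllogism role and citation."""
    text: str
    role: str      # "major", "minor", or "conclusion"
    citation: str  # text of the cited reference; empty string if none


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (placeholder for an NLI score)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def syllogism_scores(sentences, question):
    """Average the support score of the sentences in each syllogism role."""
    totals, counts = {}, {}
    for s in sentences:
        # Assumption: minor premises (case circumstances) are compared with the
        # user's question; major premises and conclusions are compared with
        # the reference they cite.
        target = question if s.role == "minor" else s.citation
        totals[s.role] = totals.get(s.role, 0.0) + token_overlap(s.text, target)
        counts[s.role] = counts.get(s.role, 0) + 1
    return {role: totals[role] / counts[role] for role in totals}
```

Calling `syllogism_scores(sentences, question)` returns one averaged score per role present in the response, which is one way to report alignment separately for law or precedent citations, case facts, and the final decision.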
