IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval
Shounak Paul, Dhananjay Ghumare, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi
TL;DR
IL-PCSR introduces a unified Indian legal corpus for parallel statute and precedent retrieval. It addresses a long-standing gap: these two interdependent retrieval tasks have so far been studied in isolation. The authors present a comprehensive retrieval framework that combines lexical and semantic models, domain-specific event/GNN representations, and a two-stage LLM re-ranking strategy that exploits the dependence between statutes and precedents. The key findings are that ensembles excel as first-stage retrievers, that LLM re-ranking delivers state-of-the-art performance, and that cross-task conditioning in Stage 2 yields additional gains. The work also includes an annotation study that grounds the relevance judgments in practice, and it shows that transfer learning is a more practical choice than multi-task training. Overall, IL-PCSR offers a valuable resource and a methodological blueprint for joint legal retrieval, with potential impact on legal analytics and decision-support systems.
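The two-stage pipeline summarized above (an ensemble first-stage retriever followed by a re-ranker) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. Reciprocal rank fusion (RRF) stands in for the ensemble because the summary does not say how the paper combines its retrievers. `rerank_score` is a hypothetical callable standing in for the LLM re-ranker. The function names and document IDs are likewise illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking (RRF).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the commonly used default.
    Assumption: RRF is a stand-in for the paper's (unspecified) ensemble.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def two_stage_retrieve(
    rankings: List[List[str]],
    rerank_score: Callable[[str], float],
    top_k: int = 3,
) -> List[str]:
    """Stage 1: fuse first-stage rankings. Stage 2: re-rank the top_k candidates.

    rerank_score is a hypothetical placeholder for an LLM relevance scorer.
    """
    candidates = reciprocal_rank_fusion(rankings)[:top_k]
    return sorted(candidates, key=rerank_score, reverse=True)


lexical = ["S1", "S2", "S3"]   # e.g., a BM25-style ranking (illustrative)
semantic = ["S2", "S3", "S1"]  # e.g., a dense-retriever ranking (illustrative)
toy_scores = {"S1": 0.9, "S2": 0.4, "S3": 0.7}  # placeholder re-ranker scores

print(reciprocal_rank_fusion([lexical, semantic]))
print(two_stage_retrieve([lexical, semantic], toy_scores.__getitem__, top_k=2))
```

Fusing the two toy rankings yields ["S2", "S1", "S3"]. Re-ranking the top two candidates with the placeholder scores then yields ["S1", "S2"]. RRF uses only ranks, so the sketch avoids having to calibrate lexical scores against embedding similarities. Whether the paper's ensemble works this way is not stated here.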
Abstract
Identifying and retrieving relevant statutes and prior cases (precedents) for a given legal situation are common tasks performed by law practitioners. To date, researchers have addressed the two tasks independently, developing entirely separate datasets and models for each. However, the two retrieval tasks are inherently related: for example, similar cases tend to cite similar statutes because their factual situations are similar. In this paper, we address this gap. We propose IL-PCSR (Indian Legal corpus for Prior Case and Statute Retrieval), a unique corpus that provides a common testbed for developing models for both tasks (Statute Retrieval and Precedent Retrieval) that can exploit the dependence between them. We experiment extensively with several baseline models on these tasks, including lexical models, semantic models, and an ensemble based on GNNs. Further, to exploit the dependence between the two tasks, we develop an LLM-based re-ranking approach that achieves the best performance.
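The observation that similar cases tend to cite similar statutes suggests a simple cross-task signal. Candidate precedents can be ranked by how strongly their cited statutes overlap with the statutes relevant to the query. The sketch below illustrates that intuition with Jaccard overlap. It is not the paper's method, which uses LLM-based re-ranking. The helper names, case IDs, and statute labels are hypothetical and for illustration only.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two statute sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_precedents_by_statutes(
    query_statutes: Set[str],
    precedent_statutes: Dict[str, Set[str]],
) -> List[str]:
    """Order candidate precedents by statute overlap with the query.

    Hypothetical illustration: query_statutes could come from a statute
    retriever, and precedent_statutes from each precedent's citations.
    Ties are broken by case ID so the output is deterministic.
    """
    return sorted(
        precedent_statutes,
        key=lambda case_id: (-jaccard(query_statutes, precedent_statutes[case_id]), case_id),
    )


# Illustrative statute labels and case IDs (not taken from IL-PCSR).
query = {"IPC 302", "IPC 34"}
candidates = {
    "C3": {"IPC 420"},
    "C1": {"IPC 302", "IPC 34"},
    "C2": {"IPC 302", "IPC 307"},
}
print(rank_precedents_by_statutes(query, candidates))
```

Here C1 shares both statutes (overlap 1.0), C2 shares one of three distinct statutes (about 0.33), and C3 shares none, so the ranking is ["C1", "C2", "C3"]. A real system would combine such a signal with textual similarity rather than rely on it alone.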
