ECtHR-PCR: A Dataset for Precedent Understanding and Prior Case Retrieval in the European Court of Human Rights
T. Y. S. S Santosh, Rashid Gustav Haddad, Matthias Grabmair
TL;DR
The paper introduces ECtHR-PCR, a precedent understanding and prior case retrieval (PCR) dataset built from European Court of Human Rights judgments. It is designed to reflect realistic retrieval by separating facts from arguments and by using the full candidate pool. It benchmarks lexical (BM25) and dense retrieval models with hierarchical long-text encoders under multiple negative sampling strategies, revealing that dense models degrade over time while BM25 remains robust. It also empirically compares Halsbury's and Goodhart's views on relevance, finding that relying on the law section (aligned with Halsbury) yields stronger signals than relying on the facts alone. The resource enables future exploration of citation-network-aware retrieval and temporal adaptation to evolving legal standards, with practical implications for scalable, explainable PCR in legal NLP systems.
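As a rough illustration of the lexical baseline, the sketch below implements Okapi BM25 in plain Python. It ranks every candidate in a toy pool (no pre-filtering, mirroring the full-candidate-pool setting) for a query built only from a new case's facts, then computes recall@k against the cases that query cites. The values k1 = 1.2 and b = 0.75, the Lucene-style non-negative IDF, and the regex tokenizer are common defaults we assume here, not settings reported by the paper; the case texts and identifiers are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep alphanumeric word tokens only.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over a fixed candidate pool (Lucene-style non-negative IDF)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        # Term frequencies and lengths for each candidate document.
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lens = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / len(self.lens)
        # Document frequency: iterating a Counter yields each distinct term once.
        df = Counter(term for tf in self.tfs.values() for term in tf)
        n = len(self.tfs)
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tfs[doc_id], self.lens[doc_id]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Length normalisation: longer-than-average documents are penalised.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[str]:
        # Score every candidate in the pool and sort by descending score.
        return sorted(self.tfs, key=lambda d: self.score(query, d), reverse=True)


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the cited (relevant) cases found in the top-k candidates.
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Hypothetical toy pool: in practice, which section of each candidate to index
# (e.g. facts vs. law) is itself a design choice.
pool = {
    "A": "applicant detained without judicial review of lawfulness of detention",
    "B": "freedom of expression journalist conviction defamation",
    "C": "prolonged pre-trial detention without review",
}
# The query uses only the facts of the undecided case, not its legal reasoning.
query = "applicant held in pre-trial detention without any review by a judge"

bm25 = BM25(pool)
ranking = bm25.rank(query)
print(ranking, recall_at_k(ranking, {"C"}, k=1))
```

Because every candidate is scored, this loop grows linearly with the pool; a realistic system would rely on an inverted index instead.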
Abstract
In common law jurisdictions, legal practitioners rely on precedents to construct arguments, in line with the doctrine of stare decisis. As the number of cases grows over the years, prior case retrieval (PCR) has garnered significant attention. Besides lacking real-world scale, existing PCR datasets do not simulate a realistic setting, because their queries use complete case documents and only mask references to prior cases. The query is thereby exposed to legal reasoning that is not yet available when constructing an argument for an undecided case, as well as to spurious patterns left behind by the citation masks, potentially short-circuiting a comprehensive understanding of case facts and legal principles. To address these limitations, we introduce a PCR dataset based on judgments from the European Court of Human Rights (ECtHR). These judgments explicitly separate facts from arguments and exhibit precedential practices, which allows us to build a PCR dataset that fosters systems' comprehensive understanding. We benchmark different lexical and dense retrieval approaches with various negative sampling strategies, adapting them to long text sequences using hierarchical variants. We find that difficulty-based negative sampling strategies are not effective for the PCR task, highlighting the need to investigate domain-specific difficulty criteria. Furthermore, we observe that the performance of dense models degrades over time, which calls for further research into temporal adaptation of retrieval models. Additionally, we assess the influence of two different views, Halsbury's and Goodhart's, in practice in the ECtHR jurisdiction using the PCR task.
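The abstract contrasts difficulty-based negative sampling with simpler strategies for training dense retrievers. A minimal sketch of two such strategies, under our own assumptions: uniform random negatives drawn from the candidate pool, and "hard" negatives taken as the highest-ranked uncited candidates from a first-stage retriever such as BM25. The function names and the `skip_top` parameter are hypothetical, and the exact strategies evaluated in the paper may differ.

```python
import random


def random_negatives(pool: list[str], positives: set[str], n: int,
                     rng: random.Random) -> list[str]:
    # Uniformly sample candidates that the query case does not cite.
    candidates = [c for c in pool if c not in positives]
    return rng.sample(candidates, min(n, len(candidates)))


def hard_negatives(ranking: list[str], positives: set[str], n: int,
                   skip_top: int = 0) -> list[str]:
    # Difficulty-based sampling: keep the highest-ranked uncited candidates from
    # a first-stage ranking. skip_top (hypothetical) drops the very top ones,
    # which may be uncited yet still relevant, i.e. likely false negatives.
    non_positives = [c for c in ranking if c not in positives]
    return non_positives[skip_top:skip_top + n]


# Hypothetical usage: "c1" is the only case cited by the query.
pool = ["c1", "c2", "c3", "c5", "c7"]
first_stage_ranking = ["c3", "c1", "c7", "c2", "c5"]
print(random_negatives(pool, {"c1"}, 2, random.Random(0)))
print(hard_negatives(first_stage_ranking, {"c1"}, 2))
```

One plausible reason such difficulty criteria can backfire in PCR is that a lexically similar but uncited judgment may still be a legitimate precedent, so treating it as a negative injects label noise; this is offered as a hypothesis, not a finding of the paper.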
