
Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases

Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis

TL;DR

The paper tackles explainability in legal NLP by shifting rationales from word-level to paragraph-level selections in European Court of Human Rights (ECtHR) cases. It introduces a baseline HierBERT-based model and a set of rationale regularizers: Sparsity ($L_s$), Continuity ($L_c$), Comprehensiveness variants ($L_g$), and a new Singularity constraint ($L_r$). These guide paragraph-level rationale selection while the model predicts alleged violations of articles of the European Convention on Human Rights (ECHR). A new ECtHR dataset of 11k cases, with silver rationales and a gold-annotated subset, is released to support this task. Empirical results show that Continuity may not help in paragraph-level settings, while carefully reformulated Comprehensiveness and the novel Singularity constraint improve rationale quality and faithfulness without sacrificing classification performance. Matching gold rationales remains challenging, leaving ample room for future research, including improvements in debiasing and evaluation. The work lays a foundation for explainable, law-focused NLP and connects rationale extraction to self-supervised summarization of long legal documents.
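
The TL;DR names the regularizers but not their equations. As a rough illustration of what $L_s$ and $L_c$ penalize, the sketch below uses formulations common in the word-level rationale literature. This is an assumption: the paper's exact equations are not reproduced in this summary. The sparsity penalty pulls the fraction of selected paragraphs toward a target ratio. The continuity penalty counts switches between adjacent selected and unselected paragraphs. The function names are hypothetical, and the 0.3 default mirrors the 30% sparsity level mentioned in the Figure 3 caption.

```python
from typing import Sequence


def sparsity_loss(z: Sequence[float], target_ratio: float = 0.3) -> float:
    """Hypothetical L_s-style penalty: distance between the selected fraction and a target ratio.

    z holds one selection score in [0, 1] per paragraph (1 = paragraph in the rationale).
    """
    if not z:
        raise ValueError("z must contain at least one paragraph score")
    selected_fraction = sum(z) / len(z)
    return abs(target_ratio - selected_fraction)


def continuity_loss(z: Sequence[float]) -> float:
    """Hypothetical L_c-style penalty: average change in selection between adjacent paragraphs."""
    if len(z) < 2:
        return 0.0  # a single paragraph has no neighbours, so no switches to penalize
    switches = sum(abs(z[i] - z[i - 1]) for i in range(1, len(z)))
    return switches / (len(z) - 1)


# Ten paragraphs, two selected in each mask.
scattered = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
contiguous = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Both masks incur the same sparsity penalty (about 0.1). The scattered mask, however, pays 3/9 for continuity, while the contiguous one pays only 1/9. Because this penalty prefers contiguous blocks, it may be a poor fit when the relevant facts are spread across a case. That is one plausible reading of why Continuity may not help at the paragraph level.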

Abstract

Interpretability or explainability is an emerging research field in NLP. From a user-centric point of view, the goal is to build models that provide proper justification for their decisions, similar to those of humans, by requiring the models to satisfy additional constraints. To this end, we introduce a new application on legal text where, contrary to mainstream literature targeting word-level rationales, we conceive rationales as selected paragraphs in multi-paragraph structured court cases. We also release a new dataset comprising European Court of Human Rights cases, including annotations for paragraph-level rationales. We use this dataset to study the effect of already proposed rationale constraints, i.e., sparsity, continuity, and comprehensiveness, formulated as regularizers. Our findings indicate that some of these constraints are not beneficial in paragraph-level rationale extraction, while others need re-formulation to better handle the multi-label nature of the task we consider. We also introduce a new constraint, singularity, which further improves the quality of rationales, even compared with noisy rationale supervision. Experimental results indicate that the newly introduced task is very challenging and there is a large scope for further research.

Paper Structure

This paper contains 22 sections, 9 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: A depiction of the ECtHR process: The applicant(s) request a hearing from the ECtHR regarding specific accusations (alleged violations of ECHR articles) against the defendant state(s), based on facts. The Court (judges) assesses the facts and the rest of the parties' submissions, and rules on whether the allegedly violated ECHR articles were violated. Prominent facts referred to in the Court's assessment are highlighted.
  • Figure 2: Illustration of HierBERT-HA. The shaded parts operate only when $L_g$ or $L_r$ are used.
  • Figure 3: (KNEZEVIC v. CROATIA, No. 55133/13) The model extracted most of the relevant facts indicating a possible violation of Article 5. Note that the legal expert considered 67% (10 of 15) of the facts relevant. Our model is at a disadvantage in this case because, having been trained to operate at a predefined sparsity level (30%), it extracted only 5 of the 15 facts (33%).
  • Figure 4: (K.I. v. RUSSIA, No. 58182/14) Paragraphs 9, 11, 13 and 20 clearly indicate a plausible violation of the right to liberty (Article 5), as they refer to the continuous extension of the applicant's detention. Our model failed to extract them and therefore failed to predict this allegation. Instead, it targeted only paragraphs indicating ill-treatment, which relates to a plausible violation of Article 3 (Prohibition of Torture).
  • Figure 5: (KAIMOVA AND OTHERS v. RUSSIA, No. 58182/14) Paragraphs 16 and 19 clearly indicate that the applicant's health (life) was at risk and that the authorities paid no attention, but the model did not select these paragraphs. Instead, it selected paragraph 10, which states that the applicant initially informed the authorities about his medical history and that they provided medication. This suggests that the model is sensitive to language describing health issues (tuberculosis) in general, rather than to specific, well-defined allegations of ill-treatment on the merits.
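
The disadvantage described in the Figure 3 caption can be quantified. Suppose the model selects 5 of the 15 paragraphs and the expert marked 10 as relevant. Even if every selected paragraph is correct, recall cannot exceed 5/10 = 50%. The hypothetical helper below computes these best-case bounds. It assumes standard precision, recall, and F1 over sets of paragraph indices, which may differ from the exact rationale metrics reported in the paper.

```python
from typing import Tuple


def best_case_scores(num_selected: int, num_gold: int) -> Tuple[float, float, float]:
    """Upper bounds on paragraph-level precision, recall and F1 for a fixed selection budget.

    Best case assumed: as many selected paragraphs as possible belong to the gold rationale.
    """
    if num_selected <= 0 or num_gold <= 0:
        raise ValueError("both counts must be positive")
    hits = min(num_selected, num_gold)  # maximum possible true positives
    precision = hits / num_selected
    recall = hits / num_gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Figure 3 setting: 5 paragraphs selected, 10 judged relevant by the expert.
precision, recall, f1 = best_case_scores(5, 10)
```

In the Figure 3 setting, the bounds are a precision of 1.0, a recall of 0.5, and an F1 of about 0.67. A model constrained to roughly 30% sparsity therefore cannot fully recover a rationale that covers two thirds of the facts.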