Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning

Kepu Zhang, Haoyue Yang, Xu Tang, Weijie Yu, Jun Xu

TL;DR

The paper addresses a gap in legal judgment prediction by introducing LJPIV, a benchmark that includes innocent verdicts and supports trichotomous reasoning over the elements of the offense, unlawfulness, and culpability. It builds not-guilty data for three existing datasets with a three-stage augmentation pipeline (sentence extraction, injection of grounds for justification via retrieval-augmented generation, and quality verification), then proposes two ways to add trichotomous reasoning: a prompt-based method and LoRA fine-tuning. Experiments show that current legal LLMs struggle with innocent cases, while the proposed strategies improve both in-domain and cross-domain performance, especially for innocent verdicts. The work provides a dataset and a methodological framework for legal AI in civil-law jurisdictions and highlights the need for explicit innocence reasoning in deployed legal NLP systems.

Abstract

In legal practice, judges apply the trichotomous dogmatics of criminal law, sequentially assessing the elements of the offense, unlawfulness, and culpability to determine whether an individual's conduct constitutes a crime. Although current legal large language models (LLMs) show promising accuracy in judgment prediction, they lack trichotomous reasoning capabilities due to the absence of an appropriate benchmark dataset, preventing them from predicting innocent outcomes. As a result, every input is automatically assigned a charge, limiting their practical utility in legal contexts. To bridge this gap, we introduce LJPIV, the first benchmark dataset for Legal Judgment Prediction with Innocent Verdicts. Adhering to the trichotomous dogmatics, we extend three widely-used legal datasets through LLM-based augmentation and manual verification. Our experiments with state-of-the-art legal LLMs and novel strategies that integrate trichotomous reasoning into zero-shot prompting and fine-tuning reveal: (1) current legal LLMs have significant room for improvement, with even the best models achieving an F1 score of less than 0.3 on LJPIV; and (2) our strategies notably enhance both in-domain and cross-domain judgment prediction accuracy, especially for cases resulting in an innocent verdict.
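
The sequential assessment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Assessment`, `Verdict`, and `trichotomous_verdict` are hypothetical, and the three boolean findings are assumed to be supplied by some upstream judgment (a human or a model). The sketch only shows the ordering: the first stage that fails yields an innocent outcome, and only conduct that passes all three stages is treated as a crime.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical outcome labels, chosen for illustration only.
    GUILTY = "guilty"
    INNOCENT_NO_ELEMENTS = "innocent: elements of the offense not satisfied"
    INNOCENT_NOT_UNLAWFUL = "innocent: conduct not unlawful"
    INNOCENT_NOT_CULPABLE = "innocent: actor not culpable"


@dataclass(frozen=True)
class Assessment:
    # Assumed inputs: one finding per stage of the trichotomous analysis.
    elements_satisfied: bool
    unlawful: bool
    culpable: bool


def trichotomous_verdict(a: Assessment) -> Verdict:
    """Check the three stages in order and stop at the first one that fails."""
    if not a.elements_satisfied:
        return Verdict.INNOCENT_NO_ELEMENTS
    if not a.unlawful:
        return Verdict.INNOCENT_NOT_UNLAWFUL
    if not a.culpable:
        return Verdict.INNOCENT_NOT_CULPABLE
    return Verdict.GUILTY


# Example: the elements are met, but a ground for justification removes unlawfulness.
example = trichotomous_verdict(
    Assessment(elements_satisfied=True, unlawful=False, culpable=True)
)
```

A model that always assigns a charge behaves as if this function could only return `Verdict.GUILTY`, which is the limitation the abstract points out.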

Paper Structure

This paper contains 23 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: An illustration of trichotomous dogmatics of criminal law. The reasoning process proceeds sequentially from top to bottom. Conduct that satisfies the elements of the offense, unlawfulness, and culpability is deemed to constitute a crime; otherwise, the individual is considered innocent.
  • Figure 2: DISC-Law (Yue et al., 2023) incorrectly predicts charges for non-guilty fact descriptions across the elements of the offense, unlawfulness, and culpability. The red parts represent actions that may lead to a guilty verdict, while the blue parts indicate acts or situations that result in contradictions or exoneration.
  • Figure 3: The prompt for trichotomous reasoning used in this study.
  • Figure 4: Prediction accuracy for different case types on the LJPIV-CAIL test set. "All" indicates the overall accuracy, while "Type$_0$" represents the accuracy for guilty cases. "Type$_1$", "Type$_2$", and "Type$_3$" represent the accuracies for non-guilty cases in which the elements of the offense, unlawfulness, and culpability, respectively, are lacking.
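
The per-type breakdown described in the Figure 4 caption can be computed along the following lines. This is a hedged sketch rather than the paper's evaluation code: the function name `accuracy_by_type`, the integer type codes 0 to 3, the `"innocent"` label string, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_type(case_types, gold, predicted):
    """Return accuracy overall ("All") and per case type ("Type_0" .. "Type_3").

    case_types: ints, 0 for guilty cases and 1-3 for the three non-guilty types.
    gold, predicted: label strings (a charge name, or "innocent" by assumption).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for case_type, g, p in zip(case_types, gold, predicted):
        # Every case counts toward the overall bucket and its own type bucket.
        for key in ("All", f"Type_{case_type}"):
            total[key] += 1
            correct[key] += int(g == p)
    return {key: correct[key] / total[key] for key in total}


# Toy data, invented for illustration; not taken from LJPIV.
case_types = [0, 0, 1, 2, 3]
gold = ["theft", "fraud", "innocent", "innocent", "innocent"]
predicted = ["theft", "theft", "innocent", "theft", "innocent"]

scores = accuracy_by_type(case_types, gold, predicted)
```

Reporting the types separately matters because a model that never predicts innocence can still score well on "All" when guilty cases dominate the test set.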