Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction

Chenlong Deng, Kelong Mao, Yuyao Zhang, Zhicheng Dou

TL;DR

This paper tackles legal judgment prediction (LJP) by identifying a key difficulty, discriminating among confusing charges, and by proposing the ADAPT reasoning framework to emulate human judicial reasoning. ADAPT decomposes each case into three assessable steps (Ask, Discriminate, and Predict) to improve discrimination among charges and law articles and, in turn, the final judgment. The authors strengthen this approach by generating synthetic reasoning trajectories with a large model and using multi-task instruction tuning to train a smaller model that reasons robustly and efficiently under ADAPT. Comprehensive experiments on CAIL2018 and MultiLJP show state-of-the-art performance, particularly on difficult charges, supported by careful ablations and an analysis of domain-specific limitations. The work advances practical LJP with discriminative reasoning and offers guidance on data costs, dataset diversity, and ethical considerations for deployment.
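The sketch below makes the training-data step concrete. It shows one way a teacher-generated ADAPT trajectory could be split into per-step instruction-tuning examples for a smaller student model. Several pieces are hypothetical assumptions and not the authors' pipeline:

- the `Trajectory` schema;
- the prompt strings;
- the `to_multitask_examples` and `build_dataset` helpers;
- the rule that drops trajectories whose predicted charge disagrees with the gold label.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One ADAPT reasoning trace from a larger teacher model (hypothetical schema)."""
    case_facts: str
    ask: str               # output of the Ask step
    discriminate: str      # output of the Discriminate step
    predict: str           # output of the Predict step
    predicted_charge: str  # charge named in the final prediction


def to_multitask_examples(traj: Trajectory, gold_charge: str) -> list[dict[str, str]]:
    """Split one trajectory into one instruction example per ADAPT step.

    Assumption: traces whose final charge disagrees with the gold label are
    discarded before training.
    """
    if traj.predicted_charge != gold_charge:
        return []  # assumed quality filter: drop incorrect teacher traces
    context = "Case facts:\n" + traj.case_facts
    return [
        {"task": "ask",
         "instruction": context + "\nDecompose the case facts.",
         "output": traj.ask},
        {"task": "discriminate",
         "instruction": context + "\nAnalysis:\n" + traj.ask
                        + "\nDiscriminate among candidate charges.",
         "output": traj.discriminate},
        {"task": "predict",
         "instruction": context + "\nAnalysis:\n" + traj.ask
                        + "\nComparison:\n" + traj.discriminate
                        + "\nPredict the judgment.",
         "output": traj.predict},
    ]


def build_dataset(pairs: list[tuple[Trajectory, str]]) -> list[dict[str, str]]:
    # Flatten all (trajectory, gold charge) pairs into one mixed multi-task dataset.
    data: list[dict[str, str]] = []
    for traj, gold in pairs:
        data.extend(to_multitask_examples(traj, gold))
    return data
```

The per-step examples are mixed into a single dataset. A single student model can then learn each sub-task separately, while the final `predict` instruction still carries the full reasoning chain.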

Abstract

Legal judgment prediction is essential for enhancing judicial efficiency. In this work, we identify that existing large language models (LLMs) underperform in this domain due to challenges in understanding case complexities and distinguishing between similar charges. To adapt LLMs for effective legal judgment prediction, we introduce the Ask-Discriminate-Predict (ADAPT) reasoning framework inspired by human judicial reasoning. ADAPT involves decomposing case facts, discriminating among potential charges, and predicting the final judgment. We further enhance LLMs through fine-tuning with multi-task synthetic trajectories to improve legal judgment prediction accuracy and efficiency under our ADAPT framework. Extensive experiments conducted on two widely used datasets demonstrate the superior performance of our framework in legal judgment prediction, particularly when dealing with complex and confusing charges.
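The three ADAPT steps can be pictured as a chain of prompts to a single model, where each step's output is fed into the next. Several parts of this sketch are hypothetical and are not the authors' prompts or code:

- the prompt wording;
- the `ask`, `discriminate`, `predict`, and `adapt` helpers;
- the `LLM` callable type.

The candidate charges are also assumed to be supplied by the caller, for example from a retriever or an earlier model pass. Any function that maps a prompt string to a completion string can stand in for the model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: any function from prompt text to completion text.
LLM = Callable[[str], str]


@dataclass
class AdaptTrace:
    """Intermediate outputs of the three ADAPT steps for one case."""
    key_facts: str       # Ask: decomposition of the case facts
    discrimination: str  # Discriminate: comparison of candidate charges
    judgment: str        # Predict: final charge and law article


def ask(llm: LLM, case_facts: str) -> str:
    # Step 1 (Ask): break the case into the questions a judge would check.
    return llm("Decompose the following case facts into key legal questions "
               "and answer each briefly.\n\nCase facts:\n" + case_facts)


def discriminate(llm: LLM, case_facts: str, key_facts: str,
                 candidates: list[str]) -> str:
    # Step 2 (Discriminate): contrast similar candidate charges against the facts.
    return llm("Case facts:\n" + case_facts + "\n\nAnalysis:\n" + key_facts
               + "\n\nCandidate charges: " + ", ".join(candidates)
               + "\nFor each candidate, state which required elements are or are not met.")


def predict(llm: LLM, case_facts: str, key_facts: str, discrimination: str) -> str:
    # Step 3 (Predict): give the final judgment conditioned on both earlier steps.
    return llm("Case facts:\n" + case_facts + "\n\nAnalysis:\n" + key_facts
               + "\n\nCharge comparison:\n" + discrimination
               + "\n\nGive the final charge and the applicable law article.")


def adapt(llm: LLM, case_facts: str, candidates: list[str]) -> AdaptTrace:
    # Run the steps in order, passing each output forward.
    key_facts = ask(llm, case_facts)
    disc = discriminate(llm, case_facts, key_facts, candidates)
    judgment = predict(llm, case_facts, key_facts, disc)
    return AdaptTrace(key_facts, disc, judgment)
```

Passing each step's output into the next prompt keeps the final prediction conditioned on an explicit comparison of similar charges. That comparison is the discrimination step the framework is built around.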

Paper Structure

This paper contains 34 sections, 1 equation, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Comparison of our framework with direct reasoning and legal syllogism. After fine-tuning, the improvement from our approach is more pronounced on confusing charges.
  • Figure 2: Overview of our framework. The final judgment is predicted based on three reasoning steps.
  • Figure 3: Fine-tuning performance on each sub-task across training epochs on the CAIL2018 dataset.
  • Figure 4: Performance of various methods on different charge subgroups. Higher percentile intervals (e.g., 75%-100%) correspond to more difficult subgroups.
  • Figure 5: Prompts for each task in our multi-task instruction tuning.
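Figure 4 groups charges into percentile intervals of difficulty. The sketch below shows one hypothetical way to form such subgroups. It ranks charges by an assumed per-charge difficulty score (for example, one minus a baseline model's accuracy) and splits the ranking into four equal-sized intervals. The scoring rule and the `quartile_subgroups` helper are illustrative assumptions; the paper's actual difficulty measure may differ.

```python
def quartile_subgroups(difficulty: dict[str, float]) -> dict[str, list[str]]:
    """Bucket charges into four percentile intervals by a difficulty score.

    Assumption: a higher score means a harder charge (e.g. 1 - baseline accuracy).
    """
    ranked = sorted(difficulty, key=lambda c: difficulty[c])  # easiest first
    n = len(ranked)
    labels = ["0%-25%", "25%-50%", "50%-75%", "75%-100%"]
    groups: dict[str, list[str]] = {label: [] for label in labels}
    for rank, charge in enumerate(ranked):
        # rank * 4 // n maps positions 0..n-1 onto bucket indices 0..3.
        groups[labels[rank * 4 // n]].append(charge)
    return groups
```

Per-method accuracy can then be averaged within each bucket, so the 75%-100% interval holds the hardest quarter of charges under the chosen score.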