Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction

Xi Chen, Mao Mao, Shuo Li, Haotian Shangguan

TL;DR

This work addresses Legal Judgment Prediction with limited reliance on large historical datasets by introducing Debate-Feedback, a multi-agent debate framework augmented with a reliability evaluator. A judge LM iteratively refines its prediction over rounds of arguments from opposing agents, while an assistant model assesses the reliability of those arguments and a smoothing scheme stabilizes the outcome; the final decision after $n$ rounds is denoted $y$. Empirical results on CaseLaw and the Chinese CAIL18 dataset show the approach surpassing baselines such as GPT-4o, GPT-3.5-turbo, LegalBERT, and Lawformer, with the assistant-in-the-loop providing additional gains and robustness over standard reasoning methods like CoT and Reflexion. The findings demonstrate a viable path to robust, efficient LegalAI under data-scarce settings and point to future work integrating retrieval augmentation and broader datasets across jurisdictions.
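The loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not state the update rule, so the reliability-weighted exponential smoothing used here is an assumption, as are all names (`debate_feedback`, `judge`, `evaluator`, `alpha`) and the simplification to a single binary judgment. The callables stand in for LLM calls.

```python
from typing import Callable, List

# Hypothetical interfaces: in practice each callable would wrap an LLM call.
Debater = Callable[[str, List[str]], str]      # (case, history) -> argument text
Judge = Callable[[str, List[str]], float]      # (case, arguments) -> P(label) in [0, 1]
Evaluator = Callable[[str, List[str]], float]  # (case, arguments) -> reliability in [0, 1]


def debate_feedback(case: str, debaters: List[Debater], judge: Judge,
                    evaluator: Evaluator, n_rounds: int = 3,
                    alpha: float = 0.5) -> float:
    """Return a smoothed probability y for a binary judgment after n_rounds."""
    y = 0.5                    # neutral starting estimate before any debate
    history: List[str] = []    # arguments from earlier rounds
    for _ in range(n_rounds):
        # Each debater argues its side, seeing only previous rounds.
        arguments = [debate(case, history) for debate in debaters]
        history.extend(arguments)
        p = judge(case, arguments)      # judge's raw prediction this round
        r = evaluator(case, arguments)  # assistant's reliability score
        # Assumed smoothing rule: move toward p in proportion to reliability.
        w = alpha * r
        y = (1.0 - w) * y + w * p
    return y


if __name__ == "__main__":
    # Stub agents with fixed behavior, for illustration only.
    pro = lambda case, history: "argument for the plaintiff"
    con = lambda case, history: "argument for the defendant"
    score = debate_feedback("case facts", [pro, con],
                            judge=lambda c, a: 1.0,
                            evaluator=lambda c, a: 1.0,
                            n_rounds=2)
    print(score)
```

Under this assumed rule, a round judged unreliable (reliability near 0) leaves the running estimate almost unchanged, which is one simple way a reliability signal could stabilize the judge's prediction across rounds.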

Abstract

The use of AI in legal analysis and prediction (LegalAI) has gained widespread attention, with past research focusing on retrieval-based methods and fine-tuning large models. However, these approaches often require large datasets and underutilize the capabilities of modern large language models (LLMs). In this paper, inspired by the debate phase of real courtroom trials, we propose a novel legal judgment prediction model based on the Debate-Feedback architecture, which integrates LLM multi-agent debate and reliability evaluation models. Unlike traditional methods, our model achieves significant improvements in efficiency by minimizing the need for large historical datasets, thus offering a lightweight yet robust solution. Comparative experiments show that it outperforms several general-purpose and domain-specific legal models, offering a dynamic reasoning process and a promising direction for future LegalAI research.

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: A brief introduction to the Debate-Feedback structure.
  • Figure 2: Influence of the number of debaters selected.
  • Figure 3: Influence of the number of rounds selected.