
AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

Guhong Chen, Liyang Fan, Zihan Gong, Nan Xie, Zixuan Li, Ziqiang Liu, Chengming Li, Qiang Qu, Hamid Alinejad-Rokny, Shiwen Ni, Min Yang

TL;DR

AgentCourt introduces AdvEvol, an adversarial evolutionary framework that enables LLM-based lawyer agents to autonomously acquire legal knowledge across three specialized knowledge bases (regulations, experience, and cases) within a simulated court. By iterating through pre-trial data collection, dynamic courtroom debate, and post-trial reflection, the framework achieves strong performance on dynamic courtroom tasks, approaching GPT-4o-mini levels and outperforming specialized legal models. The paper also proposes the CourtBench benchmark to systematically evaluate interactive legal reasoning beyond static knowledge retrieval. The work highlights the crucial role of adversarial learning in legal AI and suggests scalable directions for broader judicial and regulatory applications; the code and data are open-sourced.

Abstract

Current research on LLM-based simulation systems lacks comprehensive solutions for modeling real-world court proceedings, while existing legal language models struggle with dynamic courtroom interactions. We present AgentCourt, a comprehensive legal simulation framework that addresses these challenges through adversarial evolution of LLM-based agents. AgentCourt introduces AdvEvol, a new adversarial evolutionary approach in which agents learn and evolve their knowledge dynamically through structured adversarial interactions in a simulated courtroom, overcoming the traditional reliance on static knowledge bases or manual annotations. By simulating 1,000 civil cases, we construct an evolving knowledge base that enhances the agents' legal reasoning abilities. On our newly introduced CourtBench benchmark, the evolved lawyer agents achieve a 12.1% improvement in performance over the original lawyer agents. Evaluations by professional lawyers confirm the effectiveness of our approach across three critical dimensions: cognitive agility, professional knowledge, and logical rigor. Beyond outperforming specialized legal models in interactive reasoning tasks, our findings emphasize the importance of adversarial learning in legal AI and suggest promising directions for extending simulation-based legal reasoning to broader judicial and regulatory contexts. The project's code is available at: https://github.com/relic-yuexi/AgentCourt
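The AdvEvol cycle summarized above (pre-trial data collection, courtroom debate, post-trial reflection) can be pictured as a loop that grows three knowledge bases over successive simulated cases. The following is a minimal, non-authoritative sketch, not the authors' implementation: the names `KnowledgeBases`, `Case`, `pre_trial`, `courtroom_debate`, `post_trial_reflection`, and `adv_evol` are hypothetical, the statutes are placeholders, and the LLM calls the real framework would make are replaced by deterministic stubs so that only the control flow and knowledge accumulation are shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeBases:
    """Hypothetical container for the three bases: regulations, experience, cases."""
    regulations: set[str] = field(default_factory=set)
    experience: list[str] = field(default_factory=list)
    cases: list[str] = field(default_factory=list)


@dataclass
class Case:
    case_id: str
    facts: str
    statutes: list[str]  # statutes relevant to the dispute (assumed given)


def pre_trial(case: Case, kb: KnowledgeBases) -> None:
    # Phase 1 (stub): collect applicable regulations before the hearing.
    kb.regulations.update(case.statutes)


def courtroom_debate(case: Case, kb: KnowledgeBases) -> list[str]:
    # Phase 2 (stub): real agents would argue via an LLM, retrieving from the
    # bases; here each side only records what it could cite.
    transcript = []
    for side in ("plaintiff", "defendant"):
        cited = sorted(s for s in case.statutes if s in kb.regulations)
        transcript.append(f"{side}: cites {cited}, {len(kb.cases)} prior cases available")
    return transcript


def post_trial_reflection(case: Case, transcript: list[str], kb: KnowledgeBases) -> None:
    # Phase 3 (stub): a real agent would ask the LLM to distill lessons learned.
    kb.experience.append(f"{case.case_id}: {len(transcript)} turns reviewed")
    kb.cases.append(f"{case.case_id}: {case.facts}")


def adv_evol(cases: list[Case], kb: Optional[KnowledgeBases] = None) -> KnowledgeBases:
    if kb is None:
        kb = KnowledgeBases()
    for case in cases:
        pre_trial(case, kb)
        transcript = courtroom_debate(case, kb)
        post_trial_reflection(case, transcript, kb)
    return kb


kb = adv_evol([
    Case("c1", "loan dispute", ["Statute A"]),
    Case("c2", "lease dispute", ["Statute A", "Statute B"]),
])
print(sorted(kb.regulations))  # ['Statute A', 'Statute B']
print(kb.cases)  # ['c1: loan dispute', 'c2: lease dispute']
```

Because reflection runs after every case, later cases see larger bases than earlier ones; this toy loop only illustrates that accumulation pattern, not the quality of what an LLM agent would actually store.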

Paper Structure (32 sections, 14 equations, 8 figures, 8 tables, 1 algorithm)

Figures (8)

  • Figure 1: (Left) The mock courtroom sandbox interface, which supports character movement and real-time dialogue; a complete case demonstration is available in the supplementary materials. (Right) Automated knowledge base construction and self-evolution of lawyer agent capabilities through the mock courtroom. The red boxes highlight the key components corresponding to the formulas labeled eq1 and eq7 in the AdvEvol section, which use knowledge from previous cases to help answer questions and enable continuous learning through post-trial reflection.
  • Figure 2: Performance comparison across three dimensions: Cognitive Agility (CA), Professional Knowledge (PK), and Logical Rigor (LR). GPT-4o-mini-1000 consistently outperforms both general-purpose models (GPT-4o-mini, GPT-4o-mini+RAG) and specialized legal models (HanFei-7B, LawyerLLaMA-13B, and ChatLaw-33B).
  • Figure 3: Impact of training data scale on model performance. Results compare GPT-4o-mini-1000 against models trained on smaller datasets (mini-200, mini-500) across three dimensions—Cognitive Agility (CA), Professional Knowledge (PK), and Logical Rigor (LR)—highlighting the influence of training data size on model capabilities.
  • Figure 4: Ablation study results illustrating the impact of removing different knowledge bases from GPTM-1000. Performance degradation is evaluated by excluding the legal provisions database (w/o law), experience database (w/o exp), and case database (w/o case) across three dimensions: Cognitive Agility (CA), Professional Knowledge (PK), and Logical Rigor (LR).
  • Figure 5: Simulation of the court process. This figure illustrates the complete workflow of the simulated court: (1) the middle row outlines the overall court framework; (2) during the free debate phase, each agent retrieves relevant knowledge from the three databases as needed to strengthen its responses; (3) after completing a case simulation, the agent reflects and evolves, continuously expanding its knowledge bases.
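Figure 5 describes agents retrieving from the three databases during free debate, and Figure 4 ablates each database in turn. A toy version of that retrieval step, with an `exclude` switch that mimics the w/o law, w/o exp, and w/o case settings, might look like the following. It is a hypothetical sketch: the function names, base keys, and example entries are illustrative assumptions, and simple word-overlap scoring stands in for whatever retriever (for example, embedding similarity) the authors actually use.

```python
import re
from collections.abc import Iterable


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; a crude stand-in for real text processing.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(
    query: str,
    bases: dict[str, list[str]],
    k: int = 1,
    exclude: Iterable[str] = (),
) -> dict[str, list[str]]:
    """Return up to k entries per knowledge base, ranked by word overlap with the query."""
    skipped = set(exclude)
    query_tokens = tokenize(query)
    results: dict[str, list[str]] = {}
    for name, entries in bases.items():
        if name in skipped:
            continue  # ablation: this base is unavailable to the agent
        ranked = sorted(entries, key=lambda e: len(query_tokens & tokenize(e)), reverse=True)
        # Drop entries with zero overlap so irrelevant text is never injected.
        results[name] = [e for e in ranked[:k] if query_tokens & tokenize(e)]
    return results


# Hypothetical base contents, keyed like the Figure 4 ablation labels.
bases = {
    "law": ["repayment of a loan with interest", "lease of property"],
    "exp": ["cite the loan contract early"],
    "case": ["tenant withheld rent"],
}
print(retrieve("borrower failed loan repayment", bases))
# {'law': ['repayment of a loan with interest'], 'exp': ['cite the loan contract early'], 'case': []}
print(retrieve("borrower failed loan repayment", bases, exclude={"exp"}))
# {'law': ['repayment of a loan with interest'], 'case': []}
```

Keeping one result list per base, rather than merging everything into a single ranking, makes it straightforward to switch a base off and observe the effect, which is the kind of comparison the ablation in Figure 4 reports.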