Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction

Shengbin Yue, Ting Huang, Zheng Jia, Siyuan Wang, Shujun Liu, Yun Song, Xuanjing Huang, Zhongyu Wei

TL;DR

This work tackles the scarcity of scalable, interactive legal scenario data by introducing MASER, a Multi-agent Legal Simulation Driver that coordinates role-preserving data generation across Client, Lawyer, and Supervisor agents. It combines real-world legal sources with Big-5 personality-based role presets and a sentence-level supervision mechanism to produce authentic, distractor-aware social simulations, culminating in SynthLaw, a large synthetic dataset for fine-tuning LLMs. The accompanying MILE benchmark evaluates LLM-driven lawyers in dynamic, goal-oriented tasks, using a two-phase assessment of interaction quality and final complaint quality, derived from real judgments. Experimental results show SynthLaw markedly improves interactive and goal-oriented performance over baselines, bridging the gap between intensive interaction and legal task achievement, with strong robustness across different client profiles and base models. The framework promises scalable, domain-specific data generation for advanced legal AI systems and can be extended to more complex proceedings and consultative contexts.

Abstract

Large Language Models (LLMs) have significantly advanced legal intelligence, but the scarcity of scenario data impedes progress toward interactive legal scenarios. This paper introduces a Multi-agent Legal Simulation Driver (MASER) to scalably generate synthetic data by simulating interactive legal scenarios. Leveraging real legal case sources, MASER ensures the consistency of legal attributes between participants and introduces a supervisory mechanism that aligns participants' characters and behaviors and addresses distractions. A Multi-stage Interactive Legal Evaluation (MILE) benchmark is further constructed to evaluate LLMs' performance in dynamic legal scenarios. Extensive experiments confirm the effectiveness of our framework.
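
The supervised simulation loop described above can be pictured with a minimal sketch. This is not the authors' implementation: in MASER the Client, Lawyer, and Supervisor are LLM-driven agents, whereas here they are hypothetical stub functions. The names `simulate`, `max_turns`, and `max_retries`, the choice to let the lawyer speak first, and the retry-on-rejection policy are all assumptions made for illustration. The only idea taken from the source is that a supervisor checks each utterance before it enters the dialogue.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch (not from the paper).
History = List[Tuple[str, str]]                 # (role, utterance) pairs
Agent = Callable[[History, int], str]           # (history, attempt) -> utterance
Supervisor = Callable[[str, str, History], bool]  # True if utterance is acceptable


def simulate(client: Agent, lawyer: Agent, supervisor: Supervisor,
             max_turns: int = 4, max_retries: int = 2) -> History:
    """Alternate lawyer and client turns; the supervisor vets every utterance."""
    history: History = []
    for _ in range(max_turns):
        for role, agent in (("lawyer", lawyer), ("client", client)):
            attempt = 0
            utterance = agent(history, attempt)
            # Assumed policy: regenerate a rejected utterance a few times.
            while not supervisor(role, utterance, history) and attempt < max_retries:
                attempt += 1
                utterance = agent(history, attempt)
            # After the retry budget is spent, the last utterance is kept as-is.
            history.append((role, utterance))
    return history


# Stub agents standing in for LLM calls (purely illustrative).
def stub_lawyer(history: History, attempt: int) -> str:
    return "What happened, and when?"


def stub_client(history: History, attempt: int) -> str:
    # The first attempt is off-topic; the retry stays in character.
    if attempt == 0:
        return "Whatever, never mind."
    return "My landlord kept my deposit in March."


def stub_supervisor(role: str, utterance: str, history: History) -> bool:
    # Toy stand-in for the alignment and distraction checks.
    return "never mind" not in utterance.lower()


dialogue = simulate(stub_client, stub_lawyer, stub_supervisor, max_turns=1)
```

In this toy run the client's first reply is rejected and regenerated, so only the on-topic reply is recorded. A real supervisor would presumably return feedback rather than a bare pass/fail flag, but the source summary does not specify that interface.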

Paper Structure

This paper contains 54 sections, 1 equation, 25 figures, 9 tables, 1 algorithm.

Figures (25)

  • Figure 1: Examples of general LLM (i.e., GPT-4o) and legal LLM (i.e., LawLLM) as legal professionals in drafting legal documents. LLMs struggle to maintain flexible interaction patterns under legal agendas.
  • Figure 2: Overview of the Multi-agent Legal Simulation Driver (MASER), which consists of role agent presetting and multi-agent legal simulation. The sentence-level data synthesized with MASER can drive arbitrary LLMs for legal intensive interaction.
  • Figure 3: Distribution of legal attributes for our MILE benchmark, including 9 primary attributes.
  • Figure 4: Comparative results of overall performance, where G-AVE and I-AVE stand for the average scores of goal evaluation and interaction evaluation, respectively.
  • Figure 5: The scores (Interactivity and Logicality) over different turn numbers on interaction evaluation, where the baseline is Qwen2.5-instruct-7B.
  • ...and 20 more figures