LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation

Weikang Yuan, Kaisong Song, Zhuoren Jiang, Junjie Cao, Yujie Zhang, Jun Lin, Kun Kuang, Ji Zhang, Xiaozhong Liu

TL;DR

LeCoDe introduces the first real-world, multi-turn legal consultation dataset collected from authentic live consultations on Chinese short-video platforms, with extensive expert annotations and a two-part evaluation framework that measures clarification capability and advice quality. The authors provide a detailed dataset construction pipeline, rich annotation schema (including atomic key facts and their importance), and three supervised fine-tuning strategies to improve LLMs’ performance in interactive legal dialogues. Experimental results reveal substantial gaps in current models’ abilities to elicit critical facts and provide high-quality, professional legal advice, while demonstrating that targeted SFT approaches, especially Key-fact-e SFT, can yield notable gains and even surpass some state-of-the-art models in certain metrics. The work also discusses ethical considerations, licensing, and future directions such as external knowledge integration to bridge remaining gaps, aiming to make professional legal consultation more accessible and reliable through AI systems.

Abstract

Legal consultation is essential for safeguarding individual rights and ensuring access to justice, yet it remains costly and inaccessible to many individuals because of a shortage of professionals. While recent advances in Large Language Models (LLMs) offer a promising path toward scalable, low-cost legal assistance, current systems fall short in handling the interactive and knowledge-intensive nature of real-world consultations. To address these challenges, we introduce LeCoDe, a real-world multi-turn benchmark dataset comprising 3,696 legal consultation dialogues with 110,008 dialogue turns, designed to evaluate and improve LLMs' legal consultation capability. To build LeCoDe, we collect live-streamed consultations from short-video platforms, which provide authentic multi-turn legal consultation dialogues. Rigorous annotation by legal experts further enriches the dataset with professional insight. We also propose a comprehensive evaluation framework that assesses LLMs' consultation capabilities along two dimensions: (1) clarification capability and (2) professional advice quality. This unified framework incorporates 12 metrics across the two dimensions. Extensive experiments on various general and domain-specific LLMs reveal significant challenges in this task: even state-of-the-art models like GPT-4 achieve only 39.8% recall for clarification and a 59% overall score for advice quality, highlighting the complexity of professional consultation scenarios. Based on these findings, we further explore several strategies to enhance LLMs' legal consultation abilities. Our benchmark contributes to advancing research in legal-domain dialogue systems, particularly in simulating more realistic user-expert interactions.
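
As a rough illustration of what "recall for clarification" could mean in practice, the sketch below scores how many expert-annotated key facts a model managed to elicit, with and without importance weights. This summary does not give the paper's exact metric definitions, so the `KeyFact` structure, the numeric importance scale, the example fact names, and both function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFact:
    """One annotated atomic key fact (hypothetical structure, not the dataset schema)."""
    fact_id: str
    importance: float  # assumed scale: larger means more important


def plain_recall(key_facts: list[KeyFact], elicited_ids: set[str]) -> float:
    """Fraction of key facts that the model elicited, ignoring importance."""
    if not key_facts:
        return 0.0
    hits = sum(1 for fact in key_facts if fact.fact_id in elicited_ids)
    return hits / len(key_facts)


def weighted_recall(key_facts: list[KeyFact], elicited_ids: set[str]) -> float:
    """Importance-weighted fraction of key facts that the model elicited."""
    total = sum(fact.importance for fact in key_facts)
    if total == 0:
        return 0.0
    hit = sum(fact.importance for fact in key_facts if fact.fact_id in elicited_ids)
    return hit / total


# Hypothetical consultation with three key facts of decreasing importance.
facts = [
    KeyFact("signed_contract", 3.0),
    KeyFact("unpaid_months", 2.0),
    KeyFact("employer_city", 1.0),
]
elicited = {"signed_contract"}  # facts the model's questions uncovered

print(plain_recall(facts, elicited))
print(weighted_recall(facts, elicited))
```

Under these assumptions, a model that uncovers only the most important fact scores higher on the weighted variant than on the plain one, which is the kind of distinction an importance annotation makes possible.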

Paper Structure

This paper contains 49 sections, 7 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: An illustration of real-world legal consultation dialogue.
  • Figure 2: Data Construction Pipeline for LeCoDe.
  • Figure 3: Dataset Distribution.
  • Figure 4: Training Dialogue Construction Strategy for SFT.
  • Figure 5: Impact of Case Complexity on Model Performance Metrics: Analysis of Weighted Recall, Ask Turn, BERTScore, and Overall Score (OA) across Different Numbers of Atomic Key Facts.
  • ...and 2 more figures