AppealCase: A Dataset and Benchmark for Civil Case Appeal Scenarios

Yuting Huang, Meitong Guo, Yiquan Wu, Ang Li, Xiaozhong Liu, Keting Yin, Changlong Sun, Fei Wu, Kun Kuang

TL;DR

This paper addresses the neglected appellate stage of civil litigation by introducing AppealCase, a dataset of 10,000 paired first- and second-instance judgments across 91 categories of civil cases, with five annotation dimensions and five new LegalAI tasks. It describes a detailed annotation pipeline with expert-validated quality and evaluates 20 models across the five tasks, revealing significant gaps, most notably sub-50% F1 on judgment reversal prediction. The work highlights the challenges of modeling appellate reasoning, long Chinese judicial documents, and nuanced legal provisions, and releases the dataset publicly under CC BY-NC 4.0 to spur research in appellate case analysis and judicial consistency. Overall, AppealCase lays a foundation for scalable appellate analysis, offering benchmarks and insights that can drive the development of models capable of supporting fairer and more consistent judicial outcomes.

Abstract

Recent advances in LegalAI have primarily focused on individual case judgment analysis, often overlooking the critical appellate process within the judicial system. Appeals serve as a core mechanism for error correction and ensuring fair trials, making them highly significant both in practice and in research. To address this gap, we present the AppealCase dataset, consisting of 10,000 pairs of real-world, matched first-instance and second-instance documents across 91 categories of civil cases. The dataset also includes detailed annotations along five dimensions central to appellate review: judgment reversals, reversal reasons, cited legal provisions, claim-level decisions, and whether new information is introduced in the second instance. Based on these annotations, we propose five novel LegalAI tasks and conduct a comprehensive evaluation across 20 mainstream models. Experimental results reveal that all current models achieve F1 scores below 50% on the judgment reversal prediction task, highlighting the complexity and challenge of the appeal scenario. We hope that the AppealCase dataset will spur further research in LegalAI for appellate case analysis and contribute to improving consistency in judicial decision-making.

Paper Structure

This paper contains 45 sections, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Procedural flow from first-instance trial to second-instance judgment, illustrating the roles of appellants, new evidence submission, and possible outcomes (affirmation or reversal) of the original judgment.
  • Figure 2: A comparative example of first-instance and second-instance documents. The diagram illustrates the structural correspondence between sections such as claims, facts, court’s view, and judgment. Underlined phrases represent typical legal expressions used to segment each paragraph in real-world documents.
  • Figure 3: Distribution of reversal reasons in the AppealCase dataset.
  • Figure 4: Performance of different models on the five tasks.