AppealCase: A Dataset and Benchmark for Civil Case Appeal Scenarios
Yuting Huang, Meitong Guo, Yiquan Wu, Ang Li, Xiaozhong Liu, Keting Yin, Changlong Sun, Fei Wu, Kun Kuang
TL;DR
This paper addresses the neglected appellate stage of civil litigation by introducing AppealCase, a dataset of 10,000 paired first- and second-instance judgments across 91 causes of action, annotated along five dimensions that support five new LegalAI tasks. It describes the annotation pipeline, reports expert-validated annotation quality, and evaluates 20 models on the five tasks, revealing significant gaps, most notably F1 scores below 50% on judgment reversal prediction. The results illustrate how difficult it is to model appellate reasoning, long Chinese judicial documents, and nuanced legal provisions. The dataset is publicly available under CC BY-NC 4.0 to spur research in appellate case analysis and judicial consistency. Overall, AppealCase lays a foundation for scalable appellate analysis, offering benchmarks and insights that can guide the development of models that support fairer and more consistent judicial outcomes.
Abstract
Recent advances in LegalAI have primarily focused on analyzing individual case judgments, often overlooking the critical appellate process within the judicial system. Appeals are a core mechanism for correcting errors and ensuring fair trials, which makes them highly significant in both practice and research. To address this gap, we present the AppealCase dataset, consisting of 10,000 pairs of real-world, matched first-instance and second-instance documents across 91 categories of civil cases. The dataset also includes detailed annotations along five dimensions central to appellate review: judgment reversals, reversal reasons, cited legal provisions, claim-level decisions, and whether new information is introduced in the second instance. Based on these annotations, we propose five novel LegalAI tasks and conduct a comprehensive evaluation of 20 mainstream models. Experimental results show that all evaluated models achieve F1 scores below 50% on the judgment reversal prediction task, highlighting the complexity and challenge of the appeal scenario. We hope that the AppealCase dataset will spur further research in LegalAI for appellate case analysis and contribute to improving consistency in judicial decision-making.
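As a rough illustration of the five annotation dimensions and of how an F1 score on reversal prediction could be computed, the Python sketch below may help. It is not the dataset's actual schema or the paper's evaluation code: the class `AppealRecord`, all of its field names, and the function `binary_f1` are hypothetical, and treating "reversed" as the positive class is an assumption, since this section does not state how F1 is averaged.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class AppealRecord:
    """Hypothetical container for one matched first-/second-instance pair.

    Field names are illustrative only; they mirror the five annotation
    dimensions listed in the abstract, not the released file format.
    """
    first_instance_text: str
    second_instance_text: str
    is_reversed: bool                                             # judgment reversal
    reversal_reasons: List[str] = field(default_factory=list)     # reversal reasons
    cited_provisions: List[str] = field(default_factory=list)     # cited legal provisions
    claim_decisions: Dict[str, str] = field(default_factory=dict) # claim-level decisions
    has_new_information: bool = False                             # new information in second instance


def binary_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 for the positive class (assumed here to be 'reversed')."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not drawn from the dataset).
gold_labels = [True, False, True, False]
predictions = [True, True, False, False]
score = binary_f1(gold_labels, predictions)  # one TP, one FP, one FN
```

With these toy labels, precision and recall are both 1/2, so the F1 score is 0.5. A multi-class or macro-averaged variant would be needed if the reversal label has more than two outcomes.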
