Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents
Wei Fan, Tianshi Zheng, Yiran Hu, Zheye Deng, Weiqi Wang, Baixuan Xu, Chunyang Li, Haoran Li, Weixing Shen, Yangqiu Song
TL;DR
The paper proposes Legal Rule Induction (LRI), the task of automatically extracting concise, generalizable doctrinal rules from sets of analogous judicial precedents. It formalizes the task and builds the first large LRI benchmark, consisting of LRI-AUTO and LRI-GOLD. Top LLMs initially struggle with hallucination and over-generalization, but training on the LRI dataset substantially improves rule discovery, especially for smaller models fine-tuned with LoRA on LRI-AUTO. Among the inductive pipelines evaluated (Direct, CoT, Long-CoT, SILVER), SILVER consistently improves recall and macro/micro F1 scores. The work advances computational jurisprudence by enabling inductive rule discovery, provides a rigorous dataset and evaluation framework, and highlights practical considerations for deploying LRI in Chinese legal contexts and beyond.
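The macro and micro F1 scores mentioned above differ only in where the averaging happens. The following is a minimal sketch, assuming each test case set has already been reduced to three counts: matched rules, predicted rules, and gold rules. How a predicted rule is judged to match a gold rule is not covered here, and the `SetCounts` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SetCounts:
    """Per-case-set counts (hypothetical container, not from the paper)."""
    matched: int    # predicted rules judged to match a gold rule
    predicted: int  # total rules the model produced for this case set
    gold: int       # total gold rules annotated for this case set


def f1(matched: int, predicted: int, gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if matched == 0 or predicted == 0 or gold == 0:
        return 0.0
    precision = matched / predicted
    recall = matched / gold
    return 2 * precision * recall / (precision + recall)


def macro_f1(sets: List[SetCounts]) -> float:
    """Compute F1 per case set, then average: every set weighs equally."""
    return sum(f1(s.matched, s.predicted, s.gold) for s in sets) / len(sets)


def micro_f1(sets: List[SetCounts]) -> float:
    """Pool counts over all case sets, then compute a single F1."""
    matched = sum(s.matched for s in sets)
    predicted = sum(s.predicted for s in sets)
    gold = sum(s.gold for s in sets)
    return f1(matched, predicted, gold)
```

Under this sketch, a model that produces many spurious rules on a few case sets lowers micro F1 more than macro F1, because pooled counts weight each rule equally rather than each case set.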
Abstract
Legal rules encompass not only codified statutes but also implicit adjudicatory principles derived from precedents that contain discretionary norms, social morality, and policy. While computational legal research has advanced in applying established rules to cases, inducing legal rules from judicial decisions remains understudied, constrained by limitations in model inference efficacy and symbolic reasoning capability. The advent of Large Language Models (LLMs) offers unprecedented opportunities for automating the extraction of such latent principles, yet progress is stymied by the absence of formal task definitions, benchmark datasets, and methodologies. To address this gap, we formalize Legal Rule Induction (LRI) as the task of deriving concise, generalizable doctrinal rules from sets of analogous precedents, distilling their shared preconditions, normative behaviors, and legal consequences. We introduce the first LRI benchmark, comprising 5,121 case sets (38,088 Chinese cases in total) for model tuning and 216 expert-annotated gold test sets. Experimental results reveal that: 1) State-of-the-art LLMs struggle with over-generalization and hallucination; 2) Training on our dataset markedly enhances LLMs' capabilities in capturing nuanced rule patterns across similar cases.
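The task definition above can be pictured as a simple data structure: a set of analogous cases goes in, and rules with three components (precondition, normative behavior, legal consequence) come out. The following is a minimal sketch under that reading. The class names, field names, and the example rule are hypothetical illustrations and are not taken from the benchmark.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LegalRule:
    """One induced rule, split into the three parts named in the task definition."""
    precondition: str  # circumstances under which the rule applies
    behavior: str      # normative behavior the rule addresses
    consequence: str   # legal consequence that follows

    def render(self) -> str:
        """Flatten the rule into a single readable sentence."""
        return f"If {self.precondition}, and {self.behavior}, then {self.consequence}."


@dataclass
class CaseSet:
    """A set of analogous precedents plus the rules they are expected to share."""
    case_texts: List[str]
    gold_rules: List[LegalRule] = field(default_factory=list)


# Invented example for illustration only; not drawn from the dataset.
example = CaseSet(
    case_texts=["<text of precedent 1>", "<text of precedent 2>"],
    gold_rules=[
        LegalRule(
            precondition="a lease is in force",
            behavior="the tenant sublets without the landlord's consent",
            consequence="the landlord may terminate the lease",
        )
    ],
)
print(example.gold_rules[0].render())
```

Keeping the three components as separate fields, rather than one free-text string, makes it possible to compare a predicted rule against a gold rule component by component.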
