Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents

Wei Fan, Tianshi Zheng, Yiran Hu, Zheye Deng, Weiqi Wang, Baixuan Xu, Chunyang Li, Haoran Li, Weixing Shen, Yangqiu Song

TL;DR

The paper proposes Legal Rule Induction (LRI) to automatically extract concise, generalizable doctrinal rules from sets of analogous judicial precedents. It defines a formal task, builds the first large LRI benchmark with LRI-AUTO and LRI-GOLD, and demonstrates that while top LLMs initially struggle with hallucination and over-generalization, training on the LRI dataset substantially improves rule discovery, especially for smaller models via LoRA. It evaluates multiple inductive pipelines (Direct, CoT, Long-CoT, SILVER) and shows SILVER consistently improves recall and macro/micro F1 scores, with LRI-AUTO enhancing small-model performance by enabling effective fine-tuning. The work advances computational jurisprudence by enabling inductive rule discovery, provides a rigorous dataset and evaluation framework, and highlights practical considerations for deploying LRI in Chinese legal contexts and beyond.
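As a rough illustration of how the macro and micro F1 scores mentioned above differ, the sketch below computes both from per-case-set counts. It assumes that induced rules have already been matched one-to-one against gold rules. The `SetCounts` type, its field names, and the example numbers are hypothetical and not taken from the paper, and the paper's exact matching and averaging procedure may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SetCounts:
    """Counts for one case set (hypothetical structure, for illustration only)."""
    matched: int    # induced rules judged to match a gold rule (one-to-one)
    predicted: int  # total rules the model induced for this case set
    gold: int       # total gold rules annotated for this case set


def prf(matched: int, predicted: int, gold: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts, guarding against empty sets."""
    precision = matched / predicted if predicted else 0.0
    recall = matched / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(sets: List[SetCounts]) -> float:
    """Average the per-case-set F1 scores (each case set weighs the same)."""
    scores = [prf(s.matched, s.predicted, s.gold)[2] for s in sets]
    return sum(scores) / len(scores) if scores else 0.0


def micro_f1(sets: List[SetCounts]) -> float:
    """Pool counts over all case sets first (each rule weighs the same)."""
    matched = sum(s.matched for s in sets)
    predicted = sum(s.predicted for s in sets)
    gold = sum(s.gold for s in sets)
    return prf(matched, predicted, gold)[2]


# Made-up counts: one small case set solved perfectly, one larger set mostly missed.
example = [SetCounts(matched=1, predicted=1, gold=1),
           SetCounts(matched=1, predicted=4, gold=4)]
macro = macro_f1(example)
micro = micro_f1(example)
```

With these made-up counts the macro score is pulled up by the small, perfectly solved case set, while the micro score is dominated by the larger one. This is one reason to report both.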

Abstract

Legal rules encompass not only codified statutes but also implicit adjudicatory principles derived from precedents that contain discretionary norms, social morality, and policy. While computational legal research has advanced in applying established rules to cases, inducing legal rules from judicial decisions remains understudied, constrained by limitations in model inference efficacy and symbolic reasoning capability. The advent of Large Language Models (LLMs) offers unprecedented opportunities for automating the extraction of such latent principles, yet progress is stymied by the absence of formal task definitions, benchmark datasets, and methodologies. To address this gap, we formalize Legal Rule Induction (LRI) as the task of deriving concise, generalizable doctrinal rules from sets of analogous precedents, distilling their shared preconditions, normative behaviors, and legal consequences. We introduce the first LRI benchmark, comprising 5,121 case sets (38,088 Chinese cases in total) for model tuning and 216 expert-annotated gold test sets. Experimental results reveal that: 1) State-of-the-art LLMs struggle with over-generalization and hallucination; 2) Training on our dataset markedly enhances LLMs' capabilities in capturing nuanced rule patterns across similar cases.
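The three elements named in the abstract (preconditions, normative behaviors, legal consequences) can be pictured as a small record type. The following is a minimal sketch only: the `LegalRule` class, its field names, the `render` helper, and the example rule text are hypothetical illustrations, not the paper's data format or an entry from its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalRule:
    """Hypothetical container for the three-element structure of a legal rule."""
    precondition: str  # circumstances under which the rule applies
    behavior: str      # the conduct the rule addresses
    consequence: str   # the legal outcome attached to that conduct

    def render(self) -> str:
        """Join the three elements into a single if-then sentence."""
        return f"If {self.precondition}, and {self.behavior}, then {self.consequence}."


# Invented example for illustration; not drawn from LRI-AUTO or LRI-GOLD.
rule = LegalRule(
    precondition="a valid loan contract exists",
    behavior="the borrower fails to repay by the agreed date",
    consequence="the borrower is liable for breach of contract",
)
rendered = rule.render()
```

Keeping the three elements as separate fields, rather than one free-text string, would make it easier to compare an induced rule against a gold rule element by element.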

Paper Structure

This paper contains 53 sections, 4 equations, 15 figures, 15 tables, 1 algorithm.

Figures (15)

  • Figure 1: An illustration of legal rule induction from analogous judicial cases via the three-element logical structure of legal rules [zhang2018jurisprudence].
  • Figure 2: The overview of the LRI-AUTO dataset curation pipeline (for civil and criminal cases) and main methods for rule induction, including LoRA, which utilizes LRI-AUTO for tuning and the LRI-GOLD dataset for testing.
  • Figure 3: Distribution of rule set sizes across case numbers in the LRI Dataset.
  • Figure 4: Scores (%) of different baselines. For the Direct, CoT, and SILVER baselines, only the five LLMs common to all three are considered.
  • Figure 5: Performance trends of Direct Induction of ten LLMs across varying case set sizes.
  • ...and 10 more figures