Chain of Logic: Rule-Based Reasoning with Large Language Models
Sergio Servantez, Joe Barrow, Kristian Hammond, Rajiv Jain
TL;DR
The paper tackles rule-based, compositional legal reasoning by evaluating large language models (LLMs) and introducing Chain of Logic, a prompting framework that decomposes a rule into individual elements, reasons about each element separately, and recombines the sub-answers to resolve the rule's logical expression. Inspired by the IRAC framework, Chain of Logic yields interpretable, stepwise reasoning traces that make incorrect conclusions easier to debug. Across eight LegalBench tasks and multiple language models (including GPT-3.5/4 and open-source variants), the method consistently outperforms chain of thought, self-ask, and other baselines in a setting where the single demonstration uses a different rule from the test task, reducing the need for per-rule demonstrations. The approach holds promise for legal-domain AI applications, potentially enabling better reasoning, easier instruction tuning, and reduced reliance on large annotated datasets, with future extensions to multi-pass and retrieval-augmented strategies.
Abstract
Rule-based reasoning, a fundamental type of legal reasoning, enables us to draw conclusions by accurately applying a rule to a set of facts. We explore causal language models as rule-based reasoners, specifically with respect to compositional rules: rules consisting of multiple elements which form a complex logical expression. Reasoning about compositional rules is challenging because it requires multiple reasoning steps and attention to the logical relationships between elements. We introduce a new prompting method, Chain of Logic, which elicits rule-based reasoning through decomposition (solving elements as independent threads of logic) and recomposition (recombining these sub-answers to resolve the underlying logical expression). This method was inspired by the IRAC (Issue, Rule, Application, Conclusion) framework, a sequential reasoning approach used by lawyers. We evaluate Chain of Logic across eight rule-based reasoning tasks involving three distinct compositional rules from the LegalBench benchmark and demonstrate that it consistently outperforms other prompting methods, including chain of thought and self-ask, using open-source and commercial language models.
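To make the decomposition and recomposition steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical rule with three elements (A, B, C) combined as "A AND (B OR C)". In Chain of Logic the sub-answers would come from the language model's reasoning about each element; here they are hard-coded placeholders. The names `recompose`, `rule_expression`, and `sub_answers` are illustrative only.

```python
from typing import Dict, Union

# An expression is either an element name (str) or a tuple of the form
# (operator, operand, operand, ...). This encoding is an assumption made
# for illustration; the paper expresses the rule in the prompt text.
Expr = Union[str, tuple]


def recompose(expr: Expr, answers: Dict[str, bool]) -> bool:
    """Resolve a logical expression from per-element sub-answers."""
    if isinstance(expr, str):
        # Leaf: look up the sub-answer for this rule element.
        return answers[expr]
    op, *operands = expr
    values = [recompose(operand, answers) for operand in operands]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical compositional rule: A AND (B OR C).
rule_expression = ("AND", "A", ("OR", "B", "C"))

# Decomposition: each element is answered as an independent thread of logic.
# Placeholder values stand in for the model's per-element conclusions.
sub_answers = {"A": True, "B": False, "C": True}

# Recomposition: combine the sub-answers according to the rule's logic.
conclusion = recompose(rule_expression, sub_answers)
print(conclusion)  # True, since A holds and at least one of B, C holds
```

Keeping the sub-answers separate from the expression mirrors the debugging benefit described above: an incorrect conclusion can be traced either to a wrong element-level answer or to a wrong logical combination.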
