Towards Robust Legal Reasoning: Harnessing Logical LLMs in Law

Manuj Kant, Sareh Nabi, Manav Kant, Roland Scharrer, Megan Ma, Marzieh Nabi

TL;DR

This work addresses the need for trustworthy legal AI by combining LLMs with logic-based reasoning to produce auditable, explainable analyses of contract coverage. It demonstrates a neuro-symbolic approach where LLMs translate legal terms into logic encodings and a Prolog-like engine performs deduction, yielding higher accuracy and consistency than vanilla LLMs. In a health-insurance case study, vanilla models reach around $0.78$–$0.88$ accuracy, while expert-guided Prolog encodings achieve $100\%$ accuracy on a simplified policy and up to $95\%$ (ART) and $87\%$ (CI) on more complex contracts, underscoring the value of structured reasoning and domain guidance. The paper discusses future directions, including fine-tuning with logic data, agentic AI, and RL-based improvements, to broaden applicability and strengthen explainability in legal AI systems.

Abstract

Legal services rely heavily on text processing. While large language models (LLMs) show promise, their application in legal contexts demands higher accuracy, repeatability, and transparency. Logic programs, by encoding legal concepts as structured rules and facts, offer reliable automation, but require sophisticated text extraction. We propose a neuro-symbolic approach that integrates LLMs' natural language understanding with logic-based reasoning to address these limitations. As a legal document case study, we applied neuro-symbolic AI to coverage-related queries in insurance contracts using both closed and open-source LLMs. While LLMs have improved in legal reasoning, they still lack the accuracy and consistency required for complex contract analysis. In our analysis, we tested three methodologies to evaluate whether a specific claim is covered under a contract: a vanilla LLM, an unguided approach that leverages LLMs to encode both the contract and the claim, and a guided approach that uses a framework for the LLM to encode the contract. We demonstrated the promising capabilities of LLM + Logic in the guided approach.
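To make the idea of "encoding legal concepts as structured rules and facts" concrete, here is a minimal sketch in plain Python standing in for the Prolog-style encoding. Everything in it is a hypothetical illustration: the `Claim` fields, the exclusion list, and the `covered` rule are assumptions and are not taken from the contracts or encodings used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Facts extracted from a claim (hypothetical fields, for illustration only)."""
    policy_active: bool
    days_hospitalized: int
    cause: str


# Hypothetical contract terms; a real encoding would be derived from the policy text.
EXCLUDED_CAUSES = {"self_inflicted", "war"}
MIN_DAYS_HOSPITALIZED = 1


def covered(claim: Claim) -> tuple[bool, list[str]]:
    """Apply the coverage rule and return the decision plus the failed conditions."""
    failures = []
    if not claim.policy_active:
        failures.append("policy not active")
    if claim.days_hospitalized < MIN_DAYS_HOSPITALIZED:
        failures.append("hospitalization too short")
    if claim.cause in EXCLUDED_CAUSES:
        failures.append(f"excluded cause: {claim.cause}")
    return (not failures, failures)


decision_ok, reasons_ok = covered(Claim(True, 3, "accident"))
decision_war, reasons_war = covered(Claim(True, 3, "war"))
print(decision_ok, reasons_ok)
print(decision_war, reasons_war)
```

Because the rule is deterministic, the same claim always produces the same decision, and the list of failed conditions gives a simple audit trail. This is the kind of repeatability and transparency the abstract attributes to logic programs.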

Paper Structure

This paper contains 25 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Average accuracy of LLMs on the Chubb insurance claim coverage dataset. The plot (top) shows each model's average accuracy, with error bars representing the standard error of the mean (SEM) across 10 trials. The tables (bottom) give the corresponding raw accuracy values: the left table covers the Vanilla LLM approach, and the right table covers the Unguided Prolog Generation approach.
  • Figure 2: Experimental overview: (a) Functionality of the CodeX Insurance Analyst coverage rules. (b) The LLM is prompted to generate its own version of the coverage rule given the text of the coverage and documentation of the valid claims and helper rules it can call. (c) The LLM's generated coverage rule is tested by passing it test claims and determining if the correct coverage decisions were made.
  • Figure 3: Average accuracy for the simplified Chubb contract across three approaches (Vanilla LLM, Unguided, and Guided) and three models (GPT-4o, o1-preview, and DeepSeek-R1), with error bars representing the standard error of the mean across 10 trials.
  • Figure 4: Average accuracy of LLMs for the Chubb, ART, and CI coverages using the Guided approach. Error bars represent the standard error of the mean across 10 trials. Models used are DeepSeek-R1, GPT-4o, OpenAI o1, and o3-mini.
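
The error bars in the captions above are standard errors of the mean over 10 trials. As a minimal sketch of that calculation: the trial accuracies below are made-up placeholder values, not results from the paper, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the captions do not say which estimator was used.

```python
import math
import statistics


def mean_and_sem(accuracies):
    """Return the mean and the standard error of the mean (SEM).

    SEM = s / sqrt(n), where s is the sample standard deviation
    (assumed here; the captions do not specify the estimator).
    """
    n = len(accuracies)
    mean = statistics.fmean(accuracies)
    sem = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, sem


# Placeholder per-trial accuracies for 10 trials (illustrative only).
trial_accuracies = [0.8, 0.9, 0.8, 0.9, 0.8, 0.9, 0.8, 0.9, 0.8, 0.9]

mean, sem = mean_and_sem(trial_accuracies)
print(f"accuracy = {mean:.3f} +/- {sem:.3f} (SEM, n = {len(trial_accuracies)})")
```

With only 10 trials the SEM is itself a noisy estimate, so differences between models that fall within about one error bar should be read with caution.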