LegalWiz: A Multi-Agent Generation Framework for Contradiction Detection in Legal Documents

Ananya Mantravadi, Shivali Dalmia, Olga Pospelova, Abhishek Mukherji, Nand Dave, Anudha Mittal

TL;DR

This work tackles the problem of unresolved contradictions in legal document generation and retrieval by introducing LegalWiz, a multi-agent framework that generates synthetic long-form legal documents with structured contradictions. It presents a six-type taxonomy of contradictions, automated contradiction mining with human-in-the-loop validation, and a retrieval-verifiability assessment to distinguish between retrievable and non-retrievable contradictions. Experimental results show that hybrid NLI+LLM detectors provide the best balance of precision and recall, especially for self-contradictions, while cross-document contradictions remain difficult and highlight the need for retrieval-aware, multi-hop reasoning. The framework offers a scalable, domain-relevant benchmark to stress-test and improve contradiction detection and resolution in legal RAG systems, enabling more reliable, interpretable, and trustworthy AI in high-stakes legal workflows.

Abstract

Retrieval-Augmented Generation (RAG) integrates large language models (LLMs) with external sources, but unresolved contradictions in retrieved evidence often lead to hallucinations and legally unsound outputs. Benchmarks currently used for contradiction detection lack domain realism, cover only limited conflict types, and rarely extend beyond single-sentence pairs, making them unsuitable for legal applications. Controlled generation of documents with embedded contradictions is therefore essential: it enables systematic stress-testing of models, ensures coverage of diverse conflict categories, and provides a reliable basis for evaluating contradiction detection and resolution. We present a multi-agent contradiction-aware benchmark framework for the legal domain that generates synthetic legal-style documents, injects six structured contradiction types, and models both self- and pairwise inconsistencies. Automated contradiction mining is combined with human-in-the-loop validation to guarantee plausibility and fidelity. This benchmark offers one of the first structured resources for contradiction-aware evaluation in legal RAG pipelines, supporting more consistent, interpretable, and trustworthy systems.

Paper Structure

This paper contains 20 sections, 5 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: LegalWiz - Workflow
  • Figure 2: Contradiction Generation