HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
Valentin Noël, Elimane Yassine Seidou, Charly Ken Capo-Chichi, Ghanem Amari
TL;DR
HalluGraph tackles the accountability problem in legal RAG systems by moving beyond semantic similarity to graph-based fidelity checks. It builds knowledge graphs from context, queries, and responses, and measures fidelity through Entity Grounding and Relation Preservation, combined into a Composite Fidelity Index with full auditability. Empirically, it achieves high discrimination (AUC = 0.979 on structured control documents and AUC ≈ 0.89 on generative legal tasks), outperforming baselines such as BERTScore and NLI, while highlighting regime boundaries where the method is most effective. This approach provides auditable, explainable safeguards for high-stakes legal AI and offers a practical integration path for production RAG pipelines in regulated settings.
Abstract
Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioners need verifiable guarantees that generated text faithfully represents source documents. Existing hallucination detectors rely on semantic similarity metrics that tolerate entity substitutions, a dangerous failure mode when confusing parties, dates, or legal provisions can have material consequences. We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into \textit{Entity Grounding} (EG), measuring whether entities in the response appear in source documents, and \textit{Relation Preservation} (RP), verifying that asserted relationships are supported by context. On structured control documents ($>$400 words, $>$20 entities), HalluGraph achieves near-perfect discrimination ($AUC = 0.979$), while maintaining robust performance ($AUC \approx 0.89$) on challenging generative legal tasks, consistently outperforming semantic similarity baselines. The framework provides the transparency and traceability required for high-stakes legal applications, enabling full audit trails from generated assertions back to source passages.
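To make the two metrics concrete, the following is a minimal illustrative sketch, not the HalluGraph implementation. It assumes that entities and (subject, relation, object) triples have already been extracted, that matching is exact after simple normalization, that an empty response scores 1.0, and that the composite score is an equal-weight average. The abstract fixes none of these choices, and all function names here are hypothetical.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """A knowledge-graph edge: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


def _norm(text: str) -> str:
    # Assumption: case-insensitive, whitespace-collapsed exact matching.
    return " ".join(text.lower().split())


def entity_grounding(response_entities: Iterable[str],
                     context_entities: Iterable[str]) -> float:
    """Fraction of response entities that also appear in the context."""
    resp = {_norm(e) for e in response_entities}
    if not resp:
        return 1.0  # Assumption: nothing asserted means nothing ungrounded.
    ctx = {_norm(e) for e in context_entities}
    return len(resp & ctx) / len(resp)


def relation_preservation(response_triples: Iterable[Triple],
                          context_triples: Iterable[Triple]) -> float:
    """Fraction of response triples that are supported by a context triple."""
    resp = {tuple(_norm(x) for x in t) for t in response_triples}
    if not resp:
        return 1.0  # Assumption: same convention as entity_grounding.
    ctx = {tuple(_norm(x) for x in t) for t in context_triples}
    return len(resp & ctx) / len(resp)


def composite_fidelity(eg: float, rp: float, w_eg: float = 0.5) -> float:
    """Hypothetical composite: a convex combination of EG and RP."""
    return w_eg * eg + (1.0 - w_eg) * rp


# Toy example: the response substitutes one party for another.
context_entities = ["Acme Corp", "Beta LLC", "2019-03-01"]
context_triples = [Triple("Acme Corp", "sued", "Beta LLC")]
response_entities = ["Acme Corp", "Gamma Inc"]
response_triples = [Triple("Acme Corp", "sued", "Gamma Inc")]

eg = entity_grounding(response_entities, context_entities)
rp = relation_preservation(response_triples, context_triples)
cfi = composite_fidelity(eg, rp)
print(f"EG={eg:.2f} RP={rp:.2f} CFI={cfi:.2f}")
```

In this toy case the substituted party ("Gamma Inc") is ungrounded and the asserted relation has no support in the context, so both scores drop, and each drop can be traced to a specific entity or edge. This is the kind of entity-substitution error that a semantic similarity score may tolerate.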
