Explaining Legal Concepts with Augmented Large Language Models (GPT-4)
Jaromir Savelka, Kevin D. Ashley, Morgan A. Gray, Hannes Westermann, Huihui Xu
TL;DR
The paper tackles the challenge of interpreting open-textured legal terms by evaluating GPT-4's ability to explain statutory concepts, comparing a baseline approach with an augmented method that injects case-law context via a legal information retrieval pipeline. The authors build and compare two systems: a Baseline model that explains directly from the statute, and an Augmented model that conditions GPT-4 on retrieved high-value sentence excerpts from case law to ground explanations in precedent. Human experts evaluated explanations across factuality, clarity, relevance, information richness, and on-pointedness, finding that augmentation significantly improves quality and appears to eliminate hallucinations, though IR quality remains a key factor. The results demonstrate a viable path for automated, context-grounded statutory explanations, with potential applications in legal education, practice, and accessible justice, and point to future work on IR robustness and broader legal domains.
Abstract
Interpreting the meaning of open-textured legal terms is a key task of legal professionals. An important source for this interpretation is how the term was applied in previous court cases. In this paper, we evaluate the performance of GPT-4 in generating factually accurate, clear and relevant explanations of terms in legislation. We compare the performance of a baseline setup, where GPT-4 is directly asked to explain a legal term, to an augmented approach, where a legal information retrieval module is used to provide relevant context to the model, in the form of sentences from case law. We found that the direct application of GPT-4 yields explanations that appear to be of very high quality on the surface. However, detailed analysis uncovered limitations in terms of the factual accuracy of the explanations. Further, we found that the augmentation leads to improved quality, and appears to eliminate the issue of hallucination, where models invent incorrect statements. These findings open the door to the building of systems that can autonomously retrieve relevant sentences from case law and condense them into a useful explanation for legal scholars, educators or practicing lawyers alike.
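
The difference between the two setups can be illustrated with a minimal Python sketch of prompt construction. It is not the authors' implementation: the prompt wording, the `CaseSentence` type, the `score` field, and the `top_k` cutoff are all assumptions made for illustration, and the call to GPT-4 itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class CaseSentence:
    text: str     # sentence excerpt taken from a court opinion
    score: float  # hypothetical usefulness score assigned by a retrieval module


def build_baseline_prompt(term: str, provision: str) -> str:
    """Baseline: the model sees only the statutory provision and the term."""
    return (
        f"Statutory provision:\n{provision}\n\n"
        f"Explain the meaning of the term '{term}' as used in this provision."
    )


def build_augmented_prompt(
    term: str, provision: str, sentences: list[CaseSentence], top_k: int = 3
) -> str:
    """Augmented: the top-scoring case-law sentences are added as context."""
    # Keep the top_k sentences by score (highest first).
    ranked = sorted(sentences, key=lambda s: s.score, reverse=True)[:top_k]
    context = "\n".join(f"- {s.text}" for s in ranked)
    return (
        f"Statutory provision:\n{provision}\n\n"
        f"Sentences from case law that discuss the term:\n{context}\n\n"
        f"Using the provision and the sentences above, "
        f"explain the meaning of the term '{term}'."
    )
```

Either string would then be sent to the model as the user message. The only difference between the two setups in this sketch is the retrieved context, which is why the quality of the retrieval step matters for the augmented explanations.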
