
Explaining Legal Concepts with Augmented Large Language Models (GPT-4)

Jaromir Savelka, Kevin D. Ashley, Morgan A. Gray, Hannes Westermann, Huihui Xu

TL;DR

The paper tackles the challenge of interpreting open-textured legal terms by evaluating GPT-4's ability to explain statutory concepts and by comparing a baseline approach with an augmented method that injects case-law context via a legal information retrieval pipeline. The authors build and compare two systems: a Baseline model that explains directly from the statute and an Augmented model that conditions GPT-4 on retrieved high-value sentence excerpts from case law to ground explanations in precedent. Human experts evaluated explanations across factuality, clarity, relevance, information richness, and on-pointedness, finding that augmentation significantly improves quality and eliminates hallucinations, though IR quality remains a key factor. The results demonstrate a viable path for automated, context-grounded statutory explanations, with potential applications in legal education, practice, and accessible justice, and point to future work on IR robustness and broader legal domains.

Abstract

Interpreting the meaning of legal open-textured terms is a key task of legal professionals. An important source for this interpretation is how the term was applied in previous court cases. In this paper, we evaluate the performance of GPT-4 in generating factually accurate, clear and relevant explanations of terms in legislation. We compare the performance of a baseline setup, where GPT-4 is directly asked to explain a legal term, to an augmented approach, where a legal information retrieval module is used to provide relevant context to the model, in the form of sentences from case law. We found that the direct application of GPT-4 yields explanations that appear to be of very high quality on their surface. However, detailed analysis uncovered limitations in terms of the factual accuracy of the explanations. Further, we found that the augmentation leads to improved quality, and appears to eliminate the issue of hallucination, where models invent incorrect statements. These findings open the door to the building of systems that can autonomously retrieve relevant sentences from case law and condense them into a useful explanation for legal scholars, educators or practicing lawyers alike.

Paper Structure (18 sections, 4 figures, 2 tables)


Figures (4)

  • Figure 1: GPT Prompts. The system prompt (top) provides high-level context. The task instructions are provided via the user prompt (bottom). Blue and red tokens with curly braces are replaced with the actual data. Orange and red passages are used only in the augmented version of the system.
  • Figure 2: System Architecture Diagrams. The top part shows the baseline, which applies the LLM directly. The bottom part describes the components of the augmented architecture, which relies on the information retrieval component.
  • Figure 3: Short Explanation Preferences. Red corresponds to preferences for the explanations generated by the baseline system, while green indicates preferences for the explanations coming from the augmented LLM. Yellow/orange shows the number of instances where no preference was indicated.
  • Figure 4: Long Explanation Preferences. Red corresponds to preferences for the explanations generated by the baseline system, while green indicates preferences for the explanations coming from the augmented LLM. Yellow/orange shows the number of instances where no preference was indicated.
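
The captions for Figures 1 and 2 describe a prompt template whose curly-brace placeholders are filled with data, with extra passages included only in the augmented system. The following is a minimal sketch of that idea, not the authors' actual prompts: the function name `build_user_prompt`, its parameters, and all prompt wording are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def build_user_prompt(
    term: str,
    provision: str,
    case_sentences: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a user prompt for explaining a statutory term.

    Hypothetical wording throughout; the paper's real prompt text
    is shown in its Figure 1 and is not reproduced here.
    """
    parts = [
        f"Statutory provision:\n{provision}",
        f"Term to explain: {term}",
    ]
    if case_sentences:
        # Augmented variant: append retrieved case-law sentences as context.
        numbered = "\n".join(
            f"[{i}] {s}" for i, s in enumerate(case_sentences, start=1)
        )
        parts.append("Sentences from case law mentioning the term:\n" + numbered)
        parts.append("Explain the term, relying on the sentences above.")
    else:
        # Baseline variant: the model sees only the provision and the term.
        parts.append("Explain the term.")
    return "\n\n".join(parts)
```

Calling `build_user_prompt(term, provision)` yields the baseline form, while passing a list of retrieved sentences as the third argument yields the augmented form. Keeping both variants in one function makes it easy to compare the two setups on identical inputs, since the only difference is the presence of the retrieved context.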