Place Matters: Comparing LLM Hallucination Rates for Place-Based Legal Queries

Damian Curran, Vanessa Sporne, Lea Frermann, Jeannie Paterson

TL;DR

This study asks whether LLMs’ knowledge of law varies by geographic location and how that affects the quality of legal information provided to users. It introduces a functionalist comparative-law framework that treats practical legal problems as the basis for cross-place evaluation, using 100 Reddit-derived, place-agnostic scenarios evaluated in Los Angeles, London, and Sydney across three LLMs. Manual annotation of LLM outputs against actual laws yields metrics for hallucinations, $hr$ and $hr^*$, revealing significant place-based differences and a strong negative correlation between majority-sample frequency and hallucination rates, suggesting a practical uncertainty signal. The work highlights implications for equitable access to justice and underscores the need for jurisdiction-aware validation of AI-assisted legal tools.

Abstract

How do we make a meaningful comparison of a large language model's knowledge of the law in one place compared to another? Quantifying these differences is critical to understanding if the quality of the legal information obtained by users of LLM-based chatbots varies depending on their location. However, obtaining meaningful comparative metrics is challenging because legal institutions in different places are not themselves easily comparable. In this work we propose a methodology to obtain place-to-place metrics based on the comparative law concept of functionalism. We construct a dataset of factual scenarios drawn from Reddit posts by users seeking legal advice for family, housing, employment, crime and traffic issues. We use these to elicit a summary of a law from the LLM relevant to each scenario in Los Angeles, London and Sydney. These summaries, typically of a legislative provision, are manually evaluated for hallucinations. We show that the rate of hallucination of legal information by leading closed-source LLMs is significantly associated with place. This suggests that the quality of legal solutions provided by these models is not evenly distributed across geography. Additionally, we show a strong negative correlation between hallucination rate and the frequency of the majority response when the LLM is sampled multiple times, suggesting a measure of uncertainty of model predictions of legal facts.
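The two quantities the abstract relates, hallucination rate and the frequency of the majority response under repeated sampling, can be illustrated with a minimal sketch. The function names, the string form of the sampled references, and the boolean annotation labels are all assumptions made for illustration; this section does not give the paper's exact definitions of $hr$ and $hr^*$, so this is not the authors' implementation.

```python
from collections import Counter


def majority_frequency(samples):
    """Return the most common sampled answer and the share of samples giving it.

    `samples` is a list of hashable answers (here, hypothetical strings naming
    the legal reference the model cited on each independent sample).
    """
    if not samples:
        raise ValueError("need at least one sample")
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def hallucination_rate(labels):
    """Return the fraction of annotated summaries flagged as hallucinated.

    `labels` is a list of booleans (True = annotator marked a hallucination).
    This simple ratio is an assumed stand-in for the paper's metric.
    """
    if not labels:
        raise ValueError("need at least one label")
    return sum(1 for flagged in labels if flagged) / len(labels)


# Hypothetical data: five samples for one scenario in one place.
samples = ["Act A s 1", "Act A s 1", "Act B s 2", "Act A s 1", "Act C s 3"]
answer, freq = majority_frequency(samples)

# Hypothetical annotations for four majority summaries.
rate = hallucination_rate([True, False, False, True])
```

Under the reported negative correlation, scenarios where `freq` is low (the model disagrees with itself across samples) would be the ones where a hallucination is more likely, which is why the majority frequency can serve as an uncertainty signal.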

Paper Structure

This paper contains 17 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Overall workflow: A subject matter-diverse set of Reddit posts was converted to concise, place-agnostic scenarios. The prompt template has slots for the 'place' and the converted 'scenario'. The source text of the pinpoint reference identified by the LLM is found online, and the LLM's summary is manually annotated for hallucinations by a human annotator.
  • Figure 2: Annotations: Examples of our $\mathbf{H_{min}}$ and $\mathbf{H_{maj}}$ annotations. An example A annotation can be found in Figure 1.
  • Figure 3: Association of Hallucination Rates to Place: Figure (a): Hallucination rates by model. Figure (b): Rates by model, plotted. Figure (c): Chi-Square statistic for each association to place. Significance of results marked ** ($p < 0.001$), * ($p < 0.05$), or ^ ($p < 0.1$).
  • Figure 4: Correlation of Sample Frequency and hr: Figure (a): hr and majority sample frequency, by model. Figure (b): Spearman's rank correlation coefficients between majority sample frequency and hr and hr*. Significance of results marked ** ($p < 0.001$), * ($p < 0.05$), or ^ ($p < 0.1$).
  • Figure 5: Nature of LLM References: Figure (a): Most frequently cited laws in each place across all models, number of instances and majority legal issue. Figure (b): Hallucination rates and counts by legal issue.
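
The association test named in the Figure 3 caption can be sketched as a Pearson chi-square statistic over a place-by-outcome contingency table. The function name and the counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not results from the paper, and the sketch assumes every expected cell count is positive.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom).

    `table` is a list of rows of observed counts, e.g. one row per place and
    one column per outcome (hallucinated, not hallucinated). Assumes every
    row and column total is positive so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if place and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are three places, columns are
# (hallucinated, not hallucinated) out of 100 summaries each.
toy_table = [
    [10, 90],
    [20, 80],
    [30, 70],
]
stat, dof = chi_square_statistic(toy_table)
```

Converting the statistic to a p-value requires the chi-square distribution with `dof` degrees of freedom, which a statistics library would normally supply; that step is left out here to keep the sketch dependency-free.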