Experimenting with Legal AI Solutions: The Case of Question-Answering for Access to Justice

Jonathan Li, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu

TL;DR

This work introduces and releases LegalQA, a dataset of real, specific legal questions spanning employment law to criminal law, with corresponding answers written by legal experts and citations for each answer. It also proposes future directions for open-source efforts, which fall behind closed-source models.

Abstract

Generative AI models, such as the GPT and Llama series, have significant potential to assist laypeople in answering legal questions. However, little prior work focuses on the data sourcing, inference, and evaluation of these models in the context of laypersons. To this end, we propose a human-centric legal NLP pipeline, covering data sourcing, inference, and evaluation. We introduce and release a dataset, LegalQA, with real and specific legal questions spanning from employment law to criminal law, corresponding answers written by legal experts, and citations for each answer. We develop an automatic evaluation protocol for this dataset, then show that retrieval-augmented generation from only 850 citations in the train set can match or outperform internet-wide retrieval, even though that citation set contains 9 orders of magnitude less data. Finally, we propose future directions for open-source efforts, which fall behind closed-source models.

Paper Structure (20 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: An overview of our framework for human-centric legal AI.
  • Figure 2: Distribution of question lengths and response lengths. Responses are concise and specific.
  • Figure 3: Retrieval-based methods used for our experiments. Given a legal question, retrieval is performed to generate a relevant answer.
  • Figure 4: Factual disagreement of each model by category. Lower is better.
  • Figure 5: Factual disagreement for each model. "GPT-3.5 Legal" is retrieval using only legal documents, and "GPT-3.5 Internet" is retrieval from the entire internet.
  • ...and 1 more figure