Evaluating AI for Law: Bridging the Gap with Open-Source Solutions
Rohan Bhambhoria, Samuel Dahan, Jonathan Li, Xiaodan Zhu
TL;DR
The paper addresses the risks of applying general-purpose AI to high-stakes legal tasks and argues for domain-specific, open-source solutions to improve accuracy, transparency, and access to justice. It introduces LegalQA, a high-quality legal QA dataset curated from lay questions and expert Canadian-law answers, along with Law Stack Exchange content, and proposes OpenJustice as a crowdsourced, open-source framework for building legal AI. Benchmarking with GPT-4 and Mixtral indicates that GPT-4 attains a low factual error rate, whereas open-source models lag and exhibit issues such as missing citations and verbosity, underscoring the need for domain-focused methods. The authors propose a concrete OpenJustice architecture and a three-path framework (build, fine-tune, or train small models), supported by a data-centric development and evaluation pipeline, to democratize robust, explainable legal AI that can enhance access to justice.
Abstract
This study evaluates the performance of general-purpose AI, such as ChatGPT, on legal question-answering tasks, highlighting significant risks to legal professionals and clients. It suggests leveraging foundational models enhanced with domain-specific knowledge to overcome these issues. The paper advocates creating open-source legal AI systems to improve accuracy, transparency, and narrative diversity, addressing the shortcomings of general-purpose AI in legal contexts.
