LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning
Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré
TL;DR
Addresses whether foundation models can perform legal reasoning and how to measure it. Proposes LegalBench, an IRAC-informed, open benchmark with 44 seed tasks across four legal areas and a community-driven task submission model. Provides initial results across several models, highlighting how task type and prompting affect performance. Argues that this framework enables systematic evaluation, tracking progress, and guiding responsible adoption of foundation model tools in legal settings.
Abstract
Can foundation models be guided to execute tasks involving legal reasoning? We believe that building a benchmark to answer this question will require sustained collaborative efforts between the computer science and legal communities. To that end, this short paper serves three purposes. First, we describe how IRAC, a framework legal scholars use to distinguish different types of legal reasoning, can guide the construction of a foundation-model-oriented benchmark. Second, we present a seed set of 44 tasks built according to this framework. We discuss initial findings and highlight directions for new tasks. Finally, inspired by the Open Science movement, we make a call for the legal and computer science communities to join our efforts by contributing new tasks. This work is ongoing, and our progress can be tracked here: https://github.com/HazyResearch/legalbench.
