LawLLM: Law Large Language Model for the US Legal System

Dong Shu; Haoran Zhao; Xukun Liu; David Demeter; Mengnan Du; Yongfeng Zhang

TL;DR

LawLLM is a multi-task large language model tailored to the US legal domain that performs Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). It explicitly distinguishes precedent cases from similar cases and combines task-specific data preprocessing, a knowledge-graph approach for PCR, a vector-search strategy for SCR, and instruction-tuned prompts for LJP, all trained under a single fine-tuning regime with 4-bit LoRA. On the CaseLaw dataset, LawLLM outperforms strong baselines, including GPT-4, on SCR, PCR, and LJP in both zero-shot and few-shot settings, while keeping a low not-found rate and showing robust reasoning. The work provides a concrete, end-to-end blueprint for multi-task legal analytics with practical potential for legal research, case preparation, and decision-support systems, and it outlines clear directions for extending the framework to additional tasks and datasets.
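The TL;DR separates a vector-search strategy for SCR from a knowledge-graph approach for PCR, but it does not specify the embedding model, the index, or the graph schema. The sketch below is a minimal illustration under stated assumptions, not LawLLM's implementation: case embeddings are assumed to be precomputed vectors (toy values here), SCR ranks cases by plain cosine similarity, and PCR candidates are read from a hypothetical citation map in which `cites[a]` holds the cases that case `a` cites. The names `similar_cases`, `precedent_candidates`, and `cites` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_cases(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """SCR sketch: rank every case by cosine similarity to the query embedding, keep the top k."""
    scored = [(case_id, cosine(query, vec)) for case_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def precedent_candidates(case_id: str, cites: Dict[str, Set[str]]) -> Set[str]:
    """PCR sketch (hypothetical schema): precedents come from citation edges, not text similarity."""
    return set(cites.get(case_id, set()))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real case embeddings would come from a text encoder.
    corpus = {
        "case_a": [0.9, 0.1, 0.0],
        "case_b": [0.1, 0.9, 0.0],
        "case_c": [0.8, 0.2, 0.1],
    }
    query = [1.0, 0.0, 0.0]
    print([cid for cid, _ in similar_cases(query, corpus, k=2)])  # ['case_a', 'case_c']

    # Hypothetical citation map: the new case cites case_b.
    cites = {"new_case": {"case_b"}}
    print(precedent_candidates("new_case", cites))  # {'case_b'}
```

In the toy data, `case_b` is the cited precedent yet ranks lowest by similarity, which illustrates why similar cases and precedent cases can call for different retrieval strategies.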

Abstract

In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax, and historical context. Moreover, the subtle distinctions between similar and precedent cases require a deep understanding of legal knowledge. Researchers often conflate these concepts, which makes it difficult to develop specialized techniques for these nuanced tasks. In this paper, we introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain to address these challenges. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). By clearly distinguishing between precedent and similar cases, we provide essential clarity that can guide future research in developing specialized strategies for these tasks. We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format. LawLLM also incorporates in-context learning (ICL) and advanced information retrieval methods. The evaluation results demonstrate that LawLLM consistently outperforms existing baselines in both zero-shot and few-shot scenarios, offering unparalleled multi-task capabilities and filling critical gaps in the legal domain.
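The abstract mentions in-context learning and zero-shot versus few-shot evaluation but gives no prompt format. As a hedged illustration only, the sketch below assembles an LJP prompt from labeled demonstration cases; the template wording, the `LabeledCase` type, and the verdict labels are hypothetical and are not LawLLM's actual instruction format. Passing no demonstrations yields the zero-shot variant of the same prompt.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LabeledCase:
    facts: str
    verdict: str  # hypothetical label, e.g. "affirmed" or "reversed"


def build_ljp_prompt(query_facts: str, demos: Sequence[LabeledCase]) -> str:
    """Assemble a few-shot (in-context learning) LJP prompt; zero-shot when demos is empty."""
    lines: List[str] = [
        "Predict the judgment for the final case. Answer with a single verdict label.",
        "",
    ]
    # Each demonstration pairs case facts with its known outcome.
    for i, demo in enumerate(demos, start=1):
        lines.append(f"Example {i} facts: {demo.facts}")
        lines.append(f"Example {i} verdict: {demo.verdict}")
        lines.append("")
    # The query case goes last, leaving the verdict for the model to complete.
    lines.append(f"Case facts: {query_facts}")
    lines.append("Verdict:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        LabeledCase("Trial court excluded key evidence without explanation.", "reversed"),
        LabeledCase("Jury instructions matched the governing statute.", "affirmed"),
    ]
    print(build_ljp_prompt("Defendant appeals a sentence within statutory limits.", demos))
```

In practice the demonstrations would likely be chosen by a retrieval step such as similarity search rather than fixed by hand; that pairing is an assumption here, not a detail stated in the abstract.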
