LawLLM: Law Large Language Model for the US Legal System

Dong Shu, Haoran Zhao, Xukun Liu, David Demeter, Mengnan Du, Yongfeng Zhang

TL;DR

LawLLM presents a multi-task large language model tailored for the US legal domain to perform Similar Case Retrieval, Precedent Case Recommendation, and Legal Judgment Prediction. It differentiates between precedent and similar cases, leveraging task-specific data preprocessing, a knowledge-graph approach for PCR, a vector-search strategy for SCR, and instruction-tuned prompts for LJP, all under a unified fine-tuning regime with 4-bit LoRA. Empirical results on the CaseLaw dataset show LawLLM achieving superior zero-shot and few-shot performance across SCR, PCR, and LJP compared with strong baselines including GPT-4, while maintaining a low not-found rate and robust reasoning. The work provides a concrete, end-to-end blueprint for multi-task legal analytics, offering practical potential for legal research, case preparation, and decision-support systems, and it outlines clear directions for expanding the framework to additional tasks and datasets.

Abstract

In the rapidly evolving field of legal analytics, finding relevant cases and accurately predicting judicial outcomes are challenging because of the complexity of legal language, which often includes specialized terminology, complex syntax, and historical context. Moreover, the subtle distinctions between similar and precedent cases require a deep understanding of legal knowledge. Researchers often conflate these two concepts, which makes it difficult to develop specialized techniques for each task. In this paper, we introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain to address these challenges. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). By clearly distinguishing between precedent and similar cases, we provide essential clarity, guiding future research in developing specialized strategies for these tasks. We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format. LawLLM also uses in-context learning (ICL) and advanced information retrieval methods. The evaluation results demonstrate that LawLLM consistently outperforms existing baselines in both zero-shot and few-shot scenarios, offering multi-task capabilities that fill critical gaps in the legal domain.
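
To make the vector-search idea behind Similar Case Retrieval more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the bag-of-words `embed` function stands in for whatever embedding model the paper uses, and the names `embed`, `cosine`, `retrieve_similar`, and the three toy cases in `CASES` are all hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query, cases, k=2):
    """Return the k case ids most similar to the query, with their scores."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), case_id)
              for case_id, text in cases.items()]
    # Highest score first; ties broken by case id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(case_id, score) for score, case_id in scored[:k]]


# Hypothetical toy corpus, not taken from the paper's dataset.
CASES = {
    "case_a": "tenant sued landlord over unpaid security deposit",
    "case_b": "driver appealed speeding ticket conviction",
    "case_c": "landlord evicted tenant for unpaid rent",
}

if __name__ == "__main__":
    query = "tenant dispute with landlord about security deposit"
    for case_id, score in retrieve_similar(query, CASES):
        print(case_id, round(score, 3))
```

In a realistic setting the toy `embed` would be replaced by a dense embedding model and the linear scan by a vector index, but the ranking step (score every candidate against the query, keep the top k) has the same shape.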

Paper Structure (21 sections, 7 equations, 2 figures, 18 tables)

Figures (2)

  • Figure 1: LawLLM supports three tasks: Similar Case Retrieval, Precedent Case Recommendation, and Legal Judgment Prediction.
  • Figure 2: An overview of our LawLLM: Data Preprocessing is in the upper left in green, Similar Case Retrieval Processing is in the upper right in yellow, Precedent Case Recommendation is in the lower left in red, and Legal Judgment Prediction is in the lower right in blue.