Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts

Emily Johnson, Xavier Holt, Noah Wilson

TL;DR

This paper tackles legal document tagging, a multi-label classification problem plagued by language complexity, label dependencies, and imbalance. It introduces Legal-LLM, an instruction-based fine-tuning approach that reframes tagging as a generation task, enabling the model to output the exact set of relevant labels conditioned on a prompt. Empirical results on POSTURE50K and EURLEX57K show that Legal-LLM achieves superior $\text{micro-}F1$ and $\text{macro-}F1$ scores compared to strong baselines, with ablations confirming the benefit of a weighted loss for imbalance and human evaluation corroborating label relevance. The work demonstrates the promise of instruction-following LLMs for domain-specific legal NLP and suggests avenues for long-document processing and enhanced imbalance handling. $\text{micro-}F1$ and $\text{macro-}F1$ scores are used to quantify performance, with the model excelling particularly on less frequent labels and under varied prompt and input-length conditions.

Abstract

Legal multi-label classification is a critical task for organizing and accessing the vast amount of legal documentation. Despite its importance, it faces challenges such as the complexity of legal language, intricate label dependencies, and significant label imbalance. In this paper, we propose Legal-LLM, a novel approach that leverages the instruction-following capabilities of Large Language Models (LLMs) through fine-tuning. We reframe the multi-label classification task as a structured generation problem, instructing the LLM to directly output the relevant legal categories for a given document. We evaluate our method on two benchmark datasets, POSTURE50K and EURLEX57K, using micro-F1 and macro-F1 scores. Our experimental results demonstrate that Legal-LLM outperforms a range of strong baseline models, including traditional methods and other Transformer-based approaches. Furthermore, ablation studies and human evaluations validate the effectiveness of our approach, particularly in handling label imbalance and generating relevant and accurate legal labels.
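To make the evaluation setup concrete, the following is a minimal sketch of how generated label strings could be scored with micro-F1 and macro-F1. It is not the authors' implementation: the semicolon separator, the label names, and the helper names `parse_labels` and `micro_macro_f1` are assumptions made for illustration, and labels with no gold or predicted occurrences are assigned an F1 of 0 by convention here.

```python
from typing import Dict, List, Set


def parse_labels(generated: str, label_space: Set[str], sep: str = ";") -> Set[str]:
    """Split generated text on `sep` (an assumed output format) and keep
    only strings that are members of the known label space."""
    candidates = {part.strip() for part in generated.split(sep)}
    return {label for label in candidates if label in label_space}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], label_space: Set[str]
) -> Dict[str, float]:
    """Micro-F1 pools counts over all labels; macro-F1 averages per-label F1,
    so rare labels weigh as much as frequent ones."""
    counts = {label: [0, 0, 0] for label in label_space}  # [tp, fp, fn]
    for g, p in zip(gold, pred):
        for label in label_space:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return {"micro_f1": f1(tp, fp, fn), "macro_f1": macro}


# Illustrative (hypothetical) label space and model outputs.
label_space = {"appeal", "motion to dismiss", "summary judgment"}
gold = [{"appeal"}, {"appeal", "summary judgment"}]
generated = ["appeal; motion to dismiss", "appeal"]
pred = [parse_labels(text, label_space) for text in generated]
scores = micro_macro_f1(gold, pred, label_space)
print(scores)
```

In this toy example the frequent label is predicted correctly while the two rarer labels are not, so macro-F1 falls well below micro-F1. That gap is why macro-F1 is the more informative of the two scores when label imbalance is the concern.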

Paper Structure

This paper contains 14 sections and 6 tables.