eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure

Hoorieh Sabzevari; Mohammadmostafa Rostamkhani; Sauleh Eetemadi

TL;DR

The paper investigates the feasibility of automated legal argument reasoning on long-form civil procedure analyses by evaluating long-input architectures, domain-specific legal LMs, and zero-shot LLMs with prompts on the Glannon Guide Civil Procedure dataset. It demonstrates that zero-shot prompting, particularly with Copilot, achieves the best macro F1 of $0.64$, while other architectures struggle with long, complex legal data. The work highlights the trade-offs between prompt design, model capabilities, and input length, and discusses practical avenues like data summarization and multi-model collaboration for future improvements. These findings inform the deployment of AI systems for reasoning in legal contexts and guide directions for more robust, domain-aware NLP in law.

Abstract

This study investigates the performance of zero-shot classification with three large language models, alongside two models that accept long input sequences and two models pre-trained on legal data. Our main dataset comes from the domain of U.S. civil procedure. It includes summaries of legal cases, specific questions, candidate answers, and detailed explanations of why each answer is relevant, all sourced from a book aimed at law students. By comparing these methods, we aimed to understand how effectively they handle the complexities found in legal datasets. Our findings show how well zero-shot prompting of large language models copes with complicated data. Our best result in these experiments was an F1 score of 64%.
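The reported score is a macro F1, which averages the per-class F1 scores without weighting by class frequency. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes binary labels (1 for a correct candidate answer, 0 otherwise); the function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration (not taken from the dataset)
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
print(round(macro_f1(y_true, y_pred), 3))  # 0.625
```

Because each class contributes equally, a model that does well only on the majority class is penalized, which matters if incorrect answers outnumber correct ones.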

Paper Structure (14 sections, 2 equations, 1 figure, 3 tables)


Introduction
Background
Task Setup
Related Work
Pre-trained Legal Language Models
Domain-Specific LLMs in Law
System Overview
Preprocessing Data
Model
Pre-trained Models
Large Language Models
Experimental setup
Results
Conclusion

Figures (1)

Figure 1: Comparison between cross entropy and focal loss
