eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure

Hoorieh Sabzevari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

TL;DR

The paper investigates the feasibility of automated legal argument reasoning on long-form civil procedure analyses by evaluating long-input architectures, domain-specific legal LMs, and zero-shot LLMs with prompts on the Glannon Guide Civil Procedure dataset. It demonstrates that zero-shot prompting, particularly with Copilot, achieves the best macro F1 of $0.64$, while other architectures struggle with long, complex legal data. The work highlights the trade-offs between prompt design, model capabilities, and input length, and discusses practical avenues like data summarization and multi-model collaboration for future improvements. These findings inform the deployment of AI systems for reasoning in legal contexts and guide directions for more robust, domain-aware NLP in law.

Abstract

This study investigates the performance of the zero-shot method in classifying data using three large language models, alongside two models with large input token sizes and two models pre-trained on legal data. Our main dataset comes from the domain of U.S. civil procedure. It includes summaries of legal cases, specific questions, potential answers, and detailed explanations of why each answer is relevant, all sourced from a book aimed at law students. By comparing these methods, we aimed to understand how effectively each handles the complexities found in legal datasets. Our findings show that zero-shot prompting of large language models handles this complex data better than the other approaches we tested. Our highest F1 score in these experiments was 64%.
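Since the headline result is reported as a macro F1 score, a minimal sketch of how that metric is computed may help. It assumes a binary labeling in which each candidate answer is marked correct (1) or incorrect (0); the function name `macro_f1` and the toy labels are illustrative and do not come from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)

    # Every class counts equally, regardless of how many examples it has.
    return sum(per_class_f1) / len(per_class_f1)


# Toy example (assumed data): one correct answer among four candidates.
gold = [1, 0, 0, 0]
predicted = [1, 1, 0, 0]
score = macro_f1(gold, predicted)
```

Because each class contributes equally to the average, a model that only predicts the majority "incorrect" label is penalized, which matters when most candidate answers to a question are wrong.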

Paper Structure

This paper contains 14 sections, 2 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Comparison between cross entropy and focal loss
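
For reference, the two losses named in the Figure 1 caption are commonly defined as follows. This is the standard formulation from the focal loss literature, not an equation reproduced from this paper; the symbols $p_t$ (the model's predicted probability for the true class) and $\gamma$ (the focusing parameter) are the conventional ones, and the value of $\gamma$ the authors used is not stated in this excerpt.

$$
\mathrm{CE}(p_t) = -\log(p_t), \qquad \mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t), \quad \gamma \ge 0
$$

With $\gamma = 0$ the focal loss reduces to cross entropy; larger $\gamma$ shrinks the loss on well-classified examples ($p_t$ close to 1) so that training concentrates on harder ones.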