NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA
Anish Pahilajani, Samyak Rajesh Jain, Devasha Trivedi
TL;DR
The paper addresses automated legal reasoning for civil procedure in SemEval-2024 Task 5 by reframing legal answer validation as a QA problem and evaluating two pipelines: domain-aware fine-tuning of BERT-family models and few-shot prompting of GPT-3.5/4. The multi-choice QA reformulation with GPT-4, combined with a rule-based post-processing strategy, achieved the strongest test performance (F1 = 74.68, accuracy = 82.65), outperforming both binary classification and the BERT baselines. The authors observe that domain-specific pretraining benefits BERT-based models, while GPT-4's broader context handling and prompting capability make it better suited to dense legal texts; however, input length limits and distributional shifts still affect generalization. The work highlights the importance of data diversity and the potential of multi-choice prompting for legal QA, and it points to future directions such as incorporating explicit legal principles or precedents to improve reasoning.
Abstract
This paper presents our submission to SemEval-2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. We present two approaches to legal answer validation, where the input is an introduction to the case, a question, and an answer candidate. First, we fine-tuned pre-trained BERT-based models and found that models trained on domain knowledge perform better. Second, we performed few-shot prompting on GPT models and found that reformulating the answer validation task as a multiple-choice QA task markedly improves the model's performance. Our best submission is a BERT-based model that placed 7th out of 20.
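
The reformulation from answer validation to multiple-choice QA can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`intro`, `question`, `candidate`), the prompt wording, and the helper names are hypothetical, and the sketch assumes that exactly one candidate per question is correct. It groups the answer candidates that share a question into one lettered prompt, then maps the letter a model returns back to one binary label per candidate. The model call itself is left out.

```python
from collections import defaultdict
from string import ascii_uppercase


def group_candidates(rows):
    """Group rows that share the same introduction and question.

    Each row is a dict with hypothetical keys: intro, question, candidate.
    Returns {(intro, question): [(row_index, candidate_text), ...]}.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[(row["intro"], row["question"])].append((idx, row["candidate"]))
    return groups


def build_prompt(intro, question, candidates):
    """Render one multiple-choice prompt with lettered options (A, B, C, ...)."""
    options = "\n".join(
        f"{ascii_uppercase[i]}. {text}" for i, (_, text) in enumerate(candidates)
    )
    return (
        f"Introduction:\n{intro}\n\n"
        f"Question:\n{question}\n\n"
        f"Options:\n{options}\n\n"
        "Answer with the letter of the single correct option."
    )


def letter_to_binary(candidates, chosen_letter):
    """Map the chosen letter back to a binary label per original row.

    Assumption: exactly one option is correct, so the chosen option gets 1
    and every other option gets 0.
    """
    chosen = ascii_uppercase.index(chosen_letter.strip().upper()[0])
    return {idx: int(i == chosen) for i, (idx, _) in enumerate(candidates)}
```

In this sketch each question is sent to the model once rather than once per candidate, and the returned letter determines the labels for the whole group. A post-processing rule would be needed wherever the single-correct-answer assumption does not hold.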
