Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text

M Manvith Prabhu, Haricharana Srinivasa, Anand Kumar M

TL;DR

This work presents unsupervised and supervised approaches to answering civil-procedure legal questions for SemEval-2024 Task 5. It combines a similarity- and distance-based unsupervised labeling strategy with multi-level fusion of Legal-Bert embeddings in a CNN/GRU/LSTM architecture, and adds T5-based segment-wise summarization to handle lengthy explanations. The unsupervised system gains 20 points in macro F1 on the development set and 10 points on the test set, which suggests that simple architectures can be effective on complex legal NLP tasks. As future work, the authors propose ensemble and Siamese-network strategies, alternative summarizers, and data augmentation to further improve performance and generalization in legal QA systems.

Abstract

This paper summarizes Team SCaLAR's work on SemEval-2024 Task 5: Legal Argument Reasoning in Civil Procedure. To address this binary classification task, which is daunting because of the complexity of the legal texts involved, we propose a simple yet novel similarity- and distance-based unsupervised approach to generate labels. Further, we explore multi-level fusion of Legal-Bert embeddings using ensemble features, including CNN, GRU, and LSTM. To address the length of the legal explanations in the dataset, we introduce T5-based segment-wise summarization, which retained crucial information and improved the model's performance. Our unsupervised system achieved a 20-point increase in macro F1-score on the development set and a 10-point increase on the test set, which is promising given its uncomplicated architecture.
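To make the idea of similarity- and distance-based label generation more concrete, the following is a minimal sketch, not the system described in the paper. It assumes that a candidate answer is labeled correct when it is close enough to the accompanying explanation. The bag-of-words vectors stand in for Legal-Bert embeddings, and the function names and the `sim_threshold` value are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy bag-of-words vector (assumption: a stand-in for Legal-Bert embeddings)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def pseudo_label(answer, explanation, sim_threshold=0.5):
    """Return (label, similarity, distance) without using gold labels.

    The threshold rule is a hypothetical decision rule, not the paper's.
    """
    va, ve = bow_vector(answer), bow_vector(explanation)
    sim = cosine_similarity(va, ve)
    dist = euclidean_distance(va, ve)
    label = 1 if sim >= sim_threshold else 0
    return label, sim, dist


if __name__ == "__main__":
    explanation = "the court lacks personal jurisdiction over the defendant"
    for answer in ("the court lacks personal jurisdiction", "venue is proper"):
        print(answer, "->", pseudo_label(answer, explanation))
```

In a real pipeline the count vectors would be replaced by dense sentence embeddings, and the threshold would need to be tuned on held-out data.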
