Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text

M Manvith Prabhu; Haricharana Srinivasa; Anand Kumar M

TL;DR

This work addresses unsupervised and supervised approaches for answering legal questions in civil procedure within SemEval-2024 Task 5. It pairs a similarity- and distance-based unsupervised labeling strategy with a supervised model that applies multi-level fusion to Legal-Bert embeddings using CNN, GRU, and LSTM features. T5-based segment-wise summarization is added to handle lengthy explanations. The unsupervised system yields notable gains, with a 20-point improvement in macro F1-score on the development set and a 10-point gain on the test set, indicating promise for simple yet effective architectures in complex legal NLP tasks. Looking ahead, the authors propose ensemble and Siamese-network strategies, along with alternative summarizers and data augmentation, to further boost performance and generalization in legal QA systems.
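The similarity- and distance-based labeling idea can be illustrated with a minimal sketch. The source does not spell out the decision rule, so this assumes that each answer candidate and a reference text (for example, the question's explanation) have already been embedded as vectors, and that the single candidate with the highest cosine similarity, or the lowest Euclidean distance, to the reference is labeled correct. The function names and the one-correct-answer rule are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_candidates(reference: Vector, candidates: List[Vector], metric: str = "cosine") -> List[int]:
    """Label the candidate closest to the reference as 1 and all others as 0.

    Hypothetical rule: exactly one correct answer per question, where "closest"
    means highest cosine similarity or lowest Euclidean distance.
    """
    if not candidates:
        return []
    indices = range(len(candidates))
    if metric == "cosine":
        best = max(indices, key=lambda i: cosine_similarity(reference, candidates[i]))
    elif metric == "euclidean":
        best = min(indices, key=lambda i: euclidean_distance(reference, candidates[i]))
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return [1 if i == best else 0 for i in indices]


if __name__ == "__main__":
    ref = [1.0, 0.0]
    cands = [[10.0, 0.0], [0.5, 0.5]]
    print(label_candidates(ref, cands, metric="cosine"))     # prints [1, 0]
    print(label_candidates(ref, cands, metric="euclidean"))  # prints [0, 1]
```

The usage lines show why the two metrics can disagree: cosine similarity ignores vector length, so [10.0, 0.0] points exactly in the reference's direction and wins, while Euclidean distance penalizes that length and prefers [0.5, 0.5].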

Abstract

This paper summarizes Team SCaLAR's work on SemEval-2024 Task 5: Legal Argument Reasoning in Civil Procedure. To address this binary classification task, which is challenging because of the complexity of the legal texts involved, we propose a simple yet novel similarity- and distance-based unsupervised approach to generate labels. Further, we explore multi-level fusion of Legal-Bert embeddings using ensemble features, including CNN, GRU, and LSTM. To handle the lengthy legal explanations in the dataset, we introduce T5-based segment-wise summarization, which retained crucial information and improved the model's performance. Our unsupervised system achieved a 20-point increase in macro F1-score on the development set and a 10-point increase on the test set, which is promising given its uncomplicated architecture.
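Segment-wise summarization can be sketched as splitting a long explanation into word-bounded chunks, summarizing each chunk, and joining the partial summaries in their original order. This is a minimal sketch under stated assumptions: the 200-word default segment size and the helper names are hypothetical, and the summarizer is passed in as a plain function so the example runs without a model. In practice that function would wrap a T5 model.

```python
from typing import Callable, List


def split_into_segments(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def segment_wise_summary(text: str, summarize: Callable[[str], str], max_words: int = 200) -> str:
    """Summarize each segment independently and join the partial summaries in order."""
    return " ".join(summarize(segment) for segment in split_into_segments(text, max_words))


# Hypothetical T5 hookup (needs the Hugging Face `transformers` package and a model download):
#   from transformers import pipeline
#   t5 = pipeline("summarization", model="t5-small")  # model size chosen only for illustration
#   summarize = lambda s: t5(s, max_length=60, min_length=10)[0]["summary_text"]

if __name__ == "__main__":
    def first_words(segment: str, n: int = 3) -> str:
        # Stand-in summarizer that keeps the first n words; for illustration only.
        return " ".join(segment.split()[:n])

    doc = " ".join(f"w{i}" for i in range(10))
    print(segment_wise_summary(doc, first_words, max_words=4))  # prints w0 w1 w2 w4 w5 w6 w8 w9
```

Keeping each segment well below the summarizer's input limit means no part of the explanation is silently truncated, which is one plausible reason to summarize segment by segment rather than the whole text at once.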

Paper Structure

This paper contains 16 sections, 1 equation, 2 figures, 4 tables, 1 algorithm.

Introduction
Background
Related Works
System Overview
Supervised Models
Multi-Level Approach
Multi-Feature Approach
Unsupervised Models
Word2Vec-Cosine system
GloVe-Cosine system
Transformer embeddings-Cosine system and Transformer embeddings-Euclidean system
Experimental Setup
Supervised Models
Unsupervised Models
Results
(1 further section not listed)

Figures (2)

Figure 1: Segment-wise summary
Figure 2: Multi-level fusion
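The source does not define multi-level fusion in detail. One common reading, assumed here and not confirmed by the paper, is that sentence vectors are pooled from several Legal-Bert encoder layers and then averaged or concatenated before being passed to downstream feature extractors such as the CNN, GRU, and LSTM branches. The sketch below uses plain lists in place of tensors; the layer choice and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one layer's output: tokens x hidden_size


def mean_pool(token_vectors: Matrix) -> List[float]:
    """Average one layer's token vectors into a single sentence vector."""
    if not token_vectors:
        raise ValueError("layer has no tokens")
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def fuse_layers(hidden_states: Sequence[Matrix], layers: Sequence[int], mode: str = "mean") -> List[float]:
    """Fuse mean-pooled sentence vectors taken from the selected layers.

    mode="mean": element-wise average (length stays hidden_size).
    mode="concat": concatenation (length becomes len(layers) * hidden_size).
    """
    if not layers:
        raise ValueError("select at least one layer")
    pooled = [mean_pool(hidden_states[i]) for i in layers]
    if mode == "mean":
        return [sum(values) / len(pooled) for values in zip(*pooled)]
    if mode == "concat":
        return [x for vector in pooled for x in vector]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Toy stand-in for encoder output: 3 layers, 2 tokens, hidden size 2.
    hidden = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 3.0], [3.0, 1.0]],  # pools to [2.0, 2.0]
        [[4.0, 2.0], [2.0, 0.0]],  # pools to [3.0, 1.0]
    ]
    print(fuse_layers(hidden, [1, 2]))                 # prints [2.5, 1.5]
    print(fuse_layers(hidden, [1, 2], mode="concat"))  # prints [2.0, 2.0, 3.0, 1.0]
```

With Hugging Face Transformers, per-layer hidden states are returned when the model is called with output_hidden_states=True. Converting one example's layers to nested lists would let these helpers run unchanged, although a real pipeline would keep the arithmetic in the tensor library.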
